
CN112927236B - Clothing analysis method and system based on channel attention and self-supervision constraint - Google Patents


Info

Publication number
CN112927236B
CN112927236B
Authority
CN
China
Prior art keywords: data, output, output data, upsampled, decoded
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110226332.1A
Other languages
Chinese (zh)
Other versions
CN112927236A (en)
Inventor
项欣光
左成婷
张冬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology
Priority to CN202110226332.1A
Publication of CN112927236A
Application granted
Publication of CN112927236B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
      • G06 - COMPUTING; CALCULATING OR COUNTING
        • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 7/00 - Image analysis
            • G06T 7/10 - Segmentation; Edge detection
          • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
            • G06T 2207/20 - Special algorithmic details
              • G06T 2207/20081 - Training; Learning
              • G06T 2207/20084 - Artificial neural networks [ANN]
        • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00 - Computing arrangements based on biological models
            • G06N 3/02 - Neural networks
              • G06N 3/04 - Architecture, e.g. interconnection topology
                • G06N 3/045 - Combinations of networks
                • G06N 3/048 - Activation functions
              • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to a garment analysis method and system based on channel attention and self-supervision constraint. The method comprises: acquiring a clothing picture data set, wherein the data set comprises the clothing analysis graph corresponding to each clothing picture; inputting each clothing picture in the data set into a neural network comprising a channel attention module; performing image feature extraction multiple times on each input clothing picture based on the channel attention module to obtain an output feature map of each clothing picture; iteratively training the neural network based on a strong supervision constraint and a self-supervision constraint to obtain a trained neural network model, wherein the strong supervision constraint is imposed by the clothing analysis graph corresponding to each clothing picture on the output feature map, and the self-supervision constraint is imposed by higher-level outputs on lower-level outputs among the decoded output data in the neural network; and inputting a clothing picture to be analyzed into the trained neural network model and outputting a clothing analysis graph. The invention reduces the number of parameters and improves garment analysis accuracy.

Description

Clothing analysis method and system based on channel attention and self-supervision constraint
Technical Field
The invention relates to the technical field of computer vision semantic segmentation, in particular to a garment analysis method and system based on channel attention and self-supervision constraint.
Background
Image semantic segmentation is one of the core research problems in computer vision. It aims to extract correct high-level semantic information from an image and associate that information correctly with every pixel. Clothing analysis is an important and finer-grained research direction within semantic segmentation: it aims to automatically analyze an acquired image and parse the clothing worn by the people in it, so as to replace the human eye in identifying and locating garments. Clothing analysis has broad application prospects, mainly in virtual try-on, pedestrian re-identification, clothing recommendation, clothing retrieval and related fields. However, it is a relatively new research direction and related techniques are still few; at the same time, deep learning technology keeps advancing, so an advanced garment analysis method that follows the current development trend is urgently needed.
Existing garment analysis methods fall into two major categories: multi-stage methods and end-to-end methods. Multi-stage methods mostly add human pose estimation and clothing classification to the garment analysis network as prior knowledge, and use a conditional random field as a subsequent auxiliary processing step. End-to-end methods feed the original fashion clothing picture directly into the garment analysis network, which directly outputs the parsed clothing prediction map. However, these garment analysis networks either introduce too much prior knowledge and subsequent assistance; or the introduced prior knowledge contains noise, so the extracted features are not optimal; or they require a bulky network architecture. Some networks have been improved from the perspective of simplifying the structure, but their segmentation performance still needs improvement.
Disclosure of Invention
The invention aims to provide a garment analysis method and system based on channel attention and self-supervision constraint, so as to improve garment analysis accuracy.
In order to achieve the purpose, the invention provides the following scheme:
a method of garment parsing based on channel attention and self-supervision constraints, the method comprising:
acquiring a clothing picture data set, wherein the clothing picture data set comprises clothing analytic graphs corresponding to all clothing pictures;
inputting each clothing picture in the clothing picture data set into a neural network, wherein the neural network comprises a channel attention module;
based on the channel attention module, performing multiple times of image feature extraction on the input clothing pictures to obtain output feature maps of the clothing pictures;
performing iterative training on the neural network based on strong supervision constraint and self-supervision constraint to obtain a trained neural network model; the strong supervision constraint is the strong supervision constraint of the clothing analysis graph corresponding to each clothing picture on the output characteristic graph, and the self-supervision constraint is the self-supervision constraint of high-level output on low-level output in each decoding output data in the neural network;
and inputting the clothing picture to be analyzed into the trained neural network model, and outputting a clothing analysis picture.
Optionally, before inputting each clothing picture in the clothing picture data set into the neural network, the method further includes:
and carrying out normalization processing on each clothing picture in the clothing picture data set.
Optionally, the image feature extraction specifically includes:
different convolution operations are carried out on the clothes pictures input into the neural network to obtain first data;
carrying out average pooling on the first data to obtain second data;
performing feature extraction on the second data by using set convolution to obtain first feature data;
performing maximum pooling on the first data to obtain third data;
performing feature extraction on the third data by using the set convolution to obtain second feature data;
adding the first characteristic data and the second characteristic data and inputting the added first characteristic data and the added second characteristic data into a Sigmoid function to obtain fourth data;
and multiplying the fourth data by the first data to obtain output data of the channel attention module.
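For clarity, the channel attention computation described in the steps above can be sketched in PyTorch as follows; the module and parameter names, the choice of a shared 1x1 bottleneck for the set convolution, and the reduction ratio are illustrative assumptions rather than details taken from the patent.

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        # Sketch: the input feature map (first data) is average pooled and max pooled
        # over the spatial dimensions, each branch passes through the same "set
        # convolution", the two results are added, squashed by a Sigmoid, and used
        # to re-weight the channels of the input.
        def __init__(self, channels, reduction=4):
            super().__init__()
            # the "set convolution" is assumed here to be a shared 1x1 bottleneck
            self.shared_conv = nn.Sequential(
                nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
            )
            self.sigmoid = nn.Sigmoid()

        def forward(self, b):
            c = torch.mean(b, dim=(2, 3), keepdim=True)   # second data: average pooling
            d = self.shared_conv(c)                        # first feature data
            e = torch.amax(b, dim=(2, 3), keepdim=True)    # third data: max pooling
            f = self.shared_conv(e)                        # second feature data
            h = self.sigmoid(d + f)                        # fourth data
            return b * h                                   # output data of the module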
Optionally, the performing, based on the channel attention module, multiple times of image feature extraction on the input clothing pictures to obtain an output feature map of each clothing picture specifically includes:
repeating the image feature extraction 7 times to obtain first encoded output data, where I = ψ_encoder1(A) = ρ_7(ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(A))))))), in which I represents the first encoded output data, ψ_encoder1 represents the acquisition process of the first encoded output data, ρ_i' represents the i'-th repetition, i' ∈ {1, 2, 3, 4, 5, 6, 7}, and A represents a clothing picture in the clothing picture data set;
down-sampling the first encoded output data to obtain first down-sampled encoded output data;
repeating the image feature extraction 6 times to obtain second encoded output data, where J = ψ_encoder2(I') = ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(I')))))), in which J represents the second encoded output data, ψ_encoder2 represents the acquisition process of the second encoded output data, and I' represents the first downsampled encoded output data;
down-sampling the second coded output data to obtain second down-sampled coded output data;
repeating the image feature extraction 5 times to obtain third encoded output data, where K = ψ_encoder3(J') = ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(J'))))), in which K represents the third encoded output data, ψ_encoder3 represents the acquisition process of the third encoded output data, and J' represents the second downsampled encoded output data;
down-sampling the third encoded output data to obtain third down-sampled encoded output data;
repeating the image feature extraction 4 times to obtain fourth encoded output data, where L = ψ_encoder4(K') = ρ_4(ρ_3(ρ_2(ρ_1(K')))), in which L represents the fourth encoded output data, ψ_encoder4 represents the acquisition process of the fourth encoded output data, and K' represents the third downsampled encoded output data;
down-sampling the fourth encoded output data to obtain fourth down-sampled encoded output data;
repeating the image feature extraction 4 times to obtain fifth encoded output data, where M = ψ_encoder5(L') = ρ_4(ρ_3(ρ_2(ρ_1(L')))), in which M represents the fifth encoded output data, ψ_encoder5 represents the acquisition process of the fifth encoded output data, and L' represents the fourth downsampled encoded output data;
performing downsampling on the fifth coded output data to obtain fifth downsampled coded output data;
repeating the image feature extraction 4 times to obtain sixth encoded output data, where N = ψ_encoder6(M') = ρ_4(ρ_3(ρ_2(ρ_1(M')))), in which N represents the sixth encoded output data, ψ_encoder6 represents the acquisition process of the sixth encoded output data, and M' represents the fifth downsampled encoded output data;
performing upsampling on the sixth coded output data N to obtain sixth upsampled coded output data N';
repeating the image feature extraction 4 times to obtain fifth decoded output data, where O = ψ_decoder5(N', M) = ρ_4(ρ_3(ρ_2(ρ_1(concat(N', M))))), in which O represents the fifth decoded output data, ψ_decoder5 represents the acquisition process of the fifth decoded output data, concat() represents channel fusion, and N' represents the sixth upsampled encoded output data;
performing upsampling on the fifth decoded output data to obtain fifth upsampled coded output data;
repeating the image feature extraction 4 times to obtain fourth decoded output data, where P = ψ_decoder4(O', L) = ρ_4(ρ_3(ρ_2(ρ_1(concat(O', L))))), in which P represents the fourth decoded output data, ψ_decoder4 represents the acquisition process of the fourth decoded output data, and O' represents the fifth upsampled encoded output data;
up-sampling the fourth decoded output data to obtain fourth up-sampled encoded output data;
repeating the image feature extraction 5 times to obtain third decoded output data, where Q = ψ_decoder3(P', K) = ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(concat(P', K)))))), in which Q represents the third decoded output data, ψ_decoder3 represents the acquisition process of the third decoded output data, and P' represents the fourth upsampled encoded output data;
up-sampling the third decoded output data to obtain third up-sampled encoded output data;
repeating the image feature extraction 6 times to obtain second decoded output data, where R = ψ_decoder2(Q', J) = ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(concat(Q', J))))))), in which R represents the second decoded output data, ψ_decoder2 represents the acquisition process of the second decoded output data, and Q' represents the third upsampled encoded output data;
upsampling the second decoded output data to obtain second upsampled coded output data;
repeating the image feature extraction 7 times to obtain first decoded output data, where S = ψ_decoder1(R', I) = ρ_7(ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(concat(R', I)))))))), in which S represents the first decoded output data, ψ_decoder1 represents the acquisition process of the first decoded output data, and R' represents the second upsampled encoded output data;
upsampling the sixth encoded output data, fifth decoded output data, fourth decoded output data, third decoded output data, second decoded output data and first decoded output data to the same size as the clothing picture, and denoting the results respectively as sixth encoded output upsampled data, fifth decoded output upsampled data, fourth decoded output upsampled data, third decoded output upsampled data, second decoded output upsampled data and first decoded output upsampled data;
fusing the sixth encoded output upsampled data, fifth decoded output upsampled data, fourth decoded output upsampled data, third decoded output upsampled data, second decoded output upsampled data and first decoded output upsampled data by channel fusion to obtain fused data;
and performing convolution operation on the fusion data by adopting the set convolution to obtain an output characteristic diagram of the clothing picture.
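To show how these repeated extractions, the downsampling and upsampling between blocks, the channel fusion with the encoder outputs, and the final fusion convolution fit together, a condensed PyTorch-style sketch follows. The channel widths, kernel sizes, pooling and interpolation choices, and the number of output classes are assumptions made only for illustration; ChannelAttention refers to the sketch given earlier.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def feature_block(in_ch, out_ch, repeats):
        # one encoding/decoding block: the two-convolution + channel-attention unit
        # (the "image feature extraction" above) repeated `repeats` times
        layers, ch = [], in_ch
        for _ in range(repeats):
            layers += [nn.Conv2d(ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
                       nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
                       ChannelAttention(out_ch)]           # from the earlier sketch
            ch = out_ch
        return nn.Sequential(*layers)

    class EncoderDecoder(nn.Module):
        def __init__(self, ch=64, num_classes=20):
            super().__init__()
            reps = [7, 6, 5, 4, 4, 4]                       # repetitions of encoding blocks 1..6
            self.encoders = nn.ModuleList(
                [feature_block(3 if i == 0 else ch, ch, r) for i, r in enumerate(reps)])
            self.decoders = nn.ModuleList(                  # decoding blocks 5..1
                [feature_block(2 * ch, ch, r) for r in [4, 4, 5, 6, 7]])
            self.heads = nn.ModuleList([nn.Conv2d(ch, num_classes, 1) for _ in range(6)])
            self.fuse = nn.Conv2d(6 * num_classes, num_classes, 1)  # convolution on the fused data

        def forward(self, a):
            feats, x = [], a
            for i, enc in enumerate(self.encoders):
                if i > 0:
                    x = F.max_pool2d(x, 2)                  # downsampling between encoding blocks
                x = enc(x)
                feats.append(x)                             # I, J, K, L, M, N
            side, x = [feats[-1]], feats[-1]                # N plus the five decoder outputs
            for dec, skip in zip(self.decoders, reversed(feats[:-1])):
                x = F.interpolate(x, size=skip.shape[2:], mode='bilinear', align_corners=False)
                x = dec(torch.cat([x, skip], dim=1))        # channel fusion with the encoder output
                side.append(x)
            preds = [F.interpolate(h(s), size=a.shape[2:], mode='bilinear', align_corners=False)
                     for h, s in zip(self.heads, side)]     # the six upsampled side outputs
            return self.fuse(torch.cat(preds, dim=1)), preds  # output feature map and side outputs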
Optionally, the iteratively training the neural network based on the strong supervision constraint and the self-supervision constraint to obtain a trained neural network model specifically includes:
respectively calculating the cross entropy losses of the clothing analysis graph corresponding to the clothing picture and the output characteristic graph of the clothing picture, the first decoding output up-sampling data, the second decoding output up-sampling data, the third decoding output up-sampling data, the fourth decoding output up-sampling data, the fifth decoding output up-sampling data and the sixth encoding output up-sampling data, to obtain a first group of cross entropy losses;
respectively calculating cross entropy losses of the output feature map of the clothing picture and the first decoding output up-sampled data, the second decoding output up-sampled data, the third decoding output up-sampled data, the fourth decoding output up-sampled data, the fifth decoding output up-sampled data and the sixth encoding output up-sampled data to obtain a second group of cross entropy losses;
calculating cross entropy losses of the first decoded output upsampled data and the second decoded output upsampled data, the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data and the sixth encoded output upsampled data, respectively, to obtain a third set of cross entropy losses;
calculating cross entropy losses of the second decoded output upsampled data and the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data and the sixth encoded output upsampled data, respectively, to obtain a fourth set of cross entropy losses;
calculating cross entropy losses of the third decoded output upsampled data and the fourth decoded output upsampled data, the fifth decoded output upsampled data and the sixth encoded output upsampled data, respectively, to obtain a fifth set of cross entropy losses;
calculating cross entropy losses of the fourth decoded output upsampled data, the fifth decoded output upsampled data and the sixth encoded output upsampled data, respectively, to obtain a sixth set of cross entropy losses;
respectively calculating cross entropy losses of the fifth decoded output upsampled data and the sixth encoded output upsampled data to obtain a seventh group of cross entropy losses;
weighting and adding the first group of cross entropy losses, the second group of cross entropy losses, the third group of cross entropy losses, the fourth group of cross entropy losses, the fifth group of cross entropy losses, the sixth group of cross entropy losses and the seventh group of cross entropy losses to obtain a loss function;
and performing iterative training on the neural network by adopting a random gradient descent algorithm based on the loss function.
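One hedged way to realize the combination of strong supervision and cross-layer self-supervision described above is sketched below; the function and weight names are hypothetical, and using the argmax of a higher-level output as a pseudo label is only one possible form of the self-supervision constraint.

    import torch.nn.functional as F

    def supervised_loss(label, outputs, w_strong, w_self):
        # outputs: [Y, S'', R'', Q'', P'', O'', N''] ordered from highest to lowest level,
        # all upsampled to the input resolution as (N, C, H, W) logits; label is (N, H, W).
        # Strong supervision: the clothing analysis graph constrains every output.
        strong = sum(w * F.cross_entropy(o, label) for w, o in zip(w_strong, outputs))
        # Self-supervision: each higher-level output constrains every lower-level output.
        self_sup = 0.0
        for i, high in enumerate(outputs[:-1]):
            pseudo = high.argmax(dim=1).detach()            # higher-level prediction as pseudo label
            for j, low in enumerate(outputs[i + 1:]):
                self_sup = self_sup + w_self[i][j] * F.cross_entropy(low, pseudo)
        return strong + self_sup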
The invention also discloses a clothing analysis system based on channel attention and self-supervision constraint, which comprises:
the system comprises a data set acquisition module, a data processing module and a data processing module, wherein the data set acquisition module is used for acquiring a clothing picture data set, and the clothing picture data set comprises clothing analytic graphs corresponding to all clothing pictures;
the data input module is used for inputting each clothing picture in the clothing picture data set into a neural network, and the neural network comprises a channel attention module;
the feature extraction module is used for extracting image features of the input clothing pictures for multiple times based on the channel attention module to obtain output feature maps of the clothing pictures;
the neural network training module is used for carrying out iterative training on the neural network based on strong supervision constraint and self-supervision constraint to obtain a trained neural network model; the strong supervision constraint is the strong supervision constraint of the clothing analysis graph corresponding to each clothing picture on the output characteristic graph, and the self-supervision constraint is the self-supervision constraint of high-level output on low-level output in each decoding output data in the neural network;
and the neural network model application module is used for inputting the clothing picture to be analyzed into the trained neural network model and outputting a clothing analysis picture.
Optionally, the system further comprises:
and the preprocessing module is used for carrying out normalization processing on each clothing picture in the clothing picture data set.
Optionally, the image feature extraction specifically includes:
different convolution operations are carried out on the clothes pictures input into the neural network to obtain first data;
carrying out average pooling on the first data to obtain second data;
performing feature extraction on the second data by using set convolution to obtain first feature data;
performing maximum pooling on the first data to obtain third data;
performing feature extraction on the third data by using the set convolution to obtain second feature data;
adding the first characteristic data and the second characteristic data and inputting the added first characteristic data and the added second characteristic data into a Sigmoid function to obtain fourth data;
and multiplying the fourth data by the first data to obtain output data of the channel attention module.
Optionally, the feature extraction module specifically includes:
a first encoding unit, configured to repeat the image feature extraction 7 times to obtain first encoded output data, where I = ψ_encoder1(A) = ρ_7(ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(A))))))), in which I represents the first encoded output data, ψ_encoder1 represents the acquisition process of the first encoded output data, ρ_i' represents the i'-th repetition, i' ∈ {1, 2, 3, 4, 5, 6, 7}, and A represents a clothing picture in the clothing picture data set;
the first down-sampling unit is used for down-sampling the first coding output data to obtain first down-sampling coding output data;
a second encoding unit, configured to repeat the image feature extraction 6 times to obtain second encoded output data, where J = ψ_encoder2(I') = ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(I')))))), in which J represents the second encoded output data, ψ_encoder2 represents the acquisition process of the second encoded output data, and I' represents the first downsampled encoded output data;
the second downsampling unit is used for downsampling the second coding output data to obtain second downsampling coding output data;
a third encoding unit, configured to repeat the image feature extraction 5 times to obtain third encoded output data, where K = ψ_encoder3(J') = ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(J'))))), in which K represents the third encoded output data, ψ_encoder3 represents the acquisition process of the third encoded output data, and J' represents the second downsampled encoded output data;
a third downsampling unit, configured to downsample the third encoded output data to obtain third downsampled encoded output data;
a fourth encoding unit, configured to repeat the image feature extraction 4 times to obtain fourth encoded output data, where L = ψ_encoder4(K') = ρ_4(ρ_3(ρ_2(ρ_1(K')))), in which L represents the fourth encoded output data, ψ_encoder4 represents the acquisition process of the fourth encoded output data, and K' represents the third downsampled encoded output data;
a fourth downsampling unit, configured to downsample the fourth encoded output data to obtain fourth downsampled encoded output data;
a fifth encoding unit, configured to repeat the image feature extraction 4 times to obtain fifth encoded output data, where M = ψ_encoder5(L') = ρ_4(ρ_3(ρ_2(ρ_1(L')))), in which M represents the fifth encoded output data, ψ_encoder5 represents the acquisition process of the fifth encoded output data, and L' represents the fourth downsampled encoded output data;
a fifth downsampling unit, configured to downsample the fifth encoded output data to obtain fifth downsampled encoded output data;
a sixth encoding unit, configured to repeat the image feature extraction 4 times to obtain sixth encoded output data, where N = ψ_encoder6(M') = ρ_4(ρ_3(ρ_2(ρ_1(M')))), in which N represents the sixth encoded output data, ψ_encoder6 represents the acquisition process of the sixth encoded output data, and M' represents the fifth downsampled encoded output data;
the first up-sampling unit is used for up-sampling the sixth coded output data to obtain sixth up-sampled coded output data;
a fifth decoding unit, configured to repeat the image feature extraction 4 times to obtain fifth decoded output data, where O = ψ_decoder5(N', M) = ρ_4(ρ_3(ρ_2(ρ_1(concat(N', M))))), in which O represents the fifth decoded output data, ψ_decoder5 represents the acquisition process of the fifth decoded output data, concat() represents channel fusion, and N' represents the sixth upsampled encoded output data;
a second upsampling unit, configured to upsample the fifth decoded output data to obtain fifth upsampled encoded output data;
a fourth decoding unit, configured to repeat the image feature extraction 4 times to obtain fourth decoded output data, where P = ψ_decoder4(O', L) = ρ_4(ρ_3(ρ_2(ρ_1(concat(O', L))))), in which P represents the fourth decoded output data, ψ_decoder4 represents the acquisition process of the fourth decoded output data, and O' represents the fifth upsampled encoded output data;
a third upsampling unit, configured to upsample the fourth decoded output data to obtain fourth upsampled encoded output data;
a third decoding unit, configured to repeat the image feature extraction 5 times to obtain third decoded output data, where Q = ψ_decoder3(P', K) = ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(concat(P', K)))))), in which Q represents the third decoded output data, ψ_decoder3 represents the acquisition process of the third decoded output data, and P' represents the fourth upsampled encoded output data;
a fourth upsampling unit, configured to upsample the third decoded output data to obtain third upsampled encoded output data;
a second decoding unit, configured to repeat the image feature extraction 6 times to obtain second decoded output data, where R = ψ_decoder2(Q', J) = ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(concat(Q', J))))))), in which R represents the second decoded output data, ψ_decoder2 represents the acquisition process of the second decoded output data, and Q' represents the third upsampled encoded output data;
a fifth upsampling unit, configured to upsample the second decoded output data to obtain second upsampled encoded output data;
a first decoding unit, configured to repeat the image feature extraction 7 times to obtain first decoded output data, where S = ψ_decoder1(R', I) = ρ_7(ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(concat(R', I)))))))), in which S represents the first decoded output data, ψ_decoder1 represents the acquisition process of the first decoded output data, and R' represents the second upsampled encoded output data;
a sixth upsampling unit, configured to upsample the sixth encoded output data, fifth decoded output data, fourth decoded output data, third decoded output data, second decoded output data and first decoded output data to the same size as the clothing picture, and to denote the results respectively as sixth encoded output upsampled data, fifth decoded output upsampled data, fourth decoded output upsampled data, third decoded output upsampled data, second decoded output upsampled data and first decoded output upsampled data;
the data fusion unit, configured to fuse the sixth encoded output upsampled data, fifth decoded output upsampled data, fourth decoded output upsampled data, third decoded output upsampled data, second decoded output upsampled data and first decoded output upsampled data by channel fusion to obtain fused data;
and the characteristic diagram output unit is used for carrying out convolution operation on the fusion data by adopting the set convolution to obtain an output characteristic diagram of the clothing image.
Optionally, the neural network training module specifically includes:
a first group of cross entropy loss obtaining units, configured to respectively calculate cross entropy losses of a clothing analysis graph corresponding to the clothing picture and an output feature graph of the clothing picture, the first decoded output upsampled data, the second decoded output upsampled data, the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, so as to obtain a first group of cross entropy losses;
a second group of cross entropy loss obtaining units, configured to calculate cross entropy losses between the output feature map of the clothing picture and the first decoded output upsampled data, the second decoded output upsampled data, the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a second group of cross entropy losses;
a third group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the first decoded output upsampled data and the second decoded output upsampled data, the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a third group of cross entropy losses;
a fourth group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the second decoded output upsampled data and the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a fourth group of cross entropy losses;
a fifth group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the third decoded output upsampled data and the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a fifth group of cross entropy losses;
a sixth group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a sixth group of cross entropy losses;
a seventh group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the fifth decoded output upsampled data and the sixth encoded output upsampled data, respectively, to obtain a seventh group of cross entropy losses;
a loss function obtaining unit, configured to weight and add the first group of cross entropy losses, the second group of cross entropy losses, the third group of cross entropy losses, the fourth group of cross entropy losses, the fifth group of cross entropy losses, the sixth group of cross entropy losses, and the seventh group of cross entropy losses to obtain a loss function;
and the training unit is used for carrying out iterative training on the neural network by adopting a random gradient descent algorithm based on the loss function.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention discloses a garment analysis method and system based on channel attention and self-supervision constraint.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a schematic flow chart of a garment analysis method based on channel attention and self-supervision constraint according to the present invention;
FIG. 2 is a detailed flowchart of a garment parsing method based on channel attention and self-supervision constraint according to the present invention;
FIG. 3 is a schematic diagram of a neural network according to the present invention;
fig. 4 is a schematic structural diagram of a garment analysis system based on channel attention and self-supervision constraint according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a garment analysis method and system based on channel attention and self-supervision constraint, so as to improve garment analysis accuracy.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Fig. 1 is a schematic flow chart of a garment analysis method based on channel attention and self-supervision constraint according to the present invention, and as shown in fig. 1, the garment analysis method based on channel attention and self-supervision constraint includes:
step 101: and acquiring a clothing picture data set, wherein the clothing picture data set comprises clothing analytic graphs corresponding to the clothing pictures.
Step 102: and inputting each clothing picture in the clothing picture data set into a neural network, wherein the neural network comprises a channel attention module.
Before each clothing picture in the clothing picture data set is input into the neural network, the method further comprises the following steps:
and carrying out normalization processing on each clothing picture in the clothing picture data set.
Step 103: and performing multiple times of image feature extraction on the input clothing pictures based on the channel attention module to obtain output feature maps of the clothing pictures.
The image feature extraction in step 103 specifically includes:
and carrying out different convolution operations on the clothes pictures input into the neural network to obtain first data B.
And carrying out average pooling on the first data B to obtain second data C.
And performing feature extraction on the second data C by using set convolution to obtain first feature data D.
And performing maximum pooling on the first data B to obtain third data E.
And performing feature extraction on the third data E by using the set convolution to obtain second feature data F.
And adding the first characteristic data D and the second characteristic data F and inputting the added first characteristic data D and the added second characteristic data F into a Sigmoid function to obtain fourth data H.
Multiplying the fourth data H by the first data B to obtain the output data H_out of the channel attention module.
Step 103, specifically, the method further comprises:
Repeating the image feature extraction 7 times to obtain first encoded output data, where I = ψ_encoder1(A) = ρ_7(ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(A))))))), in which I represents the first encoded output data, ψ_encoder1 represents the acquisition process of the first encoded output data, ρ_i' represents the i'-th repetition, i' ∈ {1, 2, 3, 4, 5, 6, 7}, and A represents a clothing picture in the clothing picture data set.
And performing downsampling on the first coded output data to obtain first downsampled coded output data.
Repeating the image feature extraction 6 times to obtain second encoded output data, where J = ψ_encoder2(I') = ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(I')))))), in which J represents the second encoded output data, ψ_encoder2 represents the acquisition process of the second encoded output data, and I' represents the first downsampled encoded output data.
And performing downsampling on the second coded output data to obtain second downsampled coded output data.
Repeating the image feature extraction 5 times to obtain third encoded output data, where K = ψ_encoder3(J') = ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(J'))))), in which K represents the third encoded output data, ψ_encoder3 represents the acquisition process of the third encoded output data, and J' represents the second downsampled encoded output data.
And performing downsampling on the third coded output data to obtain third downsampled coded output data.
Repeating the image feature extraction 4 times to obtain fourth encoded output data, where L = ψ_encoder4(K') = ρ_4(ρ_3(ρ_2(ρ_1(K')))), in which L represents the fourth encoded output data, ψ_encoder4 represents the acquisition process of the fourth encoded output data, and K' represents the third downsampled encoded output data.
And performing down-sampling on the fourth coded output data to obtain fourth down-sampled coded output data.
Repeating the image feature extraction 4 times to obtain fifth encoded output data, where M = ψ_encoder5(L') = ρ_4(ρ_3(ρ_2(ρ_1(L')))), in which M represents the fifth encoded output data, ψ_encoder5 represents the acquisition process of the fifth encoded output data, and L' represents the fourth downsampled encoded output data.
And performing downsampling on the fifth coded output data to obtain fifth downsampled coded output data.
Repeating the image feature extraction 4 times to obtain sixth encoded output data, where N = ψ_encoder6(M') = ρ_4(ρ_3(ρ_2(ρ_1(M')))), in which N represents the sixth encoded output data, ψ_encoder6 represents the acquisition process of the sixth encoded output data, and M' represents the fifth downsampled encoded output data.
And performing upsampling on the sixth coded output data to obtain sixth upsampled coded output data.
Repeating the image feature extraction 4 times to obtain fifth decoded output data, where O = ψ_decoder5(N', M) = ρ_4(ρ_3(ρ_2(ρ_1(concat(N', M))))), in which O represents the fifth decoded output data, ψ_decoder5 represents the acquisition process of the fifth decoded output data, concat() represents channel fusion, and N' represents the sixth upsampled encoded output data.
And performing upsampling on the fifth decoding output data to obtain fifth upsampling coding output data.
Repeating the image feature extraction 4 times to obtain fourth decoded output data, where P = ψ_decoder4(O', L) = ρ_4(ρ_3(ρ_2(ρ_1(concat(O', L))))), in which P represents the fourth decoded output data, ψ_decoder4 represents the acquisition process of the fourth decoded output data, and O' represents the fifth upsampled encoded output data.
And performing upsampling on the fourth decoding output data to obtain fourth upsampling coding output data.
Repeating the image feature extraction 5 times to obtain third decoded output data, where Q = ψ_decoder3(P', K) = ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(concat(P', K)))))), in which Q represents the third decoded output data, ψ_decoder3 represents the acquisition process of the third decoded output data, and P' represents the fourth upsampled encoded output data.
And performing upsampling on the third decoded output data to obtain third upsampled coded output data.
Repeating the image feature extraction 6 times to obtain second decoded output data, where R = ψ_decoder2(Q', J) = ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(concat(Q', J))))))), in which R represents the second decoded output data, ψ_decoder2 represents the acquisition process of the second decoded output data, and Q' represents the third upsampled encoded output data.
And performing upsampling on the second decoding output data to obtain second upsampling coding output data.
Repeating the image feature extraction 7 times to obtain first decoded output data, where S = ψ_decoder1(R', I) = ρ_7(ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(concat(R', I)))))))), in which S represents the first decoded output data, ψ_decoder1 represents the acquisition process of the first decoded output data, and R' represents the second upsampled encoded output data.
The sixth encoded output data N, the fifth decoded output data O, the fourth decoded output data P, the third decoded output data Q, the second decoded output data R and the first decoded output data S are all upsampled to the same size as the clothing picture A, giving the sixth encoded output upsampled data N″ (O_6 in fig. 3), the fifth decoded output upsampled data O″ (O_5 in fig. 3), the fourth decoded output upsampled data P″ (O_4 in fig. 3), the third decoded output upsampled data Q″ (O_3 in fig. 3), the second decoded output upsampled data R″ (O_2 in fig. 3) and the first decoded output upsampled data S″ (O_1 in fig. 3).
The sixth encoded output upsampled data N″, the fifth decoded output upsampled data O″, the fourth decoded output upsampled data P″, the third decoded output upsampled data Q″, the second decoded output upsampled data R″ and the first decoded output upsampled data S″ are fused by channel fusion to obtain the fused data T (O_0 in fig. 3).
A convolution operation is performed on the fused data T using the set convolution to obtain the output feature map Y of the clothing picture.
Step 104: performing iterative training on the neural network based on strong supervision constraint and self-supervision constraint to obtain a trained neural network model; the strong supervision constraint is the strong supervision constraint of the clothing analysis graph corresponding to each clothing picture on the output characteristic graph, and the self-supervision constraint is the self-supervision constraint of the high-level output to the low-level output in each decoding output data in the neural network.
Step 104, specifically comprising:
Cross entropy losses between the clothing analysis graph A° corresponding to the clothing picture and the output feature map Y of the clothing picture, the first decoded output upsampled data S″, the second decoded output upsampled data R″, the third decoded output upsampled data Q″, the fourth decoded output upsampled data P″, the fifth decoded output upsampled data O″ and the sixth encoded output upsampled data N″ are calculated respectively to obtain a first group of cross entropy losses, which includes loss_0, loss_1, loss_2, loss_3, loss_4, loss_5, loss_6.
Cross entropy losses between the output feature map Y of the clothing picture and the first decoded output upsampled data S″, the second decoded output upsampled data R″, the third decoded output upsampled data Q″, the fourth decoded output upsampled data P″, the fifth decoded output upsampled data O″ and the sixth encoded output upsampled data N″ are calculated respectively to obtain a second group of cross entropy losses, which includes loss_01, loss_02, loss_03, loss_04, loss_05, loss_06.
Cross entropy losses between the first decoded output upsampled data S″ and the second decoded output upsampled data R″, the third decoded output upsampled data Q″, the fourth decoded output upsampled data P″, the fifth decoded output upsampled data O″ and the sixth encoded output upsampled data N″ are calculated respectively to obtain a third group of cross entropy losses, which includes loss_12, loss_13, loss_14, loss_15, loss_16.
Cross entropy losses between the second decoded output upsampled data R″ and the third decoded output upsampled data Q″, the fourth decoded output upsampled data P″, the fifth decoded output upsampled data O″ and the sixth encoded output upsampled data N″ are calculated respectively to obtain a fourth group of cross entropy losses, which includes loss_23, loss_24, loss_25, loss_26.
Cross entropy losses between the third decoded output upsampled data Q″ and the fourth decoded output upsampled data P″, the fifth decoded output upsampled data O″ and the sixth encoded output upsampled data N″ are calculated respectively to obtain a fifth group of cross entropy losses, which includes loss_34, loss_35, loss_36.
Cross entropy losses between the fourth decoded output upsampled data P″ and the fifth decoded output upsampled data O″ and the sixth encoded output upsampled data N″ are calculated respectively to obtain a sixth group of cross entropy losses, which includes loss_45, loss_46.
The cross entropy loss between the fifth decoded output upsampled data O″ and the sixth encoded output upsampled data N″ is calculated to obtain a seventh group of cross entropy losses, which includes loss_56.
The first to seventh groups of cross entropy losses are multiplied by different weights and added to obtain the loss function loss_final:
loss_final = ω_g·[loss_0, loss_1, loss_2, loss_3, loss_4, loss_5, loss_6]^T + ω_0·[loss_01, loss_02, loss_03, loss_04, loss_05, loss_06]^T + ω_1·[loss_12, loss_13, loss_14, loss_15, loss_16]^T + ω_2·[loss_23, loss_24, loss_25, loss_26]^T + ω_3·[loss_34, loss_35, loss_36]^T + ω_4·[loss_45, loss_46]^T + ω_5·loss_56,
where [·]^T denotes the transpose of a matrix; ω_g, ω_0, ω_1, ω_2, ω_3, ω_4 denote the weight coefficient vectors, with ω_g = [ω_g0, ω_g1, ω_g2, ω_g3, ω_g4, ω_g5, ω_g6, ω_g7], ω_0 = [ω_01, ω_02, ω_03, ω_04, ω_05, ω_06], ω_1 = [ω_12, ω_13, ω_14, ω_15, ω_16], ω_2 = [ω_23, ω_24, ω_25, ω_26], ω_3 = [ω_34, ω_35, ω_36], ω_4 = [ω_45, ω_46]; ω_ij denotes an element of the corresponding weight vector, and ω_5 denotes a weight coefficient.
And performing iterative training on the neural network by adopting a random gradient descent algorithm based on the loss function.
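The weighted combination of the seven loss groups above amounts to a set of dot products plus one scalar term; a small illustrative helper (not from the patent) could look like:

    import numpy as np

    def combine_losses(loss_groups, weight_vectors, w5, loss56):
        # loss_groups: the first six groups of losses as lists/arrays;
        # weight_vectors: the matching weight vectors w_g, w_0, ..., w_4;
        # w5 and loss56: the scalar weight and the single seventh-group loss.
        return sum(float(np.dot(w, g)) for w, g in zip(weight_vectors, loss_groups)) + w5 * loss56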
Step 105: and inputting the clothing picture to be analyzed into the trained neural network model, and outputting a clothing analysis picture.
The garment analysis method based on channel attention and self-supervision constraint is refined into the following 42 steps, as shown in fig. 2:
step 1: a garment data set (garment picture data set) with labels is collected, wherein the labels refer to garment analytic graphs corresponding to all garment pictures in the data set, 70% of the pictures are randomly selected to serve as a training set, and the rest pictures serve as a test set.
Step 2: normalization processes are to normalize an image pixel value to [0, 1 ] for an image that is to be input into the network]Get training data A (without label), test data Atest(without notation). The step is not carried out on the labeled data in the data set, and the training labeled data A and the testing labeled data A obtained by Stepl are directly used in the subsequent stepstest
Step 3: conv, two different convolution operations were performed on the training data A without labels preprocessed in Step21(Conv2(A) Get data B, Conv)1() And Conv2() Representing two different convolution operations, respectively.
Step 4: the data B obtained in Step3 is transmitted to the channel attention module, and the data C is obtained by performing average pooling.
Step 5: by convolution
Figure GDA0003224781920000161
And performing feature extraction on the data C obtained in Step4 to obtain data D.
Step 6: data E was obtained by maximizing pooling of data B obtained in Step 3.
Step 7: by convolution
Figure GDA0003224781920000162
And (4) performing feature extraction operation on the data E obtained in Step6 to obtain data F.
Step 8: data G is obtained by adding data D obtained at Step5 and data F obtained at Step 7.
Step 9: sending the data G obtained in Step8 into a Sigmoid function to obtain data H, and multiplying the data H by the output data B of Step3 to obtain the output data H of the channel attention moduleout
Step 10: steps 3 to 9 are repeated 7 times to obtain the output data of encoding block 1, I = ψ_encoder1(A) = ρ_7(ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(A))))))), where ψ_encoder1 denotes the entire process of encoding block 1 and ρ_i' denotes the i'-th iteration, i' ∈ {1, 2, 3, 4, 5, 6, 7}.
Step 11: the data I obtained at Step 10 is downsampled to obtain data I'.
Step 12: steps 3 to 9 are repeated 6 times to obtain the output data of encoding block 2, J = ψ_encoder2(I') = ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(I')))))), where ψ_encoder2 denotes the entire process of encoding block 2.
Step 13: the data J obtained at Step 12 is downsampled to obtain data J'.
Step 14: steps 3 to 9 are repeated 5 times to obtain the output data of encoding block 3, K = ψ_encoder3(J') = ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(J'))))), where ψ_encoder3 denotes the entire process of encoding block 3.
Step 15: the data K obtained at Step 14 is downsampled to obtain data K'.
Step 16: steps 3 to 9 are repeated 4 times to obtain the output data of encoding block 4, L = ψ_encoder4(K') = ρ_4(ρ_3(ρ_2(ρ_1(K')))), where ψ_encoder4 denotes the entire process of encoding block 4.
Step 17: the data L obtained at Step 16 is downsampled to obtain data L'.
Step 18: steps 3 to 9 are repeated 4 times to obtain the output data of encoding block 5, M = ψ_encoder5(L') = ρ_4(ρ_3(ρ_2(ρ_1(L')))), where ψ_encoder5 denotes the entire process of encoding block 5.
Step 19: the data M obtained at Step 18 is downsampled to obtain data M'.
Step 20: steps 3 to 9 are repeated 4 times to obtain the output data of encoding block 6, N = ψ_encoder6(M') = ρ_4(ρ_3(ρ_2(ρ_1(M')))), where ψ_encoder6 denotes the entire process of encoding block 6.
Step 21: and upsampling the data N obtained at Step20 to obtain data N'.
Step 22: steps 3 to 9 are repeated 4 times to obtain the output data of decoding block 5, O = ψ_decoder5(N', M) = ρ_4(ρ_3(ρ_2(ρ_1(concat(N', M))))), where ψ_decoder5 denotes the entire process of decoding block 5, concat() denotes channel fusion, and M is the data obtained at Step 18.
Step 23: data O obtained at Step22 is up-sampled to obtain data O'.
Step 24: steps 3 to 9 are repeated 4 times to obtain the output data of decoding block 4, P = ψ_decoder4(O', L) = ρ_4(ρ_3(ρ_2(ρ_1(concat(O', L))))), where ψ_decoder4 denotes the entire process of decoding block 4, concat() denotes channel fusion, and L is the data obtained at Step 16.
Step 25: the data P obtained at Step24 is up-sampled to data P'.
Step 26: steps 3 to 9 are repeated 5 times to obtain the output data of decoding block 3, Q = ψ_decoder3(P', K) = ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(concat(P', K)))))), where ψ_decoder3 denotes the entire process of decoding block 3, concat() denotes channel fusion, and K is the data obtained at Step 14.
Step 27: data Q from Step26 is up-sampled to data Q'.
Step 28: steps 3 to 9 are repeated 6 times to obtain the output data of decoding block 2, R = ψ_decoder2(Q', J) = ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(concat(Q', J))))))), where ψ_decoder2 denotes the entire process of decoding block 2, concat() denotes channel fusion, and J is the data obtained at Step 12.
Step 29: and upsampling the data R obtained at Step27 to obtain data R'.
Step 30: steps 3 to 9 are repeated 7 times to obtain the output data of decoding block 1, S = ψ_decoder1(R', I) = ρ_7(ρ_6(ρ_5(ρ_4(ρ_3(ρ_2(ρ_1(concat(R', I)))))))), where ψ_decoder1 denotes the entire process of decoding block 1, concat() denotes channel fusion, and I is the data obtained at Step 10.
Step 31: the data N, O, P, Q, R, S obtained at Steps 20, 22, 24, 26, 28 and 30 are upsampled to the same size as the data A obtained at Step 2 to obtain data N″, O″, P″, Q″, R″, S″, and the data T = concat(N″, O″, P″, Q″, R″, S″) is obtained by channel fusion.
Step 32: and (4) convolving the data T obtained at Step31 to obtain output data Y of the whole network.
Step 33: cross entropy losses between the label data A° corresponding to the data A acquired at Step 2 and the data Y obtained at Step 32 as well as the data S″, R″, Q″, P″, O″, N″ obtained at Step 31 are calculated respectively with
f_loss(A°, X) = -(1/(A_h × A_w)) Σ_{i=1..A_h×A_w} Σ_{j=1..label} y_ij · log(x_ij),
giving loss_0, loss_1, loss_2, loss_3, loss_4, loss_5, loss_6. Here A_h × A_w denotes the number of pixels of the data A of height h and width w collected at Step 2; label denotes the total number of categories in the data set; y_ij ∈ A° indicates whether the i-th pixel of the data A belongs to the j-th class; and x_ij denotes the predicted probability that the i-th pixel of the data A belongs to the j-th class.
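For concreteness, f_loss can be rendered as follows, assuming a one-hot label map and a predicted probability map both flattened over the A_h x A_w pixels (the helper name and the epsilon are illustrative):

    import numpy as np

    def f_loss(y_onehot, x_prob, eps=1e-12):
        # y_onehot: (A_h*A_w, label) one-hot ground truth; x_prob: predicted class
        # probabilities of the same shape; returns the mean per-pixel cross entropy
        num_pixels = y_onehot.shape[0]
        return -np.sum(y_onehot * np.log(x_prob + eps)) / num_pixels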
Step 34: cross entropy losses f_loss between the data Y obtained at Step 32 and the data S″, R″, Q″, P″, O″, N″ obtained at Step 31 are calculated respectively, giving loss_01, loss_02, loss_03, loss_04, loss_05, loss_06.
Step 35: cross entropy losses f_loss between the data S″ obtained at Step 31 and the data R″, Q″, P″, O″, N″ obtained at Step 31 are calculated respectively, giving loss_12, loss_13, loss_14, loss_15, loss_16.
Step 36: cross entropy losses f_loss between the data R″ obtained at Step 31 and the data Q″, P″, O″, N″ obtained at Step 31 are calculated respectively, giving loss_23, loss_24, loss_25, loss_26.
Step 37: cross entropy losses f_loss between the data Q″ obtained at Step 31 and the data P″, O″, N″ obtained at Step 31 are calculated respectively, giving loss_34, loss_35, loss_36.
Step 38: cross entropy losses f_loss between the data P″ obtained at Step 31 and the data O″, N″ obtained at Step 31 are calculated respectively, giving loss_45, loss_46.
Step 39: the cross entropy loss f_loss between the data O″ obtained at Step 31 and the data N″ obtained at Step 31 is calculated, giving loss_56.
Step 40: all the loss functions obtained at Steps 33 to 39 are combined with different weights to obtain the final loss:
loss_final = ω_g·[loss_0, loss_1, loss_2, loss_3, loss_4, loss_5, loss_6]^T + ω_0·[loss_01, loss_02, loss_03, loss_04, loss_05, loss_06]^T + ω_1·[loss_12, loss_13, loss_14, loss_15, loss_16]^T + ω_2·[loss_23, loss_24, loss_25, loss_26]^T + ω_3·[loss_34, loss_35, loss_36]^T + ω_4·[loss_45, loss_46]^T + ω_5·loss_56,
where [·]^T denotes the transpose of a matrix; ω_g, ω_0, ω_1, ω_2, ω_3, ω_4 denote the weight coefficient vectors, with ω_g = [ω_g0, ω_g1, ω_g2, ω_g3, ω_g4, ω_g5, ω_g6, ω_g7], ω_0 = [ω_01, ω_02, ω_03, ω_04, ω_05, ω_06], ω_1 = [ω_12, ω_13, ω_14, ω_15, ω_16], ω_2 = [ω_23, ω_24, ω_25, ω_26], ω_3 = [ω_34, ω_35, ω_36], ω_4 = [ω_45, ω_46]; ω_ij denotes an element of the corresponding weight vector, and ω_5 denotes a weight coefficient.
Step 41: based on the loss function loss_final obtained at Step 40, the whole network is iteratively trained z times with a stochastic gradient descent algorithm. The model best.pth corresponding to the highest value of the evaluation index (mean intersection over union, mIoU) is selected and saved.
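A hedged sketch of Step 41, training with stochastic gradient descent and keeping the checkpoint with the best mIoU; the optimizer settings, loss_final_fn and evaluate_miou are placeholders, not details from the patent:

    import torch

    def train_and_select(model, train_loader, val_loader, loss_final_fn, z, lr=0.01):
        optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
        best_miou = 0.0
        for _ in range(z):
            model.train()
            for image, label in train_loader:
                outputs = model(image)                      # fused output and side outputs
                loss = loss_final_fn(label, outputs)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            miou = evaluate_miou(model, val_loader)         # hypothetical evaluation helper
            if miou > best_miou:
                best_miou = miou
                torch.save(model.state_dict(), "best.pth")  # keep the best model as best.pth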
Step 42: the test data A_test obtained at Step 2 is input into the trained neural network model best.pth obtained at Step 41 for parsing, yielding the final prediction map (clothing analysis graph).
The invention mainly comprises 3 processes. (1) Data acquisition and preprocessing. (2) The network structure of the invention is an encoding-decoding structure, as shown in fig. 3: the encoding part consists of 6 encoding blocks (Block-1 to Block-6 from left to right in fig. 3), the decoding part consists of 5 decoding blocks (Block-5 to Block-1 from left to right in fig. 3), and every encoding block and decoding block is formed by iterating the same structure. This structure is: two different convolutional layers (Step 3) followed by the channel attention module (average pooling layer, convolutional layer, max pooling layer, convolutional layer, Sigmoid layer) (Steps 4 to 9); for example, encoding block 1 is formed by iterating this structure 7 times. The encoding blocks are connected by downsampling operations (Steps 11, 13, 15, 17 and 19). The outputs of all decoding blocks and the output of encoding block 6 are upsampled to the size of the input picture (thereby obtaining their respective prediction results; for example, the prediction result of decoding block 1 is obtained by upsampling the output of decoding block 1), channel fusion is performed (Step 31), and the fused result is then processed by a convolution operation to obtain the final output of the whole neural network (Step 32). (3) Training the neural network. For the self-supervision part, cross entropy losses are first calculated between the final output of the whole network and the prediction results of the 5 decoding blocks and of encoding block 6 (Step 34); then decoding block 1 calculates cross entropy losses against the prediction results of decoding blocks 2 to 5 and encoding block 6 (Step 35); decoding block 2 against the prediction results of decoding blocks 3 to 5 and encoding block 6 (Step 36); decoding block 3 against the prediction results of decoding blocks 4 to 5 and encoding block 6 (Step 37); decoding block 4 against the prediction results of decoding block 5 and encoding block 6 (Step 38); and finally decoding block 5 against the prediction result of encoding block 6 (Step 39). These cross entropy losses are added with different weights to obtain the loss function of the self-supervision part (Step 40). Besides the self-supervision part, the whole network also contains strong supervision: the strong supervision part uses the label data to calculate cross entropy losses against the final output of the network, the prediction results of the 5 decoding blocks and the prediction result of encoding block 6 (Step 33), and the loss function of the strongly supervised part is likewise obtained by adding these losses with different weights (Step 40). The self-supervised loss function and the strongly supervised loss function are added to obtain the final loss function (Step 40). The whole network is trained for a fixed number of iterations with the designed loss function, the model with the best result on the evaluation index (mean intersection over union) is selected, and this model is used to parse the clothing images of the test set.
The invention discloses a garment analysis method based on channel attention and self-supervision constraint. For fashion clothing images, the method improves performance from the channel perspective in two ways: a channel attention module and a cross-layer channel self-supervision constraint network. Firstly, the fashion clothing image data containing persons is normalized and fed into the network; after a specific convolution layer, the resulting feature map is input into the channel attention module to extract channel attention weights; these weights are then combined with the original feature map to obtain a new feature map; the new feature map is then sent to the subsequent processing module for further feature extraction. Finally, in the cross-layer channel self-supervision constraint network, based on the six side-branch prediction graphs of the backbone network and the fused prediction graph, each higher layer applies a self-supervision constraint on the lower layers, while the labeled image applies a strong supervision constraint on the fusion result and the six side-branch prediction graphs. In particular, the self-supervision constraint can take many forms, one of which may be selected and combined with the strong supervision constraint to supervise a feature. In addition, on the current fashion garment data set, the method can effectively improve clothing analysis precision while requiring only a small number of model parameters.
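As noted above, the self-supervision constraint can take several forms. One possible form, sketched below only as an assumed example (the description does not fix this particular choice), lets a higher-level prediction supervise a lower-level prediction through a temperature-softened KL divergence instead of a hard cross entropy:

```python
import torch
import torch.nn.functional as F

def soft_self_supervision(teacher_logits: torch.Tensor,
                          student_logits: torch.Tensor,
                          temperature: float = 2.0) -> torch.Tensor:
    """One possible self-supervision form: the higher-level (teacher) prediction
    constrains the lower-level (student) prediction via KL divergence.
    The temperature value is an assumption."""
    t = F.softmax(teacher_logits.detach() / temperature, dim=1)
    log_s = F.log_softmax(student_logits / temperature, dim=1)
    return F.kl_div(log_s, t, reduction="batchmean") * temperature ** 2
```

Such a soft constraint could replace or complement the cross entropy form used in the loss groups described later in this section.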
Fig. 4 is a schematic structural diagram of a garment analysis system based on channel attention and self-supervision constraint according to the present invention, the system includes:
a data set obtaining module 201, configured to obtain a clothing picture data set, where the clothing picture data set includes clothing analysis diagrams corresponding to clothing pictures;
a data input module 202, configured to input each clothing picture in the clothing picture data set into a neural network, where the neural network includes a channel attention module;
the feature extraction module 203 is configured to perform multiple image feature extractions on the input clothing pictures based on the channel attention module to obtain output feature maps of the clothing pictures;
the neural network training module 204 is used for performing iterative training on the neural network based on strong supervision constraint and self-supervision constraint to obtain a trained neural network model; the strong supervision constraint is the strong supervision constraint of the clothing analysis graph corresponding to each clothing picture on the output characteristic graph, and the self-supervision constraint is the self-supervision constraint of high-level output on low-level output in each decoding output data in the neural network;
and the neural network model application module 205 is configured to input the clothing image to be analyzed into the trained neural network model, and output a clothing analysis graph.
The system further comprises:
and the preprocessing module is used for carrying out normalization processing on each clothing picture in the clothing picture data set.
The image feature extraction in the feature extraction module 203 specifically includes the following steps (an illustrative sketch is given after them):
different convolution operations are carried out on the clothes pictures input into the neural network to obtain first data;
carrying out average pooling on the first data to obtain second data;
performing feature extraction on the second data by using set convolution to obtain first feature data;
performing maximum pooling on the first data to obtain third data;
performing feature extraction on the third data by using the set convolution to obtain second feature data;
adding the first characteristic data and the second characteristic data and inputting the added first characteristic data and the added second characteristic data into a Sigmoid function to obtain fourth data;
and multiplying the fourth data by the first data to obtain output data of the channel attention module.
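The data flow just described can be summarized in a small PyTorch sketch; the reduction ratio, the ReLU between the two shared 1×1 convolutions and the layer names are assumptions added for illustration and are not specified by the invention:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: global average / max pooling, shared 1x1 convolutions,
    Sigmoid gate, then channel-wise re-weighting of the input feature map."""
    def __init__(self, channels: int, reduction: int = 16):  # reduction ratio is assumed
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)   # produces the "second data"
        self.max_pool = nn.AdaptiveMaxPool2d(1)   # produces the "third data"
        # the "set convolution" applied to both pooled branches (shared here by assumption)
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x is the "first data" produced by the two preceding convolution layers
        avg_att = self.conv(self.avg_pool(x))      # first feature data
        max_att = self.conv(self.max_pool(x))      # second feature data
        weights = self.sigmoid(avg_att + max_att)  # fourth data
        return x * weights                         # output data of the channel attention module
```

An input feature map of shape (B, C, H, W) keeps its shape; each channel is simply re-weighted by its learned attention value.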
The feature extraction module 203 specifically further includes the following units (a condensed code sketch is given after the list):
a first encoding unit for repeating the image feature extraction 7 times to obtain first encoded output data, I = ψ_encoder1(A) = ρ7(ρ6(ρ5(ρ4(ρ3(ρ2(ρ1(A))))))), where I represents the first encoded output data, ψ_encoder1 represents the acquisition process of the first encoded output data, ρi represents the i-th repetition, i ∈ {1, 2, 3, 4, 5, 6, 7}, and A represents the clothing picture in the clothing picture data set;
the first down-sampling unit is used for down-sampling the first coding output data to obtain first down-sampling coding output data;
a second encoding unit for repeating the image feature extraction 6 times to obtain second encoded output data, J = ψ_encoder2(I′) = ρ6(ρ5(ρ4(ρ3(ρ2(ρ1(I′)))))), where J represents the second encoded output data, ψ_encoder2 represents the acquisition process of the second encoded output data, and I′ represents the first downsampled encoded output data;
the second downsampling unit is used for downsampling the second coding output data to obtain second downsampling coding output data;
a third encoding unit for repeating the image feature extraction 5 times to obtain third encoded output data, K = ψ_encoder3(J′) = ρ5(ρ4(ρ3(ρ2(ρ1(J′))))), where K represents the third encoded output data, ψ_encoder3 represents the acquisition process of the third encoded output data, and J′ represents the second downsampled encoded output data;
a third downsampling unit, configured to downsample the third encoded output data to obtain third downsampled encoded output data;
a fourth encoding unit for repeating the image feature extraction 4 times to obtain fourth encoded output data, L = ψ_encoder4(K′) = ρ4(ρ3(ρ2(ρ1(K′)))), where L represents the fourth encoded output data, ψ_encoder4 represents the acquisition process of the fourth encoded output data, and K′ represents the third downsampled encoded output data;
a fourth downsampling unit, configured to downsample the fourth encoded output data to obtain fourth downsampled encoded output data;
a fifth encoding unit for repeating the image feature extraction 4 times to obtain fifth encoded output data, M = ψ_encoder5(L′) = ρ4(ρ3(ρ2(ρ1(L′)))), where M represents the fifth encoded output data, ψ_encoder5 represents the acquisition process of the fifth encoded output data, and L′ represents the fourth downsampled encoded output data;
a fifth downsampling unit, configured to downsample the fifth encoded output data to obtain fifth downsampled encoded output data;
a sixth encoding unit for repeating the image feature extraction 4 times to obtain sixth encoded output data, N = ψ_encoder6(M′) = ρ4(ρ3(ρ2(ρ1(M′)))), where N represents the sixth encoded output data, ψ_encoder6 represents the acquisition process of the sixth encoded output data, and M′ represents the fifth downsampled encoded output data;
the first up-sampling unit is used for up-sampling the sixth coded output data to obtain sixth up-sampled coded output data;
a fifth decoding unit for repeating the image feature extraction 4 times to obtain fifth decoded output data, O = ψ_decoder5(N′, M) = ρ4(ρ3(ρ2(ρ1(concat(N′, M))))), where O represents the fifth decoded output data, ψ_decoder5 represents the acquisition process of the fifth decoded output data, concat() represents channel fusion, and N′ represents the sixth upsampled encoded output data;
a second upsampling unit, configured to upsample the fifth decoded output data to obtain fifth upsampled encoded output data;
a fourth decoding unit for repeating the image feature extraction 4 times to obtain fourth decoded output data, P = ψ_decoder4(O′, L) = ρ4(ρ3(ρ2(ρ1(concat(O′, L))))), where P represents the fourth decoded output data, ψ_decoder4 represents the acquisition process of the fourth decoded output data, and O′ represents the fifth up-sampled encoded output data;
a third upsampling unit, configured to upsample the fourth decoded output data to obtain fourth upsampled encoded output data;
a third decoding unit for repeating the image feature extraction 5 times to obtain third decoded output data, Q = ψ_decoder3(P′, K) = ρ5(ρ4(ρ3(ρ2(ρ1(concat(P′, K)))))), where Q represents the third decoded output data, ψ_decoder3 represents the acquisition process of the third decoded output data, and P′ represents the fourth up-sampled encoded output data;
a fourth upsampling unit, configured to upsample the third decoded output data to obtain third upsampled encoded output data;
a second decoding unit for repeating the image feature extraction 6 times to obtain second decoded output data, R = ψ_decoder2(Q′, J) = ρ6(ρ5(ρ4(ρ3(ρ2(ρ1(concat(Q′, J))))))), where R represents the second decoded output data, ψ_decoder2 represents the acquisition process of the second decoded output data, and Q′ represents the third up-sampled encoded output data;
a fifth upsampling unit, configured to upsample the second decoded output data to obtain second upsampled encoded output data;
a first decoding unit for repeating the image feature extraction 7 times to obtain first decoded output data, S = ψ_decoder1(R′, I) = ρ7(ρ6(ρ5(ρ4(ρ3(ρ2(ρ1(concat(R′, I)))))))), where S represents the first decoded output data, ψ_decoder1 represents the acquisition process of the first decoded output data, and R′ represents the second up-sampled encoded output data;
a sixth upsampling unit, configured to upsample the sixth encoded output data, the fifth decoded output data, the fourth decoded output data, the third decoded output data, the second decoded output data and the first decoded output data to the same size as the clothing picture, and to denote the results as sixth encoded output upsampled data, fifth decoded output upsampled data, fourth decoded output upsampled data, third decoded output upsampled data, second decoded output upsampled data and first decoded output upsampled data, respectively;
the data fusion unit is used for fusing sixth encoding output up-sampling data, fifth decoding output up-sampling data, fourth decoding output up-sampling data, third decoding output up-sampling data, second decoding output up-sampling data and first decoding output up-sampling data by adopting a channel to obtain fused data;
and the characteristic diagram output unit is used for carrying out convolution operation on the fusion data by adopting the set convolution to obtain an output characteristic diagram of the clothing image.
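As mentioned above, the units of the feature extraction module can be condensed into the following PyTorch-style skeleton. It is only a sketch under assumptions: each ρ block is reduced to convolution + ReLU (the channel attention of the earlier sketch is omitted for brevity), max pooling stands in for downsampling, bilinear interpolation for upsampling, and the channel width, class count and 1×1 side convolutions are illustrative choices; only the repetition counts, the skip connections, the six side predictions and the channel fusion follow the description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def rho_block(in_ch: int, out_ch: int, repeats: int) -> nn.Sequential:
    """The rho unit (two convolutions plus channel attention in the description),
    iterated `repeats` times; reduced here to conv + ReLU for brevity."""
    layers = []
    for i in range(repeats):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class EncoderDecoderSketch(nn.Module):
    """6 encoding blocks and 5 decoding blocks with skip connections; the six side
    outputs are upsampled to the input size and fused along the channel dimension."""
    def __init__(self, num_classes: int = 20, ch: int = 64):  # both values are assumptions
        super().__init__()
        reps_enc = [7, 6, 5, 4, 4, 4]      # encoding blocks 1..6
        reps_dec = [4, 4, 5, 6, 7]         # decoding blocks 5..1
        self.encoders = nn.ModuleList(
            [rho_block(3, ch, reps_enc[0])] +
            [rho_block(ch, ch, r) for r in reps_enc[1:]])
        self.decoders = nn.ModuleList([rho_block(2 * ch, ch, r) for r in reps_dec])
        self.side = nn.ModuleList([nn.Conv2d(ch, num_classes, 1) for _ in range(6)])
        self.fuse = nn.Conv2d(6 * num_classes, num_classes, 1)  # convolution on the fused data

    def forward(self, x: torch.Tensor):
        h, w = x.shape[-2:]
        skips, feats = [], x
        for i, enc in enumerate(self.encoders):
            feats = enc(feats)
            skips.append(feats)
            if i < 5:
                feats = F.max_pool2d(feats, 2)        # downsampling between encoding blocks
        outs = [feats]                                # output of encoding block 6
        for i, dec in enumerate(self.decoders):
            skip = skips[4 - i]
            up = F.interpolate(outs[-1], size=skip.shape[-2:],
                               mode="bilinear", align_corners=False)
            outs.append(dec(torch.cat([up, skip], dim=1)))   # channel fusion with the skip
        preds = [F.interpolate(side(o), size=(h, w), mode="bilinear", align_corners=False)
                 for side, o in zip(self.side, outs)]         # six side predictions
        fused = self.fuse(torch.cat(preds, dim=1))            # final output of the network
        return fused, preds   # preds[0]: encoding block 6, preds[5]: decoding block 1
```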
The neural network training module 204 specifically includes the following units (an illustrative sketch of the resulting loss function follows them):
a first group of cross entropy loss obtaining units, configured to respectively calculate cross entropy losses of a clothing analysis graph corresponding to the clothing picture and an output feature graph of the clothing picture, the first decoded output upsampled data, the second decoded output upsampled data, the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, so as to obtain a first group of cross entropy losses;
a second group of cross entropy loss obtaining units, configured to calculate cross entropy losses between the output feature map of the clothing picture and the first decoded output upsampled data, the second decoded output upsampled data, the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a second group of cross entropy losses;
a third group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the first decoded output upsampled data and the second decoded output upsampled data, the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a third group of cross entropy losses;
a fourth group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the second decoded output upsampled data and the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a fourth group of cross entropy losses;
a fifth group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the third decoded output upsampled data and the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a fifth group of cross entropy losses;
a sixth group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a sixth group of cross entropy losses;
a seventh group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the fifth decoded output upsampled data and the sixth encoded output upsampled data, respectively, to obtain a seventh group of cross entropy losses;
a loss function obtaining unit, configured to perform different weighting on the first group of cross entropy losses, the second group of cross entropy losses, the third group of cross entropy losses, the fourth group of cross entropy losses, the fifth group of cross entropy losses, the sixth group of cross entropy losses, and the seventh group of cross entropy losses, and then add the weighted values to obtain a loss function;
and the training unit is used for carrying out iterative training on the neural network by adopting a random gradient descent algorithm based on the loss function.
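To illustrate how the seven groups of cross entropy losses and their weighted sum might be assembled, a hedged sketch follows; the group weights, the use of detached argmax pseudo-labels for the self-supervision terms, the side-output ordering and the SGD hyper-parameters are all assumptions, not values fixed by the invention:

```python
import torch
import torch.nn.functional as F

def total_loss(fused, preds, label, w=(1.0, 0.4, 0.4, 0.3, 0.3, 0.2, 0.1)):
    """fused: (B, C, H, W) final network output
       preds: [enc6, dec5, dec4, dec3, dec2, dec1] upsampled side outputs (assumed order)
       label: (B, H, W) ground-truth clothing analysis map
       w:     per-group weights (assumed values)."""
    enc6, dec5, dec4, dec3, dec2, dec1 = preds
    ce = F.cross_entropy

    def self_sup(teacher, students):
        # higher-level output supervises lower-level outputs via its argmax pseudo-label
        pseudo = teacher.detach().argmax(dim=1)
        return sum(ce(s, pseudo) for s in students)

    groups = [
        sum(ce(p, label) for p in [fused, dec1, dec2, dec3, dec4, dec5, enc6]),  # strong supervision
        self_sup(fused, [dec1, dec2, dec3, dec4, dec5, enc6]),
        self_sup(dec1, [dec2, dec3, dec4, dec5, enc6]),
        self_sup(dec2, [dec3, dec4, dec5, enc6]),
        self_sup(dec3, [dec4, dec5, enc6]),
        self_sup(dec4, [dec5, enc6]),
        self_sup(dec5, [enc6]),
    ]
    return sum(wi * gi for wi, gi in zip(w, groups))

# training step with stochastic gradient descent (hyper-parameters assumed):
# optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
# fused, preds = net(images); loss = total_loss(fused, preds, labels)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```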
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the scope of application according to the idea of the present invention. In view of the above, the content of this specification should not be construed as limiting the invention.

Claims (8)

1. A method for analyzing a garment based on channel attention and self-supervision constraint, the method comprising:
acquiring a clothing picture data set, wherein the clothing picture data set comprises clothing analytic graphs corresponding to all clothing pictures;
inputting each clothing picture in the clothing picture data set into a neural network, wherein the neural network comprises a channel attention module;
based on the channel attention module, performing multiple times of image feature extraction on the input clothing pictures to obtain output feature maps of the clothing pictures;
performing iterative training on the neural network based on strong supervision constraint and self-supervision constraint to obtain a trained neural network model; the strong supervision constraint is the strong supervision constraint of the clothing analysis graph corresponding to each clothing picture on the output characteristic graph, and the self-supervision constraint is the self-supervision constraint of high-level output on low-level output in each decoding output data in the neural network;
inputting the clothing picture to be analyzed into the trained neural network model, and outputting a clothing analysis picture;
the method for extracting the image features of the input clothing pictures for multiple times based on the channel attention module to obtain the output feature map of each clothing picture specifically comprises the following steps:
repeating the image feature extraction for 7 times to obtain first coded output data, wherein I = ψ_encoder1(A) = ρ7(ρ6(ρ5(ρ4(ρ3(ρ2(ρ1(A))))))), where I represents the first encoded output data, ψ_encoder1 represents the acquisition process of the first encoded output data, ρi′ represents the i′-th repetition, i′ ∈ {1, 2, 3, 4, 5, 6, 7}, and A represents the clothing picture in the clothing picture data set;
down-sampling the first encoded output data to obtain first down-sampled encoded output data;
repeating the image feature extraction for 6 times to obtain second coded output data, J = ψ_encoder2(I′) = ρ6(ρ5(ρ4(ρ3(ρ2(ρ1(I′)))))), where J represents the second encoded output data, ψ_encoder2 represents the acquisition process of the second encoded output data, and I′ represents the first downsampled encoded output data;
down-sampling the second coded output data to obtain second down-sampled coded output data;
repeating the image feature extraction for 5 times to obtain third coded output data, wherein K = ψ_encoder3(J′) = ρ5(ρ4(ρ3(ρ2(ρ1(J′))))), where K represents the third encoded output data, ψ_encoder3 represents the acquisition process of the third encoded output data, and J′ represents the second downsampled encoded output data;
down-sampling the third encoded output data to obtain third down-sampled encoded output data;
repeating the image feature extraction 4 times to obtain fourth coded output data, wherein L = ψ_encoder4(K′) = ρ4(ρ3(ρ2(ρ1(K′)))), where L represents the fourth encoded output data, ψ_encoder4 represents the acquisition process of the fourth encoded output data, and K′ represents the third downsampled encoded output data;
down-sampling the fourth encoded output data to obtain fourth down-sampled encoded output data;
repeating the image feature extraction for 4 times to obtain fifth coded output data, wherein M = ψ_encoder5(L′) = ρ4(ρ3(ρ2(ρ1(L′)))), where M represents the fifth encoded output data, ψ_encoder5 represents the acquisition process of the fifth encoded output data, and L′ represents the fourth downsampled encoded output data;
performing downsampling on the fifth coded output data to obtain fifth downsampled coded output data;
repeating the image feature extraction for 4 times to obtain sixth coded output data, N = ψ_encoder6(M′) = ρ4(ρ3(ρ2(ρ1(M′)))), where N represents the sixth encoded output data, ψ_encoder6 represents the acquisition process of the sixth encoded output data, and M′ represents the fifth downsampled encoded output data;
performing up-sampling on the sixth encoded output data to obtain sixth up-sampled encoded output data;
repeating the image feature extraction 4 times to obtain fifth decoded output data, O = ψ_decoder5(N′, M) = ρ4(ρ3(ρ2(ρ1(concat(N′, M))))), where O represents the fifth decoded output data, ψ_decoder5 represents the acquisition process of the fifth decoded output data, concat() represents channel fusion, and N′ represents the sixth upsampled encoded output data;
performing upsampling on the fifth decoded output data to obtain fifth upsampled coded output data;
repeating the image feature extraction 4 times to obtain fourth decoded output data, P = ψ_decoder4(O′, L) = ρ4(ρ3(ρ2(ρ1(concat(O′, L))))), where P represents the fourth decoded output data, ψ_decoder4 represents the acquisition process of the fourth decoded output data, and O′ represents the fifth up-sampled encoded output data;
up-sampling the fourth decoded output data to obtain fourth up-sampled encoded output data;
repeating the image feature extraction 5 times to obtain third decoded output data, Q = ψ_decoder3(P′, K) = ρ5(ρ4(ρ3(ρ2(ρ1(concat(P′, K)))))), where Q represents the third decoded output data, ψ_decoder3 represents the acquisition process of the third decoded output data, and P′ represents the fourth up-sampled encoded output data;
up-sampling the third decoded output data to obtain third up-sampled encoded output data;
repeating the image feature extraction 6 times to obtain second decoded output data, wherein R = ψ_decoder2(Q′, J) = ρ6(ρ5(ρ4(ρ3(ρ2(ρ1(concat(Q′, J))))))), where R represents the second decoded output data, ψ_decoder2 represents the acquisition process of the second decoded output data, and Q′ represents the third up-sampled encoded output data;
upsampling the second decoded output data to obtain second upsampled coded output data;
repeating the image feature extraction for 7 times to obtain first decoded output data, S = ψ_decoder1(R′, I) = ρ7(ρ6(ρ5(ρ4(ρ3(ρ2(ρ1(concat(R′, I)))))))), where S represents the first decoded output data, ψ_decoder1 represents the acquisition process of the first decoded output data, and R′ represents the second up-sampled encoded output data;
upsampling the sixth coded output data, the fifth decoded output data, the fourth decoded output data, the third decoded output data, the second decoded output data and the first decoded output data to the same size as the clothing picture, and respectively denoting the results as sixth encoded output upsampled data, fifth decoded output upsampled data, fourth decoded output upsampled data, third decoded output upsampled data, second decoded output upsampled data and first decoded output upsampled data;
fusing sixth encoding output up-sampled data, fifth decoding output up-sampled data, fourth decoding output up-sampled data, third decoding output up-sampled data, second decoding output up-sampled data and first decoding output up-sampled data by adopting a channel to obtain fused data;
and performing convolution operation on the fusion data by adopting set convolution to obtain an output characteristic diagram of the clothing picture.
2. The method of claim 1, wherein before inputting each of the garment pictures in the garment picture dataset into a neural network, the method further comprises:
and carrying out normalization processing on each clothing picture in the clothing picture data set.
3. The garment parsing method based on channel attention and self-supervision constraint according to claim 1, wherein the image feature extraction specifically comprises:
different convolution operations are carried out on the clothes pictures input into the neural network to obtain first data;
carrying out average pooling on the first data to obtain second data;
performing feature extraction on the second data by using set convolution to obtain first feature data;
performing maximum pooling on the first data to obtain third data;
performing feature extraction on the third data by using the set convolution to obtain second feature data;
adding the first characteristic data and the second characteristic data and inputting the added first characteristic data and the added second characteristic data into a Sigmoid function to obtain fourth data;
and multiplying the fourth data by the first data to obtain output data of the channel attention module.
4. The garment parsing method based on channel attention and self-supervision constraint according to claim 1, wherein the iteratively training the neural network based on strong supervision constraint and self-supervision constraint to obtain a trained neural network model specifically comprises:
respectively calculating cross entropy losses between the clothing analysis graph corresponding to the clothing picture and each of the output characteristic graph of the clothing picture, the first decoded output upsampled data, the second decoded output upsampled data, the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data and the sixth encoded output upsampled data, to obtain a first group of cross entropy losses;
respectively calculating cross entropy losses of the output feature map of the clothing picture and the first decoding output up-sampled data, the second decoding output up-sampled data, the third decoding output up-sampled data, the fourth decoding output up-sampled data, the fifth decoding output up-sampled data and the sixth encoding output up-sampled data to obtain a second group of cross entropy losses;
calculating cross entropy losses of the first decoded output upsampled data and the second decoded output upsampled data, the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data and the sixth encoded output upsampled data, respectively, to obtain a third set of cross entropy losses;
calculating cross entropy losses of the second decoded output upsampled data and the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data and the sixth encoded output upsampled data, respectively, to obtain a fourth set of cross entropy losses;
calculating cross entropy losses of the third decoded output upsampled data and the fourth decoded output upsampled data, the fifth decoded output upsampled data and the sixth encoded output upsampled data, respectively, to obtain a fifth set of cross entropy losses;
calculating cross entropy losses of the fourth decoded output upsampled data, the fifth decoded output upsampled data and the sixth encoded output upsampled data, respectively, to obtain a sixth set of cross entropy losses;
respectively calculating cross entropy losses of the fifth decoded output upsampled data and the sixth encoded output upsampled data to obtain a seventh group of cross entropy losses;
weighting and adding the first group of cross entropy losses, the second group of cross entropy losses, the third group of cross entropy losses, the fourth group of cross entropy losses, the fifth group of cross entropy losses, the sixth group of cross entropy losses and the seventh group of cross entropy losses to obtain a loss function;
and performing iterative training on the neural network by adopting a random gradient descent algorithm based on the loss function.
5. A garment parsing system based on channel attention and self-supervision constraints, the system comprising:
the system comprises a data set acquisition module, a data processing module and a data processing module, wherein the data set acquisition module is used for acquiring a clothing picture data set, and the clothing picture data set comprises clothing analytic graphs corresponding to all clothing pictures;
the data input module is used for inputting each clothing picture in the clothing picture data set into a neural network, and the neural network comprises a channel attention module;
the feature extraction module is used for extracting image features of the input clothing pictures for multiple times based on the channel attention module to obtain output feature maps of the clothing pictures;
the neural network training module is used for carrying out iterative training on the neural network based on strong supervision constraint and self-supervision constraint to obtain a trained neural network model; the strong supervision constraint is the strong supervision constraint of the clothing analysis graph corresponding to each clothing picture on the output characteristic graph, and the self-supervision constraint is the self-supervision constraint of high-level output on low-level output in each decoding output data in the neural network;
the neural network model application module is used for inputting the clothing picture to be analyzed into the trained neural network model and outputting a clothing analysis graph;
the feature extraction module specifically includes:
a first encoding unit for repeating the image feature extraction 7 times to obtain first encoded output data, I = ψ_encoder1(A) = ρ7(ρ6(ρ5(ρ4(ρ3(ρ2(ρ1(A))))))), where I represents the first encoded output data, ψ_encoder1 represents the acquisition process of the first encoded output data, ρi′ represents the i′-th repetition, i′ ∈ {1, 2, 3, 4, 5, 6, 7}, and A represents the clothing picture in the clothing picture data set;
the first down-sampling unit is used for down-sampling the first coding output data to obtain first down-sampling coding output data;
a second encoding unit for repeating the image feature extraction 6 times to obtain second encoded output data, J = ψ_encoder2(I′) = ρ6(ρ5(ρ4(ρ3(ρ2(ρ1(I′)))))), where J represents the second encoded output data, ψ_encoder2 represents the acquisition process of the second encoded output data, and I′ represents the first downsampled encoded output data;
the second downsampling unit is used for downsampling the second coding output data to obtain second downsampling coding output data;
a third encoding unit for repeating the image feature extraction 5 times to obtain third encoded output data, K = ψ_encoder3(J′) = ρ5(ρ4(ρ3(ρ2(ρ1(J′))))), where K represents the third encoded output data, ψ_encoder3 represents the acquisition process of the third encoded output data, and J′ represents the second downsampled encoded output data;
a third downsampling unit, configured to downsample the third encoded output data to obtain third downsampled encoded output data;
a fourth encoding unit for repeating the image feature extraction 4 times to obtain fourth encoded output data, L = ψ_encoder4(K′) = ρ4(ρ3(ρ2(ρ1(K′)))), where L represents the fourth encoded output data, ψ_encoder4 represents the acquisition process of the fourth encoded output data, and K′ represents the third downsampled encoded output data;
a fourth downsampling unit, configured to downsample the fourth encoded output data to obtain fourth downsampled encoded output data;
a fifth encoding unit for repeating the image feature extraction 4 times to obtain fifth encoded output data, M = ψ_encoder5(L′) = ρ4(ρ3(ρ2(ρ1(L′)))), where M represents the fifth encoded output data, ψ_encoder5 represents the acquisition process of the fifth encoded output data, and L′ represents the fourth downsampled encoded output data;
a fifth downsampling unit, configured to downsample the fifth encoded output data to obtain fifth downsampled encoded output data;
a sixth encoding unit for repeating the image feature extraction 4 times to obtain sixth encoded output data, N = ψ_encoder6(M′) = ρ4(ρ3(ρ2(ρ1(M′)))), where N represents the sixth encoded output data, ψ_encoder6 represents the acquisition process of the sixth encoded output data, and M′ represents the fifth downsampled encoded output data;
the first up-sampling unit is used for up-sampling the sixth coded output data to obtain sixth up-sampled coded output data;
a fifth decoding unit for repeating the image feature extraction 4 times to obtain fifth decoded output data, O = ψ_decoder5(N′, M) = ρ4(ρ3(ρ2(ρ1(concat(N′, M))))), where O represents the fifth decoded output data, ψ_decoder5 represents the acquisition process of the fifth decoded output data, concat() represents channel fusion, and N′ represents the sixth upsampled encoded output data;
a second upsampling unit, configured to upsample the fifth decoded output data to obtain fifth upsampled encoded output data;
a fourth decoding unit for repeating the image feature extraction 4 times to obtain fourth decoded output data, P = ψ_decoder4(O′, L) = ρ4(ρ3(ρ2(ρ1(concat(O′, L))))), where P represents the fourth decoded output data, ψ_decoder4 represents the acquisition process of the fourth decoded output data, and O′ represents the fifth up-sampled encoded output data;
a third upsampling unit, configured to upsample the fourth decoded output data to obtain fourth upsampled encoded output data;
a third decoding unit for repeating the image feature extraction 5 times to obtain third decoded output data, Q = ψ_decoder3(P′, K) = ρ5(ρ4(ρ3(ρ2(ρ1(concat(P′, K)))))), where Q represents the third decoded output data, ψ_decoder3 represents the acquisition process of the third decoded output data, and P′ represents the fourth up-sampled encoded output data;
a fourth upsampling unit, configured to upsample the third decoded output data to obtain third upsampled encoded output data;
a second decoding unit for repeating the image feature extraction 6 times to obtain second decoded output data, R = ψ_decoder2(Q′, J) = ρ6(ρ5(ρ4(ρ3(ρ2(ρ1(concat(Q′, J))))))), where R represents the second decoded output data, ψ_decoder2 represents the acquisition process of the second decoded output data, and Q′ represents the third up-sampled encoded output data;
a fifth upsampling unit, configured to upsample the second decoded output data to obtain second upsampled encoded output data;
a first decoding unit for repeating the image feature extraction 7 times to obtain first decoded output data, S = ψ_decoder1(R′, I) = ρ7(ρ6(ρ5(ρ4(ρ3(ρ2(ρ1(concat(R′, I)))))))), where S represents the first decoded output data, ψ_decoder1 represents the acquisition process of the first decoded output data, and R′ represents the second up-sampled encoded output data;
a sixth upsampling unit, configured to upsample the sixth encoded output data, the fifth decoded output data, the fourth decoded output data, the third decoded output data, the second decoded output data and the first decoded output data to the same size as the clothing picture, and to denote the results as sixth encoded output upsampled data, fifth decoded output upsampled data, fourth decoded output upsampled data, third decoded output upsampled data, second decoded output upsampled data and first decoded output upsampled data, respectively;
the data fusion unit is used for fusing sixth encoding output up-sampling data, fifth decoding output up-sampling data, fourth decoding output up-sampling data, third decoding output up-sampling data, second decoding output up-sampling data and first decoding output up-sampling data by adopting a channel to obtain fused data;
and the characteristic diagram output unit is used for carrying out convolution operation on the fusion data by adopting set convolution to obtain an output characteristic diagram of the clothing image.
6. The channel attention and self-supervision constraint based garment parsing system of claim 5, further comprising:
and the preprocessing module is used for carrying out normalization processing on each clothing picture in the clothing picture data set.
7. The system for analyzing clothes based on channel attention and self-supervision constraint according to claim 5, wherein the image feature extraction specifically comprises:
different convolution operations are carried out on the clothes pictures input into the neural network to obtain first data;
carrying out average pooling on the first data to obtain second data;
performing feature extraction on the second data by using set convolution to obtain first feature data;
performing maximum pooling on the first data to obtain third data;
performing feature extraction on the third data by using the set convolution to obtain second feature data;
adding the first characteristic data and the second characteristic data and inputting the added first characteristic data and the added second characteristic data into a Sigmoid function to obtain fourth data;
and multiplying the fourth data by the first data to obtain output data of the channel attention module.
8. The system for analyzing clothes based on channel attention and self-supervision constraint according to claim 5, wherein the neural network training module specifically comprises:
a first group of cross entropy loss obtaining units, configured to respectively calculate cross entropy losses of a clothing analysis graph corresponding to the clothing picture and an output feature graph of the clothing picture, the first decoded output upsampled data, the second decoded output upsampled data, the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, so as to obtain a first group of cross entropy losses;
a second group of cross entropy loss obtaining units, configured to calculate cross entropy losses between the output feature map of the clothing picture and the first decoded output upsampled data, the second decoded output upsampled data, the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a second group of cross entropy losses;
a third group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the first decoded output upsampled data and the second decoded output upsampled data, the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a third group of cross entropy losses;
a fourth group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the second decoded output upsampled data and the third decoded output upsampled data, the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a fourth group of cross entropy losses;
a fifth group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the third decoded output upsampled data and the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a fifth group of cross entropy losses;
a sixth group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the fourth decoded output upsampled data, the fifth decoded output upsampled data, and the sixth encoded output upsampled data, respectively, to obtain a sixth group of cross entropy losses;
a seventh group of cross entropy loss obtaining units, configured to calculate cross entropy losses of the fifth decoded output upsampled data and the sixth encoded output upsampled data, respectively, to obtain a seventh group of cross entropy losses;
a loss function obtaining unit, configured to weight and add the first group of cross entropy losses, the second group of cross entropy losses, the third group of cross entropy losses, the fourth group of cross entropy losses, the fifth group of cross entropy losses, the sixth group of cross entropy losses, and the seventh group of cross entropy losses to obtain a loss function;
and the training unit is used for carrying out iterative training on the neural network by adopting a random gradient descent algorithm based on the loss function.
CN202110226332.1A 2021-03-01 2021-03-01 Clothing analysis method and system based on channel attention and self-supervision constraint Active CN112927236B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110226332.1A CN112927236B (en) 2021-03-01 2021-03-01 Clothing analysis method and system based on channel attention and self-supervision constraint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110226332.1A CN112927236B (en) 2021-03-01 2021-03-01 Clothing analysis method and system based on channel attention and self-supervision constraint

Publications (2)

Publication Number Publication Date
CN112927236A CN112927236A (en) 2021-06-08
CN112927236B true CN112927236B (en) 2021-10-15

Family

ID=76172932

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110226332.1A Active CN112927236B (en) 2021-03-01 2021-03-01 Clothing analysis method and system based on channel attention and self-supervision constraint

Country Status (1)

Country Link
CN (1) CN112927236B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114511573B (en) * 2021-12-29 2023-06-09 电子科技大学 Human body analysis device and method based on multi-level edge prediction
CN114998934B (en) * 2022-06-27 2023-01-03 山东省人工智能研究院 Clothes-changing pedestrian re-identification and retrieval method based on multi-mode intelligent perception and fusion


Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008505589A (en) * 2004-06-30 2008-02-21 コメット テクノロジーズ エルエルシー Method of data compression including compression of video data
US10204286B2 (en) * 2016-02-29 2019-02-12 Emersys, Inc. Self-organizing discrete recurrent network digital image codec
CN110889868B (en) * 2019-10-28 2023-04-18 杭州电子科技大学 Monocular image depth estimation method combining gradient and texture features
CN111681252B (en) * 2020-05-30 2022-05-03 重庆邮电大学 Medical image automatic segmentation method based on multipath attention fusion
CN112232391B (en) * 2020-09-29 2022-04-08 河海大学 Dam crack detection method based on U-net network and SC-SAM attention mechanism
CN112381172B (en) * 2020-11-28 2022-09-16 桂林电子科技大学 InSAR interference image phase unwrapping method based on U-net

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108876728A (en) * 2018-04-20 2018-11-23 南京理工大学 Single image to the fog method based on residual error study
CN108596902A (en) * 2018-05-04 2018-09-28 北京大学 The full reference image quality appraisement method of multitask based on gating convolutional neural networks
CN108932517A (en) * 2018-06-28 2018-12-04 中山大学 A kind of multi-tag clothes analytic method based on fining network model
CN110097519A (en) * 2019-04-28 2019-08-06 暨南大学 Double supervision image defogging methods, system, medium and equipment based on deep learning
CN110751636A (en) * 2019-10-12 2020-02-04 天津工业大学 Fundus image retinal arteriosclerosis detection method based on improved coding and decoding network
CN110930408A (en) * 2019-10-15 2020-03-27 浙江大学 Semantic image compression method based on knowledge reorganization
CN111833277A (en) * 2020-07-27 2020-10-27 大连海事大学 Marine image defogging method with non-paired multi-scale hybrid coding and decoding structure
CN111968088A (en) * 2020-08-14 2020-11-20 西安电子科技大学 Building detection method based on pixel and region segmentation decision fusion
CN112418027A (en) * 2020-11-11 2021-02-26 青岛科技大学 Remote sensing image road extraction method for improving U-Net network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Jindong Jiang et al.; RedNet: Residual Encoder-Decoder Network for Indoor RGB-D Semantic Segmentation; Computer Vision and Pattern Recognition; 2018 *
Semantic analysis, retrieval and recommendation of clothing images based on deep learning; Xu Hui et al.; Basic Sciences Journal of Textile Universities; 20201027; vol. 33, no. 3; 64-72 *
Research on image semantic segmentation methods based on encoder-decoder networks and feature encoding; Li Xiao; China Masters' Theses Full-text Database, Information Science and Technology; 20210115, no. 1; I138-1481 *
Frontalization method for multi-pose face images based on encoder-decoder networks; Xu Haiyue et al.; Scientia Sinica Informationis; 20191231; vol. 49, no. 4; 450-463 *

Also Published As

Publication number Publication date
CN112927236A (en) 2021-06-08

Similar Documents

Publication Publication Date Title
CN110111366B (en) End-to-end optical flow estimation method based on multistage loss
CN112926396B (en) Action identification method based on double-current convolution attention
CN112541503B (en) Real-time semantic segmentation method based on context attention mechanism and information fusion
CN111582316B (en) RGB-D significance target detection method
CN111369565B (en) Digital pathological image segmentation and classification method based on graph convolution network
CN113469094A (en) Multi-mode remote sensing data depth fusion-based earth surface coverage classification method
CN110969124A (en) Two-dimensional human body posture estimation method and system based on lightweight multi-branch network
CN111340814A (en) Multi-mode adaptive convolution-based RGB-D image semantic segmentation method
CN112927236B (en) Clothing analysis method and system based on channel attention and self-supervision constraint
CN114283158A (en) Retinal blood vessel image segmentation method and device and computer equipment
CN115311186B (en) Cross-scale attention confrontation fusion method and terminal for infrared and visible light images
CN111242068B (en) Behavior recognition method and device based on video, electronic equipment and storage medium
CN113450313B (en) Image significance visualization method based on regional contrast learning
CN113782190B (en) Image processing method based on multistage space-time characteristics and mixed attention network
CN117351363A (en) Remote sensing image building extraction method based on transducer
CN111709289A (en) Multi-task deep learning model for improving human body analysis effect
CN109766918A (en) Conspicuousness object detecting method based on the fusion of multi-level contextual information
CN115578280A (en) Construction method of double-branch remote sensing image defogging network
CN115049739A (en) Binocular vision stereo matching method based on edge detection
CN114821434A (en) Space-time enhanced video anomaly detection method based on optical flow constraint
CN113538402A (en) Crowd counting method and system based on density estimation
CN116824308B (en) Image segmentation model training method and related method, device, medium and equipment
CN117727022A (en) Three-dimensional point cloud target detection method based on transform sparse coding and decoding
CN117173595A (en) Unmanned aerial vehicle aerial image target detection method based on improved YOLOv7
CN114841887B (en) Image recovery quality evaluation method based on multi-level difference learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant