
CN114529795A - Clothing key point detection method based on optimized heat map supervision mechanism - Google Patents

Clothing key point detection method based on optimized heat map supervision mechanism

Info

Publication number
CN114529795A
CN114529795A (application number CN202011218825.2A)
Authority
CN
China
Prior art keywords
heat map
key point
point detection
clothing
loss function
Prior art date
Legal status
Pending
Application number
CN202011218825.2A
Other languages
Chinese (zh)
Inventor
张文强
邓苇
张睿
齐立哲
Current Assignee
Fudan University
Original Assignee
Fudan University
Priority date
Filing date
Publication date
Application filed by Fudan University
Priority to CN202011218825.2A
Publication of CN114529795A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

In the clothing key point detection method based on an optimized heat map supervision mechanism, predicted heat maps at three scales are computed from the clothing key point detection training set by a multi-resolution heat map supervision module; because the backbone network is HRNet, high resolution is preserved throughout information transmission. Adding a Gaussian kernel with a gradually decreasing standard deviation to the progressive focus heat map supervision mechanism increases the spatial resolution without introducing redundant spatial information. Finally, a horizontal loss function and a mean square error are computed and the clothing key point detection network is trained by back propagation, yielding the trained network. Training the clothing key point detection network with the horizontal loss function produces a better detection model without unduly sacrificing computational performance, improving the accuracy of clothing key point detection.

Description

Clothing key point detection method based on optimized heat map supervision mechanism
Technical Field
The invention relates to a clothing key point detection method based on an optimized heat map supervision mechanism.
Background
The purpose of clothing key point detection is to predict the locations of functional key points defined on the garment, such as the corners of the neckline, the hem and the cuffs. Detecting the positions of these key points is a fundamental, low-level step in clothing information recognition and plays an important role in tasks such as clothing position calibration and attribute prediction.
The key point detection task originates from human body pose estimation and facial landmark localization. Traditional key point detection regresses coordinates directly: features are extracted from the input image and regressed at the output layer to obtain the key point coordinates. However, direct regression suffers from problems such as coupling between key points, and its prediction performance is poor. The frameworks proposed so far mainly include Deep Fashion Alignment (DFA), which consists of convolutional neural networks in three stages, and the Deep LAndmark Network (DLAN), a deep key point detection network combining selective dilated convolution and recursive spatial transformation; both frameworks predict by directly regressing key point coordinates. In addition, a global-local embedding module has been used to achieve more accurate landmark prediction, and the robust Match R-CNN model based on Mask R-CNN, built together with the large-scale benchmark dataset DeepFashion2, also achieves a good detection effect.
The most recent advanced research enhances the structural layout relationships between key points in a hierarchical manner using structural graph reasoning. Defining the different key point parts as graph nodes and clustering features following the idea of graph neural networks increases model capability, but the complex information transmission scheme and inference module also increase the amount of computation.
In summary, for the clothing key point detection task, most advanced methods build on the classic heat map supervision scheme and concentrate their design effort on the feature extraction process. Few methods have studied the heat map supervision step of the prediction stage in depth.
Disclosure of Invention
In order to solve the problems, the invention provides a clothing key point detection method based on an optimized heat map supervision mechanism, which adopts the following technical scheme:
the invention provides a clothing key point detection method based on an optimized heat map supervision mechanism, which is characterized by comprising the following steps: step S1-1, calculating the input image aiming at the position of each key point through a pre-trained clothing key point detection network to obtain output heat maps with different resolutions; step S1-2, the output heat map is processed by difference up-sampling and arithmetic mean to obtain the final output heat map; and S1-3, outputting a final output heat map, wherein the training process of the clothing key point detection network specifically comprises the following steps: s2-1, carrying out image preprocessing on a pre-prepared original data set comprising an original image, a clothing boundary frame and key points, and enhancing the obtained image characteristics through data to obtain a clothing key point detection training set comprising processed images; s2-2, aiming at the clothing key point detection training set, obtaining prediction heat maps of three scales through a multi-resolution heat map supervision module in a clothing key point detection network to be trained; step S2-3, calculating a real heat map by a step-by-step focusing heat supervision module aiming at the clothing key point detection training set; step S2-4, calculating a horizontal loss function and a mean square error by comparing the predicted heat map and the real heat map; step S2-5, calculating a total loss function according to the horizontal loss function and the mean square error; and step S2-6, performing back propagation through a mean square error and a total loss function by a preset gradient rotation method, and training the clothing key point detection network to obtain the clothing key point detection network.
The clothing key point detection method based on the optimized heat map supervision mechanism provided by the invention may further have the technical feature that the horizontal loss function HL is computed from the pixel values y and ŷ on the real heat map and the predicted heat map, respectively, and from ŷ_max, the maximum confidence value in the predicted heat map (the exact expression of HL is given as a formula image in the original filing). The mean square error is calculated as follows:
MSE = (1/n) × Σ (y − ŷ)²
where MSE is the mean square error, the sum runs over all heat map pixels, and y and ŷ are the pixel values on the real heat map and the predicted heat map, respectively.
The clothing key point detection method based on the optimized heat map supervision mechanism provided by the invention may further have the technical feature that the total loss function is obtained by jointly calculating the mean square error and the horizontal loss function, specifically:
L_total = MSE + α × HL
where L_total is the total loss function, MSE is the mean square error, HL is the horizontal loss function, and α is a hyperparameter.
The clothing key point detection method based on the optimized heat map supervision mechanism provided by the invention may further have the technical feature that the multi-resolution heat map supervision module comprises an up-sampling part, a three-way feature convolution part and a multi-size supervision part: the up-sampling part up-samples the processed image through two deconvolutions to obtain scale feature maps at three different resolutions; the three-way feature convolution part applies two 1 × 1 convolutions to the scale feature maps at the three resolutions and adjusts the number of channels to match the number of predicted key points, yielding output heat maps at three different scales; and the multi-size supervision part supervises the output heat maps at the three scales against the single-channel heat maps.
The clothing key point detection method based on the optimized heat map supervision mechanism provided by the invention may further have the technical feature that the progressive focus heat map supervision module generates the real heat map by applying Gaussian blur at the key point positions in the original image; the module contains a Gaussian kernel whose standard deviation decreases gradually as training proceeds, each reduction halving the previous standard deviation.
The clothing key point detection method based on the optimized heat map supervision mechanism provided by the invention may further have the technical feature that, under the preset gradient rotation method, only the mean square error is used to back-propagate gradients through the clothing key point detection network to be trained during the first five epochs of training, after which gradients are back-propagated through the total loss function and the network model parameters are updated; a larger standard deviation is used at the initial stage of training to ensure the convergence speed of the model, and the standard deviation is gradually reduced to a smaller value as the training epochs progress.
The clothing key point detection method based on the optimized heat map supervision mechanism provided by the invention may further have the technical feature that the data enhancement processes the image features by random flipping, random rotation and random cropping to obtain the clothing key point detection data set: random flipping flips the training data images about the vertical direction with a preset probability, random rotation rotates the training data images by an angle chosen at random from the range [−30°, +30°], and random cropping randomly crops the region outside the clothing bounding box in the training data images.
The clothing key point detection method based on the optimized heat map supervision mechanism provided by the invention may further have the technical feature that the image preprocessing comprises the following steps: step S2-1-1, computing a single-channel heat map from the key points; step S2-1-2, scaling the resolution of the original image; and step S2-1-3, obtaining the image features by normalizing the single-channel heat map and the scaled input image.
Action and Effect of the invention
According to the clothing key point detection method based on the optimized heat map supervision mechanism, predicted heat maps at three scales are computed from the clothing key point detection training set by the multi-resolution heat map supervision module; because the adopted backbone network is HRNet, high resolution is preserved during information transmission. Adding a Gaussian kernel with a gradually decreasing standard deviation to the progressive focus heat map supervision mechanism increases the spatial resolution without introducing redundant spatial information. Finally, the horizontal loss function and the mean square error are calculated and the clothing key point detection network is trained by back propagation, yielding the trained clothing key point detection network. Training the clothing key point detection network with the horizontal loss function produces a better detection model without unduly sacrificing computational performance, improving the accuracy of clothing key point detection.
Drawings
FIG. 1 is a flow diagram of an apparel keypoint detection network in an embodiment of the invention;
FIG. 2 is a schematic diagram of a clothing keypoint detection network in an embodiment of the invention;
FIG. 3 is a flow diagram of a clothing keypoint detection network training process in an embodiment of the invention;
FIG. 4 is a schematic diagram of a backbone network in an embodiment of the present invention;
FIG. 5 is a schematic diagram of a multi-resolution supervisory module in an embodiment of the present invention;
FIG. 6 is a diagram illustrating the variation of the standard deviation of the progressive focus heat map supervision mechanism in an embodiment of the present invention;
FIG. 7 is a schematic diagram of a horizontal loss function in an embodiment of the present invention; and
FIG. 8 is an effect diagram of a detection result of clothing key points in the embodiment of the invention.
Detailed Description
In order to make the technical means, creative features, objectives and effects of the present invention easy to understand, the clothing key point detection method based on an optimized heat map supervision mechanism is described in detail below with reference to the embodiments and the accompanying drawings.
< example >
In the embodiment, the input image and the key points pass through the clothing key point detection network, so that a final output heat map capable of accurately positioning the key points is obtained.
Fig. 1 is a flow chart of a clothing key point detection network in an embodiment of the present invention.
Figure 2 is a schematic diagram of a clothing keypoint detection network in an embodiment of the invention.
As shown in fig. 1 and 2, the process of the clothing key point detection network includes steps S1-1 to S1-3.
In step S1-1, the input image is processed by the pre-trained clothing key point detection network, which computes output heat maps at different resolutions for the position of each key point.
In step S1-2, the output heat maps are processed by interpolation up-sampling and arithmetic averaging to obtain the final output heat map.
In this embodiment, the output heat maps are produced at three resolutions of 128 × 128, 256 × 256 and 512 × 512 pixels (with 8 channels, one per key point). The heat maps at 128 × 128 and 256 × 256 are up-sampled to 512 × 512 and combined with the 512 × 512 heat map by arithmetic averaging to obtain the final output heat map at a resolution of 512 × 512.
And step S1-3, outputting the final output heat map (model final detection result).
The position coordinate with the greatest confidence value in the heat map corresponds to the position coordinate of the key point. The key points are the coordinates of key positions such as the collar, cuffs, waistline and hem in the clothing image; they not only indicate the functional regions of the garment but also help capture the clothing bounding box, so that the design, pattern and category of the clothing can be distinguished better.
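For illustration, the following PyTorch sketch shows one way steps S1-1 to S1-3 could be realised around an already-trained network: the two lower-resolution predictions are interpolation up-sampled, the three maps are arithmetically averaged, and the coordinate of the maximum confidence value in each channel is read out as the key point. The function names, the bilinear interpolation mode and the assumption of 8 key-point channels are illustrative and not taken from the patent.

    import torch
    import torch.nn.functional as F

    def aggregate_heatmaps(h128, h256, h512):
        # interpolation up-sampling of the two lower-resolution heat maps to 512 x 512
        up128 = F.interpolate(h128, size=(512, 512), mode="bilinear", align_corners=False)
        up256 = F.interpolate(h256, size=(512, 512), mode="bilinear", align_corners=False)
        # arithmetic average of the three output heat maps
        return (up128 + up256 + h512) / 3.0

    def heatmaps_to_keypoints(heatmaps):
        # the position of the maximum confidence value in each channel is the key point
        n, k, h, w = heatmaps.shape
        idx = heatmaps.view(n, k, -1).argmax(dim=-1)
        ys = torch.div(idx, w, rounding_mode="floor")
        xs = idx % w
        return torch.stack([xs, ys], dim=-1)          # (N, K, 2) pixel coordinates

    preds = [torch.rand(1, 8, s, s) for s in (128, 256, 512)]
    final_map = aggregate_heatmaps(*preds)
    print(heatmaps_to_keypoints(final_map).shape)     # torch.Size([1, 8, 2])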
FIG. 3 is a flow chart of a clothing keypoint detection network training process in an embodiment of the invention.
As shown in fig. 3, the training process of the clothing key point detection network specifically includes the following steps S2-1 to S2-6.
In step S2-1, image preprocessing is carried out on a pre-prepared original data set comprising original images, clothing bounding boxes and key points, and the obtained image features are enhanced through data augmentation, so that a clothing key point detection training set comprising processed images is obtained.
In this embodiment, the original dataset is the DeepFashion-C dataset, which contains a total of 289,222 clothing images, each annotated with the position of the clothing bounding box and the positions of the key points.
The coordinates of the key points are labels provided by the DeepFashion-C dataset.
The image preprocessing includes the following steps S2-1-1 to S2-1-3.
And S2-1-1, obtaining a single-channel heat map through calculation according to the key points.
Step S2-1-2, the resolution size of the input image is scaled.
In the present embodiment, the resolution size of the input image is scaled to 512 × 512 pixels.
And S2-1-3, obtaining image characteristics by normalizing the single-channel heat map and the zoomed input image.
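As a minimal sketch of steps S2-1-1 to S2-1-3 (the per-channel normalisation statistics, the bilinear resizing and the helper name preprocess are assumptions; the patent only states that the scaled image and the single-channel heat maps are normalised):

    import torch
    import torch.nn.functional as F

    def preprocess(image, heatmaps):
        # image: (3, H, W) float tensor; heatmaps: (K, H, W), one single-channel map per key point
        image = F.interpolate(image.unsqueeze(0), size=(512, 512), mode="bilinear",
                              align_corners=False).squeeze(0)                   # scale to 512 x 512
        mean = image.mean(dim=(1, 2), keepdim=True)
        std = image.std(dim=(1, 2), keepdim=True) + 1e-6
        image = (image - mean) / std                                             # normalise the image
        heatmaps = heatmaps / (heatmaps.amax(dim=(1, 2), keepdim=True) + 1e-6)   # peak value -> 1
        return image, heatmaps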
In this embodiment, the clothing key point detection network uses HRNet as its backbone network, which comprises 4 parallel subnets extending gradually from high resolution to low resolution and processes input data with a resolution of 3 × 512 × 512.
Fig. 4 is a schematic diagram of a backbone network in an embodiment of the invention.
As shown in fig. 4, the backbone network comprises parallel subnets that up-sample and down-sample feature blocks through convolution units. A feature map is the feature information at each stage of the network's operation; a convolution unit is a convolution that only increases the depth of a feature block without changing its scale; down-sampling is a convolution that reduces the scale of a feature block; and up-sampling is an interpolation that increases the scale of a feature block.
The 4 parallel subnets are respectively a one-stage parallel subnet, a two-stage parallel subnet, a three-stage parallel subnet and a four-stage parallel subnet.
The one-stage parallel subnet has only a single branch with a 128 × 128 resolution feature map; the two-stage parallel subnet has two branches with 128 × 128 and 64 × 64 resolution feature maps; the three-stage parallel subnet has three branches with 128 × 128, 64 × 64 and 32 × 32 resolution feature maps; and the four-stage parallel subnet has four branches with 128 × 128, 64 × 64, 32 × 32 and 16 × 16 resolution feature maps. At each stage there are many inter-branch information connections, transfers and computations. Finally, the 64 × 64, 32 × 32 and 16 × 16 resolution feature maps are up-sampled to the common 128 × 128 resolution and merged with the 128 × 128 feature map into an integral final feature map. Because high resolution is preserved while low-resolution information is fully extracted, the final feature map carries excellent low-level spatial information and high-level semantic information, providing a solid feature basis for subsequent prediction.
The network components of HRNet mainly consist of multi-layer convolutional layers, Batch Normalization (BN) layers, Rectified Linear Units (ReLU), block operations and tran operations.
The convolutional layers extract local features from the image by sliding convolution kernels over it, and their parameters are adjusted iteratively during model training to achieve automatic feature extraction.
The batch normalization layer normalizes the output of the intermediate layers of the model, which keeps the feature distributions consistent and accelerates the convergence of the network model. Its output is calculated as:
y = γ × (x − E[x]) / √(Var[x] + ε) + β
where x is the input of the batch normalization layer, y is its output, E[x] is the mean of x, Var[x] is the variance of x, ε is a small constant for numerical stability, and γ and β are preset learnable parameters.
The ReLU layer increases the fitting ability of the model through nonlinear activation; its output is calculated as:
f(x) = max(0, x)
where f(x) is the output of the ReLU layer and x is its input.
The block operation is a 3 × 3 convolution with stride 1 and padding 1. The tran operation connects two parallel subnets of adjacent stages: between features of the same resolution it changes the number of channels through a 1 × 1 convolution, while features of different resolutions pass through a 3 × 3 convolution with stride 2 followed by a BN layer and a ReLU layer, changing the number of channels and the resolution at the same time. The fuse operation fuses feature blocks of different resolutions: it reduces resolution through 3 × 3 convolutions with stride 2, increases resolution through interpolation up-sampling, and after concat fusion changes the number of channels through a 1 × 1 convolution, with each convolution followed by a BN layer and a ReLU layer.
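As a rough illustration of the fuse operation just described, the sketch below (PyTorch) fuses a low-resolution feature block into a higher-resolution one: interpolation up-sampling, concat, then a 1 × 1 convolution followed by BN and ReLU. The channel counts and the nearest-neighbour interpolation mode are assumptions, and the symmetric down-sampling path (3 × 3 convolution with stride 2) is omitted.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Fuse(nn.Module):
        def __init__(self, ch_high=32, ch_low=64, ch_out=32):
            super().__init__()
            # after concat fusion, a 1x1 convolution changes the number of channels,
            # followed by a BN layer and a ReLU layer
            self.mix = nn.Sequential(
                nn.Conv2d(ch_high + ch_low, ch_out, kernel_size=1),
                nn.BatchNorm2d(ch_out),
                nn.ReLU(inplace=True))

        def forward(self, x_high, x_low):
            # interpolation up-sampling brings the low-resolution block to the higher resolution
            x_up = F.interpolate(x_low, size=x_high.shape[-2:], mode="nearest")
            return self.mix(torch.cat([x_high, x_up], dim=1))

    fuse = Fuse()
    out = fuse(torch.rand(1, 32, 128, 128), torch.rand(1, 64, 64, 64))
    print(out.shape)   # torch.Size([1, 32, 128, 128])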
And performing data enhancement on the image characteristics obtained by image preprocessing.
The data enhancement processes the preprocessed images by random flipping, random rotation and random cropping to obtain the image features.
Random flipping flips the training data images about the vertical direction with a preset probability.
Random rotation rotates the training data images by an angle chosen at random from [−30°, +30°].
Random cropping randomly crops the training data images in the region outside the clothing bounding box, so that the garment itself is not cut off.
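A hedged sketch of this data enhancement step follows; the flip probability, the use of torchvision and the helper name augment are assumptions, since the patent only specifies flipping with a preset probability, rotation within [−30°, +30°] and cropping that preserves the clothing bounding box.

    import random
    import torchvision.transforms.functional as TF

    def augment(img, bbox, flip_prob=0.5):
        # img: (C, H, W) tensor; bbox: (x1, y1, x2, y2) clothing bounding box in pixels
        _, h, w = img.shape
        x1, y1, x2, y2 = bbox
        if random.random() < flip_prob:                   # random flip with preset probability
            img = TF.hflip(img)
            x1, x2 = w - x2, w - x1
        # random crop that only removes area outside the clothing bounding box
        top = random.randint(0, int(y1))
        left = random.randint(0, int(x1))
        bottom = random.randint(int(y2), h)
        right = random.randint(int(x2), w)
        img = TF.crop(img, top, left, bottom - top, right - left)
        # random rotation by an angle drawn from [-30, +30] degrees
        # (key point and box coordinates would have to be transformed consistently; omitted here)
        img = TF.rotate(img, random.uniform(-30.0, 30.0))
        return img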
In step S2-2, for the clothing key point detection training set, predicted heat maps at three scales are obtained through the multi-resolution heat map supervision module in the clothing key point detection network to be trained.
FIG. 5 is a diagram of a multi-resolution supervisory module in an embodiment of the present invention.
As shown in fig. 5, the multi-resolution heat map supervision module includes an upsampling section, a three-way feature convolution section, and a multi-size supervision section.
The up-sampling part up-samples the processed image through two deconvolutions to obtain scale feature maps at three different resolutions (parts 1, 2 and 3 in the figure).
The three-way feature convolution part applies two 1 × 1 convolutions to the three scale feature maps and adjusts the number of channels to match the number of predicted key points, yielding prediction heat maps at three different scales (parts 4, 5 and 6 in the figure).
The multi-size supervision part supervises the output feature maps against the single-channel heat maps at the small, medium and large scales (parts 7, 8 and 9 in the figure).
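A minimal PyTorch sketch of how such a multi-resolution head could be wired; the channel widths, the 4 × 4 deconvolution kernels and the number of key points K = 8 are assumptions rather than values taken from the patent.

    import torch
    import torch.nn as nn

    class MultiResolutionHead(nn.Module):
        def __init__(self, in_ch=32, mid_ch=32, num_keypoints=8):
            super().__init__()
            # two deconvolutions produce the three scales: 128 -> 256 -> 512
            self.up1 = nn.ConvTranspose2d(in_ch, in_ch, 4, stride=2, padding=1)
            self.up2 = nn.ConvTranspose2d(in_ch, in_ch, 4, stride=2, padding=1)

            def head():
                # two 1x1 convolutions; the second matches the number of predicted key points
                return nn.Sequential(nn.Conv2d(in_ch, mid_ch, 1), nn.ReLU(inplace=True),
                                     nn.Conv2d(mid_ch, num_keypoints, 1))

            self.head_s, self.head_m, self.head_l = head(), head(), head()

        def forward(self, feat128):
            feat256 = self.up1(feat128)
            feat512 = self.up2(feat256)
            return self.head_s(feat128), self.head_m(feat256), self.head_l(feat512)

    h128, h256, h512 = MultiResolutionHead()(torch.rand(1, 32, 128, 128))
    print(h128.shape, h256.shape, h512.shape)
    # torch.Size([1, 8, 128, 128]) torch.Size([1, 8, 256, 256]) torch.Size([1, 8, 512, 512])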
In step S2-3, a real heat map is obtained for the clothing key point detection training set by the progressive focus heat map supervision module.
The progressive focus heat map supervision module generates the real heat map by applying Gaussian blur at the key point positions in the original image.
The progressive focus heat map supervision module contains a Gaussian kernel.
The standard deviation of the Gaussian kernel decreases gradually as training proceeds; each reduction halves the previous standard deviation.
Fig. 6 is a variation of the standard deviation of the progressive focus heat map supervision mechanism in an embodiment of the present invention.
As shown in fig. 6, the gaussian kernel in the first five cycles of training uses a larger standard deviation, and the standard deviation of the gaussian kernel gradually decreases in the later period.
In the first five epochs of training, gradients are back-propagated through the clothing key point detection network to be trained using only the mean square error, which ensures the convergence speed of the model; in later epochs, the total loss function is used to back-propagate gradients and update the network model parameters.
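A sketch of how the progressive-focus ground truth could be generated; the base standard deviation, the exact halving schedule and the function names are assumptions, since the patent only states that a large standard deviation is used early in training and is then halved step by step.

    import torch

    def gaussian_heatmap(size, keypoints, sigma):
        # size: heat map resolution; keypoints: (K, 2) pixel coordinates; returns (K, size, size)
        ys = torch.arange(size, dtype=torch.float32).view(-1, 1)
        xs = torch.arange(size, dtype=torch.float32).view(1, -1)
        maps = []
        for x0, y0 in keypoints:
            d2 = (xs - x0) ** 2 + (ys - y0) ** 2
            maps.append(torch.exp(-d2 / (2.0 * sigma ** 2)))   # confidence peak of 1 at the key point
        return torch.stack(maps)

    def sigma_schedule(epoch, base_sigma=8.0, warmup=5):
        # large sigma during the first five epochs, then halved as training proceeds
        if epoch < warmup:
            return base_sigma
        return max(base_sigma / (2 ** ((epoch - warmup) // warmup + 1)), 1.0)

    gt = gaussian_heatmap(128, torch.tensor([[30.0, 40.0], [90.0, 100.0]]), sigma_schedule(0))
    print(gt.shape)   # torch.Size([2, 128, 128])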
And step S2-4, calculating a horizontal loss function and a mean square error by comparing the predicted heat map with the real heat map.
FIG. 7 is a schematic of a horizontal loss function in an embodiment of the invention.
As shown in fig. 7, the horizontal loss is computed from the height gap and the level gap with respect to the maximum confidence value in the predicted heat map.
The horizontal loss function HL is computed from the pixel values y and ŷ on the real heat map and the predicted heat map, respectively, and from ŷ_max, the maximum confidence value in the predicted heat map; the exact expression of HL is given as a formula image in the original filing.
The mean square error is calculated as follows:
MSE = (1/n) × Σ (y − ŷ)²
where MSE is the mean square error, the sum runs over all heat map pixels, and y and ŷ are the pixel values on the real heat map and the predicted heat map, respectively.
And step S2-5, calculating a total loss function by the horizontal loss function and the mean square error.
The total loss function is calculated from the mean square error and the horizontal loss function as follows:
L_total = MSE + α × HL
in the formula, L_total represents the total loss function, MSE represents the mean square error, HL represents the horizontal loss function, and α is a hyperparameter.
In the present embodiment, the hyperparameter α in the total loss function is 0.5.
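For concreteness, a short sketch of the loss combination with α = 0.5: the mean square error follows the standard per-pixel definition, while the exact expression of the horizontal loss appears only as a formula image in the filing and is therefore left as an abstract callable here.

    import torch

    def mse_loss(pred, target):
        # MSE = (1/n) * sum((y - y_hat)^2) over all heat map pixels
        return ((target - pred) ** 2).mean()

    def total_loss(pred, target, horizontal_loss, alpha=0.5):
        # L_total = MSE + alpha * HL
        return mse_loss(pred, target) + alpha * horizontal_loss(pred, target)

    # example with a dummy horizontal loss standing in for the patented expression
    p, t = torch.rand(1, 8, 512, 512), torch.rand(1, 8, 512, 512)
    print(total_loss(p, t, horizontal_loss=lambda a, b: torch.tensor(0.0)))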
And step S2-6, performing back propagation by a predetermined gradient rotation method through the mean square error and the total loss function, and training the clothing key point detection network to obtain the clothing key point detection network.
The preset gradient rotation method is as follows: during the first five epochs of training, only the mean square error is used to back-propagate gradients through the clothing key point detection network to be trained; afterwards, gradients are back-propagated through the total loss function and the parameters of the network model are updated.
A larger standard deviation is used at the initial stage of training to ensure the convergence speed of the model, and the standard deviation is gradually reduced to a smaller value as the training epochs progress.
In this embodiment, the backbone network to be trained is trained with an Adam optimizer, with the learning rate set to 0.00011, a learning-rate decay coefficient of 0.9, a batch size of 16 and 30 training epochs.
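A training-loop sketch under the settings given above (Adam, learning rate 0.00011, decay coefficient 0.9, 30 epochs, MSE-only gradients in the first five epochs, total loss afterwards). The data loader, model and horizontal_loss are placeholders, a single-scale prediction is assumed for brevity, and applying the 0.9 decay once per epoch via ExponentialLR is an assumption.

    import torch

    def train(model, loader, horizontal_loss, epochs=30, warmup=5, alpha=0.5):
        optimizer = torch.optim.Adam(model.parameters(), lr=0.00011)
        scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
        for epoch in range(epochs):
            for images, gt_heatmaps in loader:              # batch size 16 set in the loader
                pred = model(images)
                mse = ((gt_heatmaps - pred) ** 2).mean()
                if epoch < warmup:
                    loss = mse                              # first five epochs: MSE only
                else:
                    loss = mse + alpha * horizontal_loss(pred, gt_heatmaps)
                optimizer.zero_grad()
                loss.backward()                             # gradient back-propagation
                optimizer.step()
            scheduler.step()                                # learning-rate decay by 0.9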
FIG. 8 is an effect diagram of a detection result of clothing key points in the embodiment of the invention.
As shown in fig. 8, every key point in each input image is accurately identified and marked in the corresponding output image. On the key point detection task of the public DeepFashion-C dataset, the clothing key point detection model of this embodiment improves the accuracy of clothing key point detection without unduly sacrificing computational performance, reducing the average normalized detection error from 0.0441 to 0.0334 and reaching the accuracy of leading detection methods in the academic literature.
Action and Effect of the embodiment
According to the clothing key point detection method based on the optimized heat map supervision mechanism, the input image is processed by the pre-trained clothing key point detection network, which computes output heat maps at different resolutions for the position of each key point; the key point positions serve as an important source of low-level spatial information and have wide application. The final output heat map is then obtained through interpolation up-sampling and arithmetic averaging, realizing effective detection of clothing key points. The training process of the clothing key point detection network is as follows: predicted heat maps at three scales are computed from the clothing key point detection training set by the multi-resolution heat map supervision module, and because the adopted backbone network is HRNet, which generalizes well, high resolution is preserved during information transmission; a Gaussian kernel with a gradually decreasing standard deviation is then added to the progressive focus heat map supervision mechanism, which increases the spatial resolution without introducing redundant spatial information; finally, the horizontal loss function and the mean square error are calculated and the clothing key point detection network is trained by back propagation, yielding the trained network. As a result, the clothing key point detection network achieves a better detection effect without unduly sacrificing computational performance, thereby improving the accuracy of clothing key point detection.
In the embodiment, 1 + 2y is used as an exponent, so that a larger gradient is obtained for back propagation at positions where y takes larger values; the gradient computed for the backbone network is therefore related to the distance between the predicted value and the real value, and the back-propagated gradient reaches its maximum at the confidence peak of the real heat map (i.e., at the coordinate position of the key point).
In the embodiment, the clothing bounding box is taken into account during data enhancement, which prevents random cropping from cutting off the garment and producing inaccurate detection results.
The above-described embodiments are merely illustrative of specific embodiments of the present invention, and the present invention is not limited to the description of the above-described embodiments.

Claims (8)

1. A clothing key point detection method based on an optimized heat map supervision mechanism is characterized by comprising the following steps:
step S1-1, calculating the position of each key point of an input image through a pre-trained clothing key point detection network to obtain output heat maps with different resolutions;
step S1-2, obtaining the final output heat map from the output heat maps through interpolation up-sampling and arithmetic averaging;
step S1-3, outputting the final output heat map,
the training process of the clothing key point detection network specifically comprises the following steps:
step S2-1, performing image preprocessing on a pre-prepared original data set comprising original images, clothing bounding boxes and the key points, and enhancing the obtained image features through data augmentation, so as to obtain a clothing key point detection training set comprising processed images;
step S2-2, aiming at the clothing key point detection training set, obtaining predicted heat maps of three scales through a multi-resolution heat map supervision module in the clothing key point detection network to be trained;
step S2-3, calculating a real heat map for the clothing key point detection training set by a progressive focus heat map supervision module;
step S2-4, calculating a horizontal loss function and a mean square error by comparing the predicted heat maps with the real heat map;
step S2-5, calculating a total loss function according to the horizontal loss function and the mean square error;
and step S2-6, performing back propagation with the mean square error and the total loss function according to a preset gradient rotation method, and training the clothing key point detection network to obtain the trained clothing key point detection network.
2. The clothing key point detection method based on the optimized heat map supervision mechanism according to claim 1, characterized in that:
wherein the horizontal loss function HL is computed from the pixel values y and ŷ on the real heat map and the predicted heat map, respectively, and from ŷ_max, the maximum confidence value in the predicted heat map (the exact expression of HL is given as a formula image in the original filing), and
the mean square error is calculated as follows:
MSE = (1/n) × Σ (y − ŷ)²
where MSE is the mean square error, the sum runs over all heat map pixels, and y and ŷ are the pixel values on the real heat map and the predicted heat map, respectively.
3. The clothing key point detection method based on the optimized heat map supervision mechanism according to claim 2, characterized in that:
wherein, the total loss function is obtained by jointly calculating the mean square error and the horizontal loss function, and the calculation process is specifically as follows:
L_total = MSE + α × HL
in the formula, L_total is the total loss function, MSE is the mean square error, HL is the horizontal loss function, and α is a hyperparameter.
4. The clothing key point detection method based on the optimized heat map supervision mechanism according to claim 1, characterized in that:
wherein the multi-resolution heat map supervision module comprises an up-sampling part, a three-way characteristic convolution part and a multi-size supervision part,
the up-sampling part is used for up-sampling the processed image through two deconvolutions to obtain scale feature maps at three different resolutions,
the three-way feature convolution part is used for applying two 1 × 1 convolutions to the scale feature maps at the three different resolutions and adjusting the number of channels to match the number of predicted key points, thereby obtaining output heat maps at three different scales, and
the multi-size supervision part supervises the output heat maps at the three scales against the single-channel heat maps.
5. The clothing key point detection method based on the optimized heat map supervision mechanism according to claim 1, characterized in that:
wherein the progressive focus heat map supervision module generates the real heat map by applying Gaussian blur at the positions of the key points in the original image,
the progressive focus heat map supervision module comprises a Gaussian kernel, and
the Gaussian kernel has a standard deviation that decreases gradually as training proceeds, each reduction halving the previous standard deviation.
6. The clothing key point detection method based on the optimized heat map supervision mechanism according to claim 1, characterized in that:
wherein the preset gradient rotation method performs gradient back-propagation on the clothing key point detection network to be trained using only the mean square error during the first five epochs of training, and thereafter performs gradient back-propagation through the total loss function and updates the parameters of the network model, and
a larger standard deviation is used at the initial stage of training to ensure the convergence speed of the model, with the standard deviation gradually reduced to a smaller value as the training epochs progress.
7. The clothing key point detection method based on the optimized heat map supervision mechanism according to claim 1, characterized in that:
wherein the data enhancement processes the image features by random flipping, random rotation and random cropping to obtain the clothing key point detection data set,
the random flipping flips the training data images about the vertical direction with a preset probability,
the random rotation rotates the training data images by an angle chosen at random from [−30°, +30°], and
the random cropping randomly crops the region outside the clothing bounding box in the training data images.
8. The clothing key point detection method based on the optimized heat map supervision mechanism according to claim 1, characterized in that:
wherein the image pre-processing comprises the steps of:
step S2-1-1, obtaining a single-channel heat map through calculation according to the key points;
step S2-1-2, scaling the resolution of the original image;
and S2-1-3, performing normalization processing on the single-channel heat map and the scaled input image to obtain the image characteristics.
CN202011218825.2A 2020-11-04 2020-11-04 Clothing key point detection method based on optimized heat map supervision mechanism Pending CN114529795A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011218825.2A CN114529795A (en) 2020-11-04 2020-11-04 Clothing key point detection method based on optimized heat map supervision mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011218825.2A CN114529795A (en) 2020-11-04 2020-11-04 Clothing key point detection method based on optimized heat map supervision mechanism

Publications (1)

Publication Number Publication Date
CN114529795A true CN114529795A (en) 2022-05-24

Family

ID=81619275

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011218825.2A Pending CN114529795A (en) 2020-11-04 2020-11-04 Clothing key point detection method based on optimized heat map supervision mechanism

Country Status (1)

Country Link
CN (1) CN114529795A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107833219A (en) * 2017-11-28 2018-03-23 腾讯科技(深圳)有限公司 Image-recognizing method and device
CN109325952A (en) * 2018-09-17 2019-02-12 上海宝尊电子商务有限公司 Fashion clothing image partition method based on deep learning
CN110929638A (en) * 2019-11-20 2020-03-27 北京奇艺世纪科技有限公司 Human body key point identification method and device and electronic equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination