CN113569627A - Human body posture prediction model training method, human body posture prediction method and device - Google Patents
- Publication number
- CN113569627A CN113569627A CN202110658308.5A CN202110658308A CN113569627A CN 113569627 A CN113569627 A CN 113569627A CN 202110658308 A CN202110658308 A CN 202110658308A CN 113569627 A CN113569627 A CN 113569627A
- Authority
- CN
- China
- Prior art keywords
- human body
- body posture
- posture prediction
- generator
- loss value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000012549 training Methods 0.000 title claims abstract description 136
- 238000000034 method Methods 0.000 title claims abstract description 64
- 238000002372 labelling Methods 0.000 claims abstract description 27
- 238000001514 detection method Methods 0.000 claims description 28
- 238000004590 computer program Methods 0.000 claims description 8
- 238000003860 storage Methods 0.000 claims description 6
- 238000005457 optimization Methods 0.000 claims description 5
- 230000036544 posture Effects 0.000 description 193
- 238000012545 processing Methods 0.000 description 10
- 238000005516 engineering process Methods 0.000 description 8
- 238000013473 artificial intelligence Methods 0.000 description 6
- 238000010586 diagram Methods 0.000 description 6
- 230000009471 action Effects 0.000 description 4
- 238000004891 communication Methods 0.000 description 4
- 238000013135 deep learning Methods 0.000 description 4
- 210000000707 wrist Anatomy 0.000 description 4
- 230000008878 coupling Effects 0.000 description 3
- 238000010168 coupling process Methods 0.000 description 3
- 238000005859 coupling reaction Methods 0.000 description 3
- 230000006870 function Effects 0.000 description 3
- 230000008569 process Effects 0.000 description 3
- 230000003796 beauty Effects 0.000 description 2
- 238000004422 calculation algorithm Methods 0.000 description 2
- 238000004364 calculation method Methods 0.000 description 2
- 238000010801 machine learning Methods 0.000 description 2
- 238000013507 mapping Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 238000012795 verification Methods 0.000 description 2
- 238000013528 artificial neural network Methods 0.000 description 1
- 230000003190 augmentative effect Effects 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 238000009529 body temperature measurement Methods 0.000 description 1
- 239000003086 colorant Substances 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
- 230000001360 synchronised effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
Abstract
The application provides a human body posture prediction model training method, a human body posture prediction method and a human body posture prediction device. The method comprises the following steps: acquiring a labeled training set and an unlabeled training set, wherein the labeled training set comprises a plurality of first human body images containing labeled data, and the labeled data is used for representing real posture information in the first human body images; inputting the first human body image into a generator in a human body posture prediction model to obtain a corresponding first human body posture prediction result, and calculating a first loss value of the generator according to the labeling data and the first human body posture prediction result; inputting the second human body image into a generator to obtain a corresponding second human body posture prediction result; calculating a second loss value corresponding to the discriminator in the human body posture prediction model according to the first human body image, the annotation data, the second human body image and the second human body posture prediction result; and optimizing the generator and the discriminator according to the first loss value and the second loss value to obtain a human body posture prediction model.
Description
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a human posture prediction model training method, a human posture prediction method and a human posture prediction device.
Background
Human body posture estimation refers to predicting the human body posture and key points in an image. In essence, it abstracts the positions of the various parts of the human body in an image into a set of structured coordinates. Human body posture estimation technology has important applications in fields such as human-computer interaction, image retrieval, anomaly detection, and action prediction.
Most existing human body posture estimation methods rely on a large amount of labeled data, and the amount of labeled data strongly influences the final estimation quality. Although the scale of human body posture estimation datasets has grown considerably, constructing a large one remains very difficult. For example, the MPII human pose estimation dataset contains about 25,000 images and around 40,000 annotated human poses, far smaller than the million-scale datasets available for image classification and image detection. This is because labeling a human pose estimation dataset is more demanding, more elaborate, and more complex, and therefore requires a significant expenditure of labor and time.
Deep learning has attracted wide attention since its inception and has also found application in the field of human posture estimation. With deep learning, supervised training on a large amount of labeled data has become the mainstream of human posture estimation technology. However, relying too heavily on labeled training data has, to some extent, hindered the advancement of human body posture estimation techniques. The rapid growth of the Internet and related technologies brings massive amounts of data, and labeling such data manually is impractical. Therefore, how to estimate the human body posture using massive unlabeled data is an urgent problem to be solved.
Disclosure of Invention
An object of the embodiments of the present application is to provide a human body posture prediction model training method, a human body posture prediction method, and a device, which can perform adversarial training in combination with a plurality of unlabeled second human body images, thereby improving accuracy and reducing the demand for labeled first human body images.
In a first aspect, an embodiment of the present application provides a human body posture prediction model training method, including: acquiring a labeled training set and an unlabeled training set, wherein the labeled training set comprises a plurality of first human body images containing labeled data, and the labeled data is used for representing real posture information in the first human body images; the unmarked training set comprises a plurality of second human body images which do not contain marking data; inputting the first human body image into a generator in a human body posture prediction model to obtain a corresponding first human body posture prediction result, and calculating a first loss value of the generator according to the labeling data and the first human body posture prediction result; inputting the second human body image into the generator to obtain a corresponding second human body posture prediction result; calculating a second loss value corresponding to a discriminator in the human body posture prediction model according to the first human body image, the labeled data, the second human body image and the second human body posture prediction result; and optimizing the generator and the discriminator according to the first loss value and the second loss value to obtain the human body posture prediction model.
Optionally, in the method for training a human body posture prediction model according to the embodiment of the present application, the inputting the first human body image into a generator in the human body posture prediction model to obtain a corresponding first human body posture prediction result, and calculating a first loss value of the generator according to the labeled data and the first human body posture prediction result includes: inputting the first human body image into the generator to obtain a predicted human body posture heat map of a corresponding multi-channel; wherein the predicted human posture heat map of each channel predicts a human body key point position; generating a corresponding reference human body posture heat map based on the annotation data corresponding to the first human body image; calculating the first loss value from the predicted human pose heat map and the reference human pose heat map.
Optionally, in the method for training a human body posture prediction model according to the embodiment of the present application, the calculating a second loss value corresponding to a discriminator in the human body posture prediction model according to the first human body image, the annotation data, the second human body image, and the second human body posture prediction result includes: inputting the first human body image and the reference human body posture heat map into the discriminator as a true data sequence, and inputting the second human body image and the second human body posture prediction result into the discriminator as a false data sequence to respectively obtain discrimination results output by the discriminator; and calculating the second loss value according to the judgment result.
Optionally, in the human body posture prediction model training method according to the embodiment of the present application, the optimizing the generator and the discriminator according to the first loss value and the second loss value includes: and updating the network parameters of the generator according to the first loss value and the second loss value, and updating the network parameters of the discriminator according to the second loss value.
Optionally, in the human body posture detection model training method according to the embodiment of the present application, the obtaining a labeled training set and an unlabeled training set includes: acquiring an original labeling training set containing labeling data and an original unlabeled training set not containing labeling data; respectively carrying out human body detection on the images in the original labeled training set and the images in the original unlabeled training set by using a pre-trained human body detection model to obtain the first human body images in the labeled training set and the second human body images in the unlabeled training set; wherein the first human body image and the second human body image are both single person images.
In a second aspect, an embodiment of the present application provides a method for predicting a human body posture, including: acquiring a third human body image; and inputting the third human body image into a generator in the human body posture prediction model obtained by adopting the human body posture prediction model training method in the first aspect to obtain a third human body posture prediction result.
In a third aspect, an embodiment of the present application provides a human body posture prediction model training device, including: the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a labeled training set and an unlabeled training set, the labeled training set comprises a plurality of first human body images containing labeled data, and the labeled data is used for representing real posture information in the first human body images; the unmarked training set comprises a plurality of second human body images which do not contain marking data; the first input module is used for inputting the first human body image into a generator in a human body posture prediction model to obtain a corresponding first human body posture prediction result, and calculating a first loss value of the generator according to the labeling data and the first human body posture prediction result; the second input module is used for inputting the second human body image into the generator to obtain a corresponding second human body posture prediction result; a calculating module, configured to calculate a second loss value corresponding to a discriminator in the human body posture prediction model according to the first human body image, the annotation data, the second human body image, and the second human body posture prediction result; and the optimization module is used for optimizing the generator and the discriminator according to the first loss value and the second loss value to obtain the human body posture prediction model.
Optionally, in the human body posture detection model training device according to the embodiment of the present application, the first input module is specifically configured to: inputting the first human body image into the generator to obtain a predicted human body posture heat map of a corresponding multi-channel; wherein the predicted human posture heat map of each channel predicts a human body key point position; generating a corresponding reference human body posture heat map based on the annotation data corresponding to the first human body image; calculating the first loss value from the predicted human pose heat map and the reference human pose heat map.
Optionally, in the human body posture detection model training apparatus according to the embodiment of the present application, the calculation module is specifically configured to: inputting the first human body image and the reference human body posture heat map into the discriminator as a true data sequence, and inputting the second human body image and the second human body posture prediction result into the discriminator as a false data sequence to respectively obtain discrimination results output by the discriminator; and calculating the second loss value according to the judgment result.
Optionally, in the human body posture detection model training device according to the embodiment of the present application, the optimization module is specifically configured to: and updating the network parameters of the generator according to the first loss value and the second loss value, and updating the network parameters of the discriminator according to the second loss value.
Optionally, in the human body posture detection model training device according to the embodiment of the present application, the first obtaining module is specifically configured to: the acquiring of the labeled training set and the unlabeled training set includes: acquiring an original labeling training set containing labeling data and an original unlabeled training set not containing labeling data; respectively carrying out human body detection on the images in the original labeled training set and the images in the original unlabeled training set by using a pre-trained human body detection model to obtain the first human body images in the labeled training set and the second human body images in the unlabeled training set; wherein the first human body image and the second human body image are both single person images.
In a fourth aspect, an embodiment of the present application provides a human body posture prediction apparatus, including: the second acquisition module is used for acquiring a third human body image; and the prediction module is used for inputting the third human body image into a generator in the human body posture prediction model obtained by adopting the human body posture prediction model training method in the first aspect to obtain a third human body posture prediction result.
In a fifth aspect, embodiments of the present application provide an electronic device, including a processor and a memory, where the memory stores computer-readable instructions that, when executed by the processor, perform the method as in the first aspect or the method as in the second aspect.
In a sixth aspect, embodiments of the present application provide a storage medium having a computer program stored thereon, where the computer program is executed by a processor to perform the method as described in the first aspect or the method as described in the second aspect.
As can be seen from the above, the human body posture prediction model training method, the human body posture prediction method and the device provided by the embodiment of the present application acquire a labeled training set and an unlabeled training set, where the labeled training set includes a plurality of first human body images containing annotation data, and the annotation data is used to represent real posture information in the first human body images; the unlabeled training set includes a plurality of second human body images that do not contain annotation data; input the first human body image into a generator in a human body posture prediction model to obtain a corresponding first human body posture prediction result, and calculate a first loss value of the generator according to the annotation data and the first human body posture prediction result; input the second human body image into the generator to obtain a corresponding second human body posture prediction result; calculate a second loss value corresponding to a discriminator in the human body posture prediction model according to the first human body image, the annotation data, the second human body image and the second human body posture prediction result; and optimize the generator and the discriminator according to the first loss value and the second loss value to obtain the human body posture prediction model. Therefore, because the unlabeled training set is used in the training process, adversarial training is performed in combination with the plurality of unlabeled second human body images even when the labeled first human body images are insufficient, which can improve accuracy and reduce the demand for labeled first human body images.
Additional features and advantages of the present application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the present application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be regarded as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
Fig. 1 is a flowchart of a human body posture prediction model training method according to an embodiment of the present disclosure;
fig. 2 is a flowchart of a human body posture prediction method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a human body posture prediction model training device according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a human body posture prediction apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
In recent years, research on artificial intelligence technologies such as computer vision, deep learning, machine learning, image processing, and image recognition has developed rapidly. Artificial Intelligence (AI) is an emerging scientific technology that studies and develops theories, methods, techniques, and application systems for simulating and extending human intelligence. As a comprehensive discipline, it involves technical areas such as chips, big data, cloud computing, the Internet of Things, distributed storage, deep learning, machine learning, and neural networks. Computer vision, an important branch of artificial intelligence concerned with enabling machines to perceive the world, generally covers technologies such as face recognition, liveness detection, fingerprint recognition and anti-counterfeiting verification, biometric recognition, face detection, pedestrian detection, target detection, pedestrian recognition, image processing, image recognition, image semantic understanding, image retrieval, character recognition, video processing, video content recognition, behavior recognition, three-dimensional reconstruction, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), computational photography, and robot navigation and positioning.
With the research and progress of artificial intelligence technology, the technology is applied to various fields, such as security, city management, traffic management, building management, park management, face passage, face attendance, logistics management, warehouse management, robots, intelligent marketing, computational photography, mobile phone images, cloud services, smart homes, wearable equipment, unmanned driving, automatic driving, smart medical treatment, face payment, face unlocking, fingerprint unlocking, testimony verification, smart screens, smart televisions, cameras, mobile internet, live webcasts, beauty treatment, medical beauty treatment, intelligent temperature measurement and the like.
Referring to fig. 1, fig. 1 is a flowchart of a human body posture prediction model training method according to an embodiment of the present disclosure. The human body posture prediction model training method can comprise the following steps:
s101, acquiring a labeled training set and an unlabeled training set.
S102, inputting the first human body image into a generator in the human body posture prediction model to obtain a corresponding first human body posture prediction result, and calculating a first loss value of the generator according to the labeling data and the first human body posture prediction result.
And S103, inputting the second human body image into the generator to obtain a corresponding second human body posture prediction result.
And S104, calculating a second loss value corresponding to the discriminator in the human body posture prediction model according to the first human body image, the annotation data, the second human body image and the second human body posture prediction result.
And S105, optimizing the generator and the discriminator according to the first loss value and the second loss value to obtain a human body posture prediction model.
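The patent publishes no reference implementation, but the way the losses of steps S101 to S105 fit together can be sketched with a toy NumPy example. Everything here is a hypothetical stand-in for illustration: the "generator" is a single linear map from an image vector to a pose vector, and the "discriminator" is a linear map followed by a sigmoid.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins, for illustration only.
G = rng.normal(size=(4, 4))            # "generator" weights
D = rng.normal(size=4)                 # "discriminator" weights

labeled_x = rng.normal(size=(8, 4))    # first human body images
labeled_y = rng.normal(size=(8, 4))    # annotation data (reference poses)
unlabeled_x = rng.normal(size=(8, 4))  # second human body images

def score(pose):
    """Discriminator: probability in (0, 1) that a pose is 'real'."""
    return 1.0 / (1.0 + np.exp(-(pose @ D)))

# S102: predict poses for labeled images; the first loss is supervised (MSE).
pred_labeled = labeled_x @ G
first_loss = float(np.mean((pred_labeled - labeled_y) ** 2))

# S103: predict poses for unlabeled images.
pred_unlabeled = unlabeled_x @ G

# S104: labeled (image, annotation) pairs count as "real", unlabeled
# predictions as "fake"; the second loss is binary cross-entropy on scores.
d_real = score(labeled_y)
d_fake = score(pred_unlabeled)
second_loss = float(-np.mean(np.log(d_real + 1e-7))
                    - np.mean(np.log(1.0 - d_fake + 1e-7)))

# S105: in the actual model the generator would be updated from
# first_loss + second_loss and the discriminator from second_loss alone
# (e.g. by backpropagation); the update itself is omitted here.
```

The detailed meaning of each step, and of the two loss values, is described below.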
Specifically, in the step S101, the annotation training set includes a plurality of first human body images including annotation data. As an embodiment, a plurality of first human body images containing annotation data can be obtained from a public database; as another embodiment, a human body image not containing the annotation data may be obtained from the network, and then a plurality of first human body images containing the annotation data may be obtained from the human body image not containing the annotation data.
The annotation data is used for representing real posture information in the first human body image, and the real posture information can include human body key node information (the human body key points include positions of the top of the head, the neck, the left shoulder, the left elbow, the left wrist, the right shoulder, the right elbow, the right wrist and the like), human body posture information and the like.
It will be appreciated that there are a number of ways to derive the first human image from a human image that does not contain annotation data, for example: processing the human body image by utilizing the existing human body posture prediction model; or, labeling the human body image manually, and the like, which is not specifically limited in the embodiment of the present application.
Similarly, in the step S101, the unlabeled training set includes a plurality of second human body images that do not include the labeled data, that is, the second human body images are original images that only include the human body postures that have not been subjected to the labeling processing. It is understood that the second human body image can be obtained through various ways such as disclosing an unlabeled portion (e.g., COCO unlabeled) in the dataset.
Further, both the first human body image and the second human body image may be images containing only one person. Therefore, for scene images containing multiple persons, the scene image can be preprocessed with a pre-trained human body detection model, and the image of each person can then be cropped out to obtain the corresponding human body image. The specific steps are as follows:
the method comprises the following steps of firstly, obtaining an original labeled training set containing labeled data and an original unlabeled training set containing no labeled data.
And secondly, respectively carrying out human body detection on the images in the original labeled training set and the images in the original unlabeled training set by using a pre-trained human body detection model to obtain a first human body image in the labeled training set and a second human body image in the unlabeled training set.
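The cropping part of this preprocessing can be sketched minimally as follows; the person detector itself is assumed to exist and is not shown, and the `(x1, y1, x2, y2)` pixel-box format is an assumption for illustration.

```python
import numpy as np

def crop_persons(image, boxes):
    """Crop one single-person image per detected bounding box.

    image: H x W x 3 array; boxes: iterable of (x1, y1, x2, y2) pixel
    boxes, assumed to come from a pre-trained human body detection model.
    """
    crops = []
    for x1, y1, x2, y2 in boxes:
        crops.append(image[y1:y2, x1:x2].copy())
    return crops
```

Each returned crop would then enter the labeled or unlabeled training set according to whether its source image carries annotation data.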
In steps S102 to S105, the human body posture prediction model provided in the embodiment of the present application may be a generative adversarial network (GAN), comprising two parts: a generator and a discriminator.
In step S102, for the first human body image input to the generator, the generator outputs a corresponding first human body posture prediction result. As an embodiment, the first human body posture prediction result output by the generator may comprise human body posture heat maps of a plurality of channels. The heat map of each channel may be used to represent one human body key point position predicted by the generator; thus, the heat maps of the plurality of channels represent the different predicted human body key point positions respectively. It will be appreciated that on a human body posture heat map, the probability that a location is the predicted human key point position may be represented by different colors, for example: the closer the color on the heat map is to red, the more likely that location is the predicted position of the corresponding human key point.
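The patent does not specify how key-point coordinates are read off the multi-channel heat maps, but a common approach (assumed here for illustration, not stated in the patent) is to take the per-channel argmax:

```python
import numpy as np

def heatmaps_to_keypoints(heatmaps):
    """Read one key-point location off each channel of a K x H x W heat map.

    Returns one (x, y) peak position per channel, i.e. per key point.
    """
    k, h, w = heatmaps.shape
    flat_peaks = heatmaps.reshape(k, -1).argmax(axis=1)  # peak per channel
    ys, xs = np.unravel_index(flat_peaks, (h, w))
    return list(zip(xs.tolist(), ys.tolist()))
```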
After the generator outputs the first human body posture prediction result, a first loss value of the generator can be calculated according to the first human body posture prediction result and the annotation data contained in the first human body image. The step S102 may specifically include the following steps:
the first step is to input the first human body image into a generator to obtain a predicted human body posture heat map corresponding to multiple channels.
And secondly, generating a corresponding reference human body posture heat map based on the annotation data corresponding to the first human body image.
And thirdly, calculating a first loss value according to the predicted human body posture heat map and the reference human body posture heat map.
In the above steps, first, a first human body image may be input to the generator to obtain a predicted human body posture heat map; then, directly mapping the annotation data corresponding to the first human body image into a real human body posture heat map through an algorithm; finally, a first loss value of the generator may be calculated based on the predicted human pose heat map and the real human pose heat map, wherein the first loss value characterizes a difference between the predicted human pose heat map and the real human pose heat map.
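The three steps above can be illustrated with a small sketch. The embodiment does not give the mapping algorithm or the loss formula; this assumes a Gaussian peak per key point and a mean-squared-error first loss, both common choices in heat-map-based pose estimation, and all function names are illustrative.

```python
import numpy as np

def keypoints_to_heatmaps(keypoints, size=64, sigma=2.0):
    """Map annotated (x, y) key points to one Gaussian heat map per channel."""
    ys, xs = np.mgrid[0:size, 0:size]
    maps = []
    for (kx, ky) in keypoints:
        d2 = (xs - kx) ** 2 + (ys - ky) ** 2
        maps.append(np.exp(-d2 / (2.0 * sigma ** 2)))
    return np.stack(maps)  # shape: (num_keypoints, size, size)

def first_loss(predicted, reference):
    """Mean-squared error between predicted and reference heat maps."""
    return float(np.mean((predicted - reference) ** 2))

# Reference heat maps built directly from annotation data (two key points).
reference = keypoints_to_heatmaps([(10, 20), (40, 30)])
# A perfect prediction yields a zero first loss value.
perfect = first_loss(reference, reference)
```

Each heat map peaks at value 1.0 exactly at its annotated key point location, matching the per-channel convention described above.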
In step S103, for the second human body image input to the generator, the generator outputs a corresponding second human body posture prediction result. Similar to the first human body posture prediction result, as an embodiment, the second human body posture prediction result output by the generator may also comprise human body posture heat maps of multiple channels. Since the second human body image does not contain annotation data, the confidence of the heat maps corresponding to the second human body posture prediction result output by the generator is lower.
In step S104, the step S104 may specifically include the following steps:
the first step is to input the first human body image and the reference human body posture heat map into the discriminator as a "true" data sequence, and to input the second human body image and the second human body posture prediction result into the discriminator as a "false" data sequence, so as to respectively obtain the judgment results output by the discriminator;
and secondly, calculating a second loss value according to the judgment result.
In the above steps, the discriminator is the key module for exploiting the second human body images that contain no annotation data. Similar to existing adversarial training network models, the input of the discriminator provided by the embodiment of the present application comprises a "true" part and a "false" part. The first human body image containing annotation data and the reference human body posture heat map generated from that annotation data form the input of the "true" end; the second human body image containing no annotation data and the corresponding second human body posture prediction result output by the generator form the input of the "false" end. The discriminator makes a true/false judgment on the input data and outputs a probability between 0 and 1, from which the second loss value corresponding to the discriminator is calculated.
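A minimal sketch of the second loss value, assuming the standard binary cross-entropy form used by conventional GAN discriminators; the patent only states that a probability between 0 and 1 is output, so the exact loss form is an assumption here.

```python
import math

def bce(prediction, target):
    """Binary cross-entropy for one true/false probability in (0, 1)."""
    eps = 1e-7  # clamp to avoid log(0)
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def second_loss(d_true, d_false):
    """Discriminator loss: push 'true' pairs toward 1, 'false' pairs toward 0."""
    return bce(d_true, 1.0) + bce(d_false, 0.0)

loss_good = second_loss(d_true=0.95, d_false=0.05)  # nearly correct discriminator
loss_bad = second_loss(d_true=0.05, d_false=0.95)   # fooled discriminator
```

As expected, the loss is small when the discriminator judges correctly and large when it is fooled.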
In step S105, the corresponding total loss value can be calculated from the first loss value and the second loss value obtained in the above embodiments. The generator and the discriminator in the human body posture prediction model provided by the embodiment of the present application can then be optimized using the total loss value, so as to obtain the trained human body posture prediction model.
As an embodiment, the second loss value is used to optimize both the discriminator and the generator, while the first loss value is used to optimize only the generator. The step S105 may specifically include the following steps:
and updating the network parameters of the generator according to the first loss value and the second loss value, and updating the network parameters of the discriminator according to the second loss value.
In the above step, when the discriminator is optimized, the second loss value forces the discriminator to make the correct true/false judgment: when the input is "true" data, the second loss value drives the output of the discriminator as close to 1 as possible; when the input is "false" data, the second loss value drives the output of the discriminator as close to 0 as possible.
When the generator is optimized, the first loss value and the second loss value drive the human body posture heat maps predicted by the generator to conform to the distribution of real human body posture heat maps (i.e., to be as realistic as possible).
By alternately updating the network parameters of the discriminator and the generator, the first loss value and the second loss value make the two modules compete with each other and improve jointly.
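The alternating update schedule can be sketched with toy stand-ins for the two modules. The real models would be neural networks with optimizers; `ToyModule` is an illustrative stand-in that only records which loss terms drive each update.

```python
class ToyModule:
    """Minimal stand-in recording which loss values update its parameters."""
    def __init__(self, name):
        self.name = name
        self.updates = []

    def step(self, loss_terms):
        # A real module would back-propagate; here we just log the schedule.
        self.updates.append(tuple(sorted(loss_terms)))

def train_epoch(generator, discriminator, num_batches):
    """Alternate updates: loss2 trains both modules, loss1 trains only the generator."""
    for _ in range(num_batches):
        discriminator.step(["loss2"])        # discriminator: second loss value only
        generator.step(["loss1", "loss2"])   # generator: first + second loss values

gen, disc = ToyModule("generator"), ToyModule("discriminator")
train_epoch(gen, disc, num_batches=3)
```

The logged schedule matches the optimization rule stated above: the discriminator never sees the first loss value.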
Further, in some embodiments, the step of obtaining the label training set in step S101 may include the following sub-steps:
the first step is to obtain a plurality of first original images with annotation data; each first original image includes posture data of a human body.
And secondly, carrying out scaling processing on the plurality of first original images to obtain first human body images of the same size specification.
In the above steps, setting the plurality of first human body images to the same size specification can improve the accuracy of the subsequent training of the generator and reduce the loss. Of course, the unlabeled training set may be processed in the same manner as the labeled training set, which is not repeated here.
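A sketch of the scaling step, assuming nearest-neighbour resampling to one fixed size specification. The patent does not name the interpolation method, and 256×192 is an illustrative size, not one taken from the source.

```python
import numpy as np

def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour scaling so every training image shares one size specification."""
    in_h, in_w = image.shape[:2]
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column for each output column
    return image[rows][:, cols]

# Original images of differing sizes become one uniform batch.
batch = [np.zeros((480, 640)), np.zeros((720, 540))]
uniform = [resize_nearest(img, 256, 192) for img in batch]
```

In practice a library resize (e.g., bilinear) would be used; the point is only that every first human body image leaves this step with identical dimensions.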
As can be seen from the above, the human body posture prediction model training method provided in the embodiment of the present application obtains a labeled training set and an unlabeled training set, where the labeled training set includes a plurality of first human body images containing annotation data, and the annotation data is used to represent real posture information in the first human body images; the unlabeled training set includes a plurality of second human body images that do not contain annotation data; inputs the first human body image into a generator in a human body posture prediction model to obtain a corresponding first human body posture prediction result, and calculates a first loss value of the generator according to the annotation data and the first human body posture prediction result; inputs the second human body image into the generator to obtain a corresponding second human body posture prediction result; calculates a second loss value corresponding to the discriminator in the human body posture prediction model according to the first human body image, the annotation data, the second human body image, and the second human body posture prediction result; and optimizes the generator and the discriminator according to the first loss value and the second loss value to obtain the human body posture prediction model. Therefore, since the unlabeled training set is applied in the training process, adversarial training is performed by combining the plurality of unlabeled second human body images even when the labeled first human body images are insufficient, which can improve accuracy and reduce the demand for labeled first human body images.
Referring to fig. 2, fig. 2 is a flowchart of a human body posture prediction method according to an embodiment of the present application. The human body posture prediction method can comprise the following steps:
s201, acquiring a third human body image.
S202, inputting the third human body image into a generator in the human body posture prediction model obtained by adopting a human body posture prediction model training method to obtain a third human body posture prediction result.
Specifically, in step S201, a third human body image to be predicted may be obtained, where the third human body image may be an original image that does not include annotation data, and the posture of the human body in the third human body image may be predicted by the human body posture prediction method provided in the embodiment of the present application.
In step S202, the third human body image acquired in step S201 may be input into a generator of a human body posture prediction model trained in advance, where the human body posture prediction model may be obtained by using the human body posture prediction model training method in the foregoing embodiment. The generator can output a third human body posture prediction result corresponding to the third human body image according to the third human body image, and the human body posture in the third human body image can be predicted.
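Reading key point locations out of the generator's multi-channel heat maps at prediction time can be sketched as taking each channel's peak. The patent does not specify the decoding rule; per-channel argmax is a common convention and is an assumption here.

```python
import numpy as np

def decode_keypoints(heatmaps):
    """Take each channel's peak as the predicted human key point location (x, y)."""
    keypoints = []
    for channel in heatmaps:
        idx = int(np.argmax(channel))              # flat index of the peak
        ky, kx = divmod(idx, channel.shape[1])     # convert to (row, col)
        keypoints.append((kx, ky))
    return keypoints

# Illustrative 2-channel output with peaks at (x=5, y=3) and (x=1, y=7).
maps = np.zeros((2, 10, 10))
maps[0, 3, 5] = 1.0
maps[1, 7, 1] = 1.0
points = decode_keypoints(maps)
```

With one channel per key point, the decoded list gives one (x, y) location for each human body key point in the third human body image.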
Referring to fig. 3, fig. 3 is a schematic structural diagram of a human body posture prediction model training device 300 according to the present application, including: a first obtaining module 301, a first input module 302, a second input module 303, a calculating module 304, and an optimizing module 305.
The first obtaining module 301 is configured to obtain a labeled training set and an unlabeled training set.
The annotation training set includes a plurality of first human images containing annotation data. As an embodiment, a plurality of first human body images containing annotation data can be obtained from a public database; as another embodiment, a human body image not containing the annotation data may be obtained from the network, and then a plurality of first human body images containing the annotation data may be obtained from the human body image not containing the annotation data.
The annotation data is used to represent the real posture information in the first human body image, and the real posture information may include human body key point information (the human body key points include the positions of the top of the head, the neck, the left shoulder, the left elbow, the left wrist, the right shoulder, the right elbow, the right wrist, and the like), human body posture information, and the like.
It will be appreciated that there are a number of ways to derive the first human image from a human image that does not contain annotation data, for example: processing the human body image by utilizing the existing human body posture prediction model; or, labeling the human body image manually, and the like, which is not specifically limited in the embodiment of the present application.
Similarly, the unlabeled training set includes a plurality of second human body images that do not contain annotation data; that is, the second human body images are original images containing only unannotated human poses. It is understood that the second human body images can be obtained in various ways, for example from the unlabeled portion of a public dataset (e.g., COCO unlabeled).
Further, both the first human body image and the second human body image may be human body images containing only one person. Therefore, for scene images containing multiple persons, the scene images can be preprocessed by a pre-trained human body detection model, and the image of each person can then be cropped out to obtain the corresponding human body images. The specific steps are as follows:
The first step is to obtain an original labeled training set containing annotation data and an original unlabeled training set containing no annotation data.
And secondly, respectively carrying out human body detection on the images in the original labeled training set and the images in the original unlabeled training set by using a pre-trained human body detection model to obtain a first human body image in the labeled training set and a second human body image in the unlabeled training set.
The first input module 302 is configured to input a first human body image into a generator in a human body posture prediction model to obtain a corresponding first human body posture prediction result, and calculate a first loss value of the generator according to the annotation data and the first human body posture prediction result.
For a first human body image input to the generator, the generator outputs a corresponding first human body posture prediction result. As an embodiment, the first human body posture prediction result output by the generator may comprise human body posture heat maps of multiple channels. The heat map of each channel may be used to represent one human body key point location predicted by the generator; thus, the heat maps of the multiple channels respectively represent the different human body key point locations predicted by the generator. It will be appreciated that the closer the color on the heat map is to red, the more likely that location is the predicted corresponding human body key point location.
After the generator outputs the first human body posture prediction result, a first loss value of the generator can be calculated according to the first human body posture prediction result and the annotation data contained in the first human body image.
In the human body posture prediction model training apparatus 300 according to the embodiment of the present application, the first input module 302 is specifically configured to: input the first human body image into the generator to obtain predicted human body posture heat maps of corresponding multiple channels, wherein the predicted heat map of each channel predicts one human body key point position; generate a corresponding reference human body posture heat map based on the annotation data corresponding to the first human body image; and calculate a first loss value from the predicted human body posture heat maps and the reference human body posture heat map.
First, a first human body image may be input to a generator to obtain a predicted human body posture heat map; then, directly mapping the annotation data corresponding to the first human body image into a real human body posture heat map through an algorithm; finally, based on the predicted human pose heat map and the real human pose heat map, a first loss value for the generator may be calculated.
The second input module 303 is configured to input a second human body image into the generator to obtain a corresponding second human body posture prediction result.
For the second human body image input to the generator, the generator outputs a corresponding second human body posture prediction result. Similar to the first human body posture prediction result, as an embodiment, the second human body posture prediction result output by the generator may also comprise human body posture heat maps of multiple channels. Since the second human body image does not contain annotation data, the confidence of the heat maps corresponding to the second human body posture prediction result output by the generator is lower.
The calculating module 304 is configured to calculate a second loss value corresponding to the discriminator in the human body posture prediction model according to the first human body image, the labeled data, the second human body image, and the second human body posture prediction result.
In the human body posture prediction model training device 300 according to the embodiment of the present application, the calculation module 304 is specifically configured to: input the first human body image and the reference human body posture heat map into the discriminator as a "true" data sequence, and input the second human body image and the second human body posture prediction result into the discriminator as a "false" data sequence, to respectively obtain the judgment results output by the discriminator; and calculate the second loss value according to the judgment results.
In the above steps, the discriminator is the key module for exploiting the second human body images that contain no annotation data. Similar to existing adversarial training network models, the input of the discriminator comprises a "true" part and a "false" part. The first human body image containing annotation data and the reference human body posture heat map generated from that annotation data form the input of the "true" end; the second human body image containing no annotation data and the corresponding second human body posture prediction result form the input of the "false" end. The discriminator judges whether the input data is true or false and outputs a probability between 0 and 1, from which the adversarial loss (the second loss value) is calculated.
The optimization module 305 is configured to optimize the generator and the discriminator according to the first loss value and the second loss value to obtain a human body posture prediction model.
The first loss value and the second loss value obtained in the above embodiments may be used to calculate a corresponding total loss value. The generator and the discriminator in the human body posture prediction model provided by the embodiment of the present application can then be optimized using the total loss value, so as to obtain the trained human body posture prediction model.
As an embodiment, the second loss value is used to optimize both the discriminator and the generator, while the first loss value is used to optimize only the generator. In the human body posture prediction model training device 300 according to the embodiment of the present application, the optimization module 305 is specifically configured to: update the network parameters of the generator according to the first loss value and the second loss value, and update the network parameters of the discriminator according to the second loss value.
When the discriminator is optimized, the second loss value forces the discriminator to make the correct true/false judgment: when the input is "true" data, the second loss value drives the output of the discriminator as close to 1 as possible; when the input is "false" data, the second loss value drives the output of the discriminator as close to 0 as possible.
When the generator is optimized, the first loss value and the second loss value drive the human body posture heat maps predicted by the generator to conform to the distribution of real human body posture heat maps (i.e., to be as realistic as possible).
By alternately updating the network parameters of the discriminator and the generator, the first loss value and the second loss value make the two modules compete with each other and improve jointly.
Further, in some embodiments, the first obtaining module 301 is specifically configured to: acquiring a plurality of first original images with annotation data; each first original image comprises posture data of a human body; and carrying out scaling processing on the plurality of first original images to obtain first human body images with the same size specification.
Setting the plurality of first human body images to the same size specification can improve the accuracy of the subsequent training of the generator and reduce the loss. Of course, the unlabeled training set may be processed in the same manner correspondingly.
As can be seen from the above, the human body posture prediction model training device 300 provided in the embodiment of the present application obtains a labeled training set and an unlabeled training set, where the labeled training set includes a plurality of first human body images containing annotation data, and the annotation data is used to represent real posture information in the first human body images; the unlabeled training set includes a plurality of second human body images that do not contain annotation data; inputs the first human body image into a generator in a human body posture prediction model to obtain a corresponding first human body posture prediction result, and calculates a first loss value of the generator according to the annotation data and the first human body posture prediction result; inputs the second human body image into the generator to obtain a corresponding second human body posture prediction result; calculates a second loss value corresponding to the discriminator in the human body posture prediction model according to the first human body image, the annotation data, the second human body image, and the second human body posture prediction result; and optimizes the generator and the discriminator according to the first loss value and the second loss value to obtain the human body posture prediction model. Therefore, since the unlabeled training set is applied in the training process, adversarial training is performed by combining the plurality of unlabeled second human body images even when the labeled first human body images are insufficient, which can improve accuracy and reduce the demand for labeled first human body images.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a human body posture prediction apparatus 400 according to an embodiment of the present application, including: a second acquisition module 401 and a prediction module 402.
The second obtaining module 401 is configured to obtain a third human body image.
The third human body image may be an original image that does not contain annotation data; the human body posture prediction method provided in the embodiment of the present application can be used to predict the posture of the human body in the third human body image.
The prediction module 402 is configured to input the third human body image into a generator in a human body posture prediction model obtained by using a human body posture prediction model training method, so as to obtain a third human body posture prediction result.
The third human body image acquired by the second obtaining module 401 may be input into a generator of a pre-trained human body posture prediction model, where the human body posture prediction model may be obtained by the human body posture prediction model training method in the foregoing embodiments. The generator outputs a third human body posture prediction result corresponding to the third human body image, thereby predicting the human body posture in the third human body image.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. An electronic device 500 is provided, including: a processor 501 and a memory 502, the processor 501 and the memory 502 being interconnected and communicating with each other via a communication bus 503 and/or another form of connection mechanism (not shown). The memory 502 stores a computer program executable by the processor 501; when the electronic device is running, the processor 501 executes the computer program to perform the method in any of the optional implementations of the above embodiments.
An embodiment of the present application provides a storage medium storing a computer program which, when executed by a processor, performs the method in any of the optional implementations of the above embodiments.
The storage medium may be implemented by any type of volatile or nonvolatile storage device or combination thereof, such as a Static Random Access Memory (SRAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), an Erasable Programmable Read-Only Memory (EPROM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), a magnetic Memory, a flash Memory, a magnetic disk, or an optical disk.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (10)
1. A human posture prediction model training method is characterized by comprising the following steps:
acquiring a labeled training set and an unlabeled training set, wherein the labeled training set comprises a plurality of first human body images containing labeled data, and the labeled data is used for representing real posture information in the first human body images; the unmarked training set comprises a plurality of second human body images which do not contain marking data;
inputting the first human body image into a generator in a human body posture prediction model to obtain a corresponding first human body posture prediction result, and calculating a first loss value of the generator according to the labeling data and the first human body posture prediction result;
inputting the second human body image into the generator to obtain a corresponding second human body posture prediction result;
calculating a second loss value corresponding to a discriminator in the human body posture prediction model according to the first human body image, the labeled data, the second human body image and the second human body posture prediction result;
and optimizing the generator and the discriminator according to the first loss value and the second loss value to obtain the human body posture prediction model.
2. The method for training the human body posture prediction model according to claim 1, wherein the inputting the first human body image into a generator in the human body posture prediction model to obtain a corresponding first human body posture prediction result, and calculating a first loss value of the generator according to the labeled data and the first human body posture prediction result comprises:
inputting the first human body image into the generator to obtain a predicted human body posture heat map of a corresponding multi-channel; wherein the predicted human posture heat map of each channel predicts a human body key point position;
generating a corresponding reference human body posture heat map based on the annotation data corresponding to the first human body image;
calculating the first loss value from the predicted human pose heat map and the reference human pose heat map.
3. The method for training the human body posture prediction model according to claim 2, wherein the calculating a second loss value corresponding to a discriminator in the human body posture prediction model according to the first human body image, the annotation data, the second human body image and the second human body posture prediction result comprises:
inputting the first human body image and the reference human body posture heat map into the discriminator as a true data sequence, and inputting the second human body image and the second human body posture prediction result into the discriminator as a false data sequence to respectively obtain discrimination results output by the discriminator;
and calculating the second loss value according to the judgment result.
4. The human pose prediction model training method of any one of claims 1-3, wherein the optimizing the generator and the discriminator according to the first loss value and the second loss value comprises:
and updating the network parameters of the generator according to the first loss value and the second loss value, and updating the network parameters of the discriminator according to the second loss value.
5. The human body posture prediction model training method according to any one of claims 1-4, wherein the obtaining of the labeled training set and the unlabeled training set comprises:
acquiring an original labeling training set containing labeling data and an original unlabeled training set not containing labeling data;
respectively carrying out human body detection on the images in the original labeled training set and the images in the original unlabeled training set by using a pre-trained human body detection model to obtain the first human body images in the labeled training set and the second human body images in the unlabeled training set; wherein the first human body image and the second human body image are both single person images.
6. A human posture prediction method is characterized by comprising the following steps:
acquiring a third human body image;
inputting the third human body image into a generator of the human body posture prediction model obtained by the human body posture prediction model training method of any one of claims 1 to 5 to obtain a third human body posture prediction result.
7. A human body posture prediction model training device is characterized by comprising:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a labeled training set and an unlabeled training set, the labeled training set comprises a plurality of first human body images containing labeled data, and the labeled data is used for representing real posture information in the first human body images; the unmarked training set comprises a plurality of second human body images which do not contain marking data;
the first input module is used for inputting the first human body image into a generator in a human body posture prediction model to obtain a corresponding first human body posture prediction result, and calculating a first loss value of the generator according to the labeling data and the first human body posture prediction result;
the second input module is used for inputting the second human body image into the generator to obtain a corresponding second human body posture prediction result;
a calculating module, configured to calculate a second loss value corresponding to a discriminator in the human body posture prediction model according to the first human body image, the annotation data, the second human body image, and the second human body posture prediction result;
and the optimization module is used for optimizing the generator and the discriminator according to the first loss value and the second loss value to obtain the human body posture prediction model.
8. A human body posture prediction apparatus, comprising:
the second acquisition module is used for acquiring a third human body image;
a prediction module, configured to input the third human body image into a generator in a human body posture prediction model obtained by using the human body posture prediction model training method according to any one of claims 1 to 5, so as to obtain a third human body posture prediction result.
9. An electronic device, comprising a processor and a memory, the memory storing computer-readable instructions which, when executed by the processor, perform the method according to any one of claims 1 to 5 or the method according to claim 6.
10. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the method according to any one of claims 1 to 5 or the method according to claim 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110658308.5A CN113569627B (en) | 2021-06-11 | 2021-06-11 | Human body posture prediction model training method, human body posture prediction method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113569627A true CN113569627A (en) | 2021-10-29 |
CN113569627B CN113569627B (en) | 2024-06-14 |
Family
ID=78162053
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110658308.5A Active CN113569627B (en) | 2021-06-11 | 2021-06-11 | Human body posture prediction model training method, human body posture prediction method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113569627B (en) |
Citations (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130250050A1 (en) * | 2012-03-23 | 2013-09-26 | Objectvideo, Inc. | Video surveillance systems, devices and methods with improved 3d human pose and shape modeling |
CN108090902A (en) * | 2017-12-30 | 2018-05-29 | 中国传媒大学 | A kind of non-reference picture assessment method for encoding quality based on multiple dimensioned generation confrontation network |
CN108875510A (en) * | 2017-11-28 | 2018-11-23 | 北京旷视科技有限公司 | Method, apparatus, system and the computer storage medium of image procossing |
CN108875732A (en) * | 2018-01-11 | 2018-11-23 | 北京旷视科技有限公司 | Model training and example dividing method, device and system and storage medium |
CN109711329A (en) * | 2018-12-25 | 2019-05-03 | 北京迈格威科技有限公司 | Attitude estimation and network training method, device and system and storage medium |
CN109859296A (en) * | 2019-02-01 | 2019-06-07 | 腾讯科技(深圳)有限公司 | Training method, server and the storage medium of SMPL parametric prediction model |
CN109934165A (en) * | 2019-03-12 | 2019-06-25 | 南方科技大学 | Joint point detection method and device, storage medium and electronic equipment |
CN110008835A (en) * | 2019-03-05 | 2019-07-12 | 成都旷视金智科技有限公司 | Sight prediction technique, device, system and readable storage medium storing program for executing |
CN110163082A (en) * | 2019-04-02 | 2019-08-23 | 腾讯科技(深圳)有限公司 | A kind of image recognition network model training method, image-recognizing method and device |
CN110188633A (en) * | 2019-05-14 | 2019-08-30 | 广州虎牙信息科技有限公司 | Human body posture index prediction technique, device, electronic equipment and storage medium |
WO2019228317A1 (en) * | 2018-05-28 | 2019-12-05 | 华为技术有限公司 | Face recognition method and device, and computer readable medium |
CN110598554A (en) * | 2019-08-09 | 2019-12-20 | 中国地质大学(武汉) | Multi-person posture estimation method based on counterstudy |
US20200019799A1 (en) * | 2018-07-10 | 2020-01-16 | DeepScale, Inc. | Automated annotation techniques |
CN111311729A (en) * | 2020-01-18 | 2020-06-19 | 西安电子科技大学 | Natural scene three-dimensional human body posture reconstruction method based on bidirectional projection network |
CN111428650A (en) * | 2020-03-26 | 2020-07-17 | 北京工业大学 | Pedestrian re-identification method based on SP-PGGAN style migration |
CN111523422A (en) * | 2020-04-15 | 2020-08-11 | 北京华捷艾米科技有限公司 | Key point detection model training method, key point detection method and device |
CN112016501A (en) * | 2020-09-04 | 2020-12-01 | 平安科技(深圳)有限公司 | Training method and device of face recognition model and computer equipment |
CN112132172A (en) * | 2020-08-04 | 2020-12-25 | 绍兴埃瓦科技有限公司 | Model training method, device, equipment and medium based on image processing |
CN112149645A (en) * | 2020-11-10 | 2020-12-29 | 西北工业大学 | Human body posture key point identification method based on generation of confrontation learning and graph neural network |
WO2021007859A1 (en) * | 2019-07-18 | 2021-01-21 | 华为技术有限公司 | Method and apparatus for estimating pose of human body |
CN112464895A (en) * | 2020-12-14 | 2021-03-09 | 深圳市优必选科技股份有限公司 | Posture recognition model training method and device, posture recognition method and terminal equipment |
CN112529073A (en) * | 2020-12-07 | 2021-03-19 | 北京百度网讯科技有限公司 | Model training method, attitude estimation method and apparatus, and electronic device |
US20210097718A1 (en) * | 2019-09-27 | 2021-04-01 | Martin Adrian FISCH | Methods and apparatus for orientation keypoints for complete 3d human pose computerized estimation |
CN112700408A (en) * | 2020-12-28 | 2021-04-23 | 中国银联股份有限公司 | Model training method, image quality evaluation method and device |
CN112801895A (en) * | 2021-01-15 | 2021-05-14 | 南京邮电大学 | Two-stage attention mechanism-based GAN network image restoration algorithm |
Non-Patent Citations (1)
Title |
---|
LIU, Yimin; JIANG, Jianguo; QI, Meibin; LIU, Hao; ZHOU, Huajie: "Video person re-identification method fusing generative adversarial network and pose estimation", Acta Automatica Sinica, vol. 46, no. 03, pages 576 - 584 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114926709A (en) * | 2022-05-26 | 2022-08-19 | 成都极米科技股份有限公司 | Data labeling method and device and electronic equipment |
WO2024037546A1 (en) * | 2022-08-19 | 2024-02-22 | 北京字跳网络技术有限公司 | Method and apparatus for recognizing human body action, and device and medium |
CN116311519A (en) * | 2023-03-17 | 2023-06-23 | 北京百度网讯科技有限公司 | Action recognition method, model training method and device |
CN116311519B (en) * | 2023-03-17 | 2024-04-19 | 北京百度网讯科技有限公司 | Action recognition method, model training method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110414432B (en) | Training method of object recognition model, object recognition method and corresponding device | |
CN110175527B (en) | Pedestrian re-identification method and device, computer equipment and readable medium | |
CN113569627B (en) | Human body posture prediction model training method, human body posture prediction method and device | |
CN114758362B (en) | Clothing changing pedestrian re-identification method based on semantic perception attention and visual shielding | |
CN114821014B (en) | Multi-mode and countermeasure learning-based multi-task target detection and identification method and device | |
CN111368672A (en) | Construction method and device for genetic disease facial recognition model | |
CN114998934B (en) | Clothes-changing pedestrian re-identification and retrieval method based on multi-mode intelligent perception and fusion | |
CN111553267A (en) | Image processing method, image processing model training method and device | |
CN115565238B (en) | Face-changing model training method, face-changing model training device, face-changing model training apparatus, storage medium, and program product | |
CN110866469B (en) | Facial five sense organs identification method, device, equipment and medium | |
CN112801236B (en) | Image recognition model migration method, device, equipment and storage medium | |
CN116129473B (en) | Identity-guide-based combined learning clothing changing pedestrian re-identification method and system | |
CN115115969A (en) | Video detection method, apparatus, device, storage medium and program product | |
CN114937293A (en) | Agricultural service management method and system based on GIS | |
CN114219971A (en) | Data processing method, data processing equipment and computer readable storage medium | |
CN116453226A (en) | Human body posture recognition method and device based on artificial intelligence and related equipment | |
CN116091596A (en) | Multi-person 2D human body posture estimation method and device from bottom to top | |
CN113673308B (en) | Object identification method, device and electronic system | |
CN111368761A (en) | Shop business state recognition method and device, readable storage medium and equipment | |
CN115577768A (en) | Semi-supervised model training method and device | |
CN114385846A (en) | Image classification method, electronic device, storage medium and program product | |
CN113705293A (en) | Image scene recognition method, device, equipment and readable storage medium | |
Gonzalez-Soler et al. | Semi-synthetic data generation for tattoo segmentation | |
CN112529116B (en) | Scene element fusion processing method, device and equipment and computer storage medium | |
CN115841605A (en) | Target detection network training and target detection method, electronic device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||