CN110909664A

CN110909664A - Human body key point identification method and device and electronic equipment

Info

Publication number: CN110909664A
Application number: CN201911141765.6A
Authority: CN
Inventors: 刘思阳
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Current assignee: Beijing QIYI Century Science and Technology Co Ltd
Priority date: 2019-11-20
Filing date: 2019-11-20
Publication date: 2020-03-24

Abstract

Embodiments of the invention provide a human body key point identification method and device and electronic equipment, relating to the field of image processing. The method comprises the following steps: acquiring a target image in which human body key points are to be identified; inputting the target image into a pre-trained neural network model to obtain a heat map and a displacement map of each human body key point in the target image; determining, according to a preset identification rule, the coordinates of each human body key point in the target image based on the heat map and the displacement map of that key point; and determining the coordinates of each human body key point in the image based on the coordinates of each human body key point in the target image. This scheme ensures the recognition accuracy of human body key point identification while keeping model complexity low.

Description

Human body key point identification method and device and electronic equipment

Technical Field

The invention relates to the technical field of image processing, in particular to a human body key point identification method and device and electronic equipment.

Background

Human body key point identification is the implementation basis of action identification, abnormal behavior detection, security protection and the like, and is mainly used for positioning human body key parts such as the head, the neck, the shoulders, the hands and the like from a given image.

In the prior art, human body key points are identified by acquiring a target image in which the key points are to be identified, generating a single heat map for each human body key point in the target image through a pre-trained neural network model, and then determining the coordinates of each human body key point from that single heat map. The heat map of a human body key point is a probability distribution map of the possible positions of that key point.

The inventor finds that the prior art at least has the following problems in the process of implementing the invention:

high recognition accuracy of human body key points and low model complexity cannot be ensured at the same time. Specifically, high key point identification accuracy requires a large heat map, which in turn makes the neural network model highly complex.

Therefore, how to ensure the identification accuracy of the human body key points under the condition of low model complexity is an urgent problem to be solved.

Disclosure of Invention

The embodiment of the invention aims to provide a method and a device for identifying key points of a human body and electronic equipment, so as to achieve the aim of ensuring the identification accuracy of the key points of the human body under the condition of low model complexity.

The specific technical scheme is as follows:

in a first aspect, an embodiment of the present invention provides a method for identifying key points of a human body, including:

acquiring a target image of a key point of a human body to be identified;

inputting the target image into a pre-trained neural network model to obtain a heat map and a displacement map of each human body key point in the target image; each point in the displacement map of any human body key point represents the offset distance of the position of that point relative to the position of a target point, and the target point is the mapping point of the human body key point in the displacement map; the neural network model is a model trained on sample images together with a true-value heat map and a true-value displacement map of each human body key point in each sample image;

determining the coordinates of each human body key point in the target image based on the heat map and the displacement map of each human body key point in the target image according to a preset identification rule; wherein the preset identification rule is: for each human body key point, determining a candidate region based on the heat map of the human body key point, and determining the coordinates of the human body key point from the candidate region based on the displacement map of the human body key point.

Optionally, the displacement map of any human body key point comprises a displacement map in the x-axis direction and a displacement map in the y-axis direction;

the step of determining the coordinates of each human body key point in the target image based on the heat map and the displacement map of each human body key point in the target image according to a preset identification rule comprises the following steps:

determining the coordinates of each human body key point in the target image based on the heat map and the displacement map of each human body key point in the target image according to a preset first calculation formula;

wherein the predetermined first calculation formula includes:

I_x = h_x × s1 + ox × t1;

I_y = h_y × s2 + oy × t2;

wherein (I_x, I_y) are the coordinates of human body key point I; (h_x, h_y) are the coordinates of the pixel point with the largest value in the heat map of human body key point I; ox is the serial number of the row with the value 0 in the x-axis-direction displacement map of human body key point I, and oy is the serial number of the row with the value 0 in the y-axis-direction displacement map of human body key point I; s1 is the reduction coefficient of the heat map output by the neural network model relative to the input image in the x-axis direction, and s2 is the reduction coefficient of the heat map output by the neural network model relative to the input image in the y-axis direction; t1 is the reduction coefficient of the displacement map output by the neural network model relative to the output heat map in the x-axis direction, and t2 is the reduction coefficient of the displacement map output by the neural network model relative to the output heat map in the y-axis direction.
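As an illustration only, the first calculation formula can be transcribed into a short Python sketch. The function names are hypothetical. The sketch also makes two assumptions that the text does not state: a predicted displacement map rarely contains exact zeros, so the line whose entries are closest to 0 is used instead; and for the x-axis-direction map that line is taken to be a column, which follows from the fourth calculation formula M_x[a][b] = b - x_i rather than from the wording above.

```python
def argmax_2d(heat):
    """Return (h_x, h_y): column and row index of the largest heat map value."""
    best = None
    for row_idx, row in enumerate(heat):
        for col_idx, value in enumerate(row):
            if best is None or value > best[0]:
                best = (value, col_idx, row_idx)
    return best[1], best[2]


def zero_column(disp_x):
    """Assumed reading of ox: the column whose entries are closest to 0."""
    n_cols = len(disp_x[0])
    costs = [sum(abs(row[b]) for row in disp_x) for b in range(n_cols)]
    return costs.index(min(costs))


def zero_row(disp_y):
    """Assumed reading of oy: the row whose entries are closest to 0."""
    costs = [sum(abs(v) for v in row) for row in disp_y]
    return costs.index(min(costs))


def decode_keypoint(heat, disp_x, disp_y, s1, s2, t1, t2):
    """Literal transcription of I_x = h_x*s1 + ox*t1 and I_y = h_y*s2 + oy*t2."""
    h_x, h_y = argmax_2d(heat)
    ox = zero_column(disp_x)
    oy = zero_row(disp_y)
    return h_x * s1 + ox * t1, h_y * s2 + oy * t2
```

For a 4 × 4 heat map whose peak lies at column 2, row 1, with zeros of the displacement maps at column 3 and row 1 and with s1 = s2 = 4 and t1 = t2 = 1, the sketch returns (2 × 4 + 3 × 1, 1 × 4 + 1 × 1).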

Optionally, the training process of the neural network model includes:

acquiring a plurality of sample images and coordinates of each human body key point in each sample image;

generating a true value heat map and a true value displacement map of each human body key point in each sample image by using the coordinates of each human body key point in the sample image aiming at each sample image;

inputting each sample image into the neural network model in training respectively to obtain a predicted heat map and a predicted displacement map of each human body key point in each sample image;

calculating a comprehensive loss value based on the difference between a true value heat map and a predicted heat map of each human body key point in each sample image and the difference between a true value displacement map and a predicted displacement map;

judging whether the neural network model in training converges or not based on the comprehensive loss value, and if so, finishing the training to obtain the trained neural network model; otherwise, adjusting the network parameters of the neural network model, and continuing to train the neural network model.
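The text does not specify how the comprehensive loss value is computed or how convergence is judged. The following Python sketch shows one possible choice under stated assumptions: the loss is a weighted sum of mean squared errors over the heat map and the displacement map, and training is treated as converged once the loss has stopped changing by more than a tolerance. The names `comprehensive_loss`, `has_converged`, `disp_weight`, `tolerance`, and `patience` are all hypothetical.

```python
def mse(pred, truth):
    """Mean squared error between two equally sized 2-D lists."""
    total, count = 0.0, 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            total += (p - t) ** 2
            count += 1
    return total / count


def comprehensive_loss(pred_heat, true_heat, pred_disp, true_disp, disp_weight=1.0):
    """Assumed form: heat map MSE plus weighted displacement map MSE."""
    return mse(pred_heat, true_heat) + disp_weight * mse(pred_disp, true_disp)


def has_converged(loss_history, tolerance=1e-4, patience=3):
    """True when the last `patience` loss changes are all below `tolerance`."""
    if len(loss_history) <= patience:
        return False
    recent = loss_history[-(patience + 1):]
    return all(abs(recent[k + 1] - recent[k]) < tolerance for k in range(patience))
```

In a training loop, the loss for each pass over the sample images would be appended to `loss_history`; the network parameters keep being adjusted until `has_converged` returns True.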

Optionally, the manner of generating, for each sample image, a true-value heat map of each human body keypoint in the sample image by using the coordinate of each human body keypoint in the sample image includes:

generating a truth-value heat map of each human body key point in each sample image by using the coordinates of each human body key point in the sample image according to a preset truth-value heat map generation mode for each sample image;

the generation mode of the truth value heat map comprises the following steps:

generating a matrix M aiming at each human body key point of a truth-value heat map to be generated, wherein the size of the matrix M is the same as that of the truth-value heat map to be generated;

traversing each element in the matrix M, when traversing each element, calculating a value reference value of the element according to a preset second calculation formula, if the value reference value of the element is larger than a preset threshold value, setting the value of the element in the matrix M as 0, otherwise, calculating the value of the element according to a preset third calculation formula, and setting the value of the element in the matrix M as the calculated value;

after traversing all elements in the matrix M, taking the current matrix M as a true value heat map of the key point of the human body;

the second calculation formula includes:

wherein d_ab is the value reference value of the element P(b, a) in the matrix M, a is the serial number of the row where the element P is located, and b is the serial number of the column where the element P is located;

(x_i′, y_i′) are the coordinates of human body key point i in the sample image, round() is a rounding function, α1 is the reduction coefficient of the heat map output by the neural network model relative to the input image in the x-axis direction, α2 is the reduction coefficient of the heat map output by the neural network model relative to the input image in the y-axis direction, and human body key point i is the key point whose true-value heat map is to be generated;

the third calculation formula includes:

wherein M[a][b] is the value of the element P.
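The second and third calculation formulas themselves are not reproduced in this text, so the traversal can only be sketched under assumptions. The Python sketch below assumes that the value reference value d_ab is the Euclidean distance from the element to the mapped key point, that the mapped key point is (round(x_i′/α1), round(y_i′/α2)), and that the third formula is a Gaussian falloff with a hypothetical parameter `sigma`. None of these three choices is confirmed by the source; only the thresholding structure is taken from the steps above.

```python
import math


def truth_heat_map(x_img, y_img, width, height, alpha1, alpha2,
                   threshold=3.0, sigma=1.0):
    """Build a true-value heat map M of size height x width for one key point."""
    # Assumed mapping of image coordinates onto the heat map grid.
    # Note: Python's round() rounds exact halves to the nearest even integer.
    x_i = round(x_img / alpha1)
    y_i = round(y_img / alpha2)
    M = [[0.0] * width for _ in range(height)]
    for a in range(height):          # a: row index
        for b in range(width):       # b: column index
            d_ab = math.hypot(a - y_i, b - x_i)   # assumed second formula
            if d_ab > threshold:
                M[a][b] = 0.0                     # too far from the key point
            else:
                # assumed third formula: Gaussian of the distance
                M[a][b] = math.exp(-d_ab ** 2 / (2 * sigma ** 2))
    return M
```

Under these assumptions the map peaks at the mapped key point and is exactly 0 outside a disc whose radius is the preset threshold.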

Optionally, for each sample image, a manner of generating a truth displacement map of each human body keypoint in the sample image by using the coordinate of each human body keypoint in the sample image includes:

generating a truth value displacement map of each human body key point in each sample image by using the coordinates of each human body key point in the sample image according to a preset truth value displacement map generation mode for each sample image;

the generation mode of the truth value displacement diagram comprises the following steps:

for each human body key point whose true-value displacement map is to be generated, generating two matrices M_x and M_y of the same size, wherein the size of the matrices M_x and M_y is the same as that of the true-value displacement map to be generated;

traversing each element in the matrix M_x, and, on reaching each element, calculating the value of the element using a preset fourth calculation formula and setting the value of the element in the matrix M_x to the calculated value; after all elements in the matrix M_x have been traversed, taking the current matrix M_x as the displacement map in the x-axis direction of the human body key point;

traversing each element in the matrix M_y, and, on reaching each element, calculating the value of the element using a preset fifth calculation formula and setting the value of the element in the matrix M_y to the calculated value; after all elements in the matrix M_y have been traversed, taking the current matrix M_y as the displacement map in the y-axis direction of the human body key point;

wherein the fourth calculation formula includes:

M_x[a][b] = b - x_i;

the fifth calculation formula includes:

M_y[a][b] = a - y_i;

wherein M_x[a][b] is the value of the element P(b, a) in the matrix M_x, M_y[a][b] is the value of the element P(b, a) in the matrix M_y, a is the serial number of the row where the element P is located, and b is the serial number of the column where the element P is located;

(x_i′, y_i′) are the coordinates of human body key point i in the sample image, round() is a rounding function, β1 is the reduction coefficient of the displacement map output by the neural network model relative to the input image in the x-axis direction, β2 is the reduction coefficient of the displacement map output by the neural network model relative to the input image in the y-axis direction, and human body key point i is the key point whose true-value displacement map is to be generated.
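The fourth and fifth calculation formulas translate directly into code. The Python sketch below is a minimal illustration with a hypothetical function name; it assumes that the mapped key point is (x_i, y_i) = (round(x_i′/β1), round(y_i′/β2)), since the text mentions round(), β1, and β2 but the defining expression is not reproduced.

```python
def truth_displacement_maps(x_img, y_img, width, height, beta1, beta2):
    """Return (M_x, M_y), each a height x width list of offsets for one key point."""
    # Assumed mapping of image coordinates onto the displacement map grid.
    # Note: Python's round() rounds exact halves to the nearest even integer.
    x_i = round(x_img / beta1)
    y_i = round(y_img / beta2)
    # Fourth formula: M_x[a][b] = b - x_i (offset along x depends on the column b).
    M_x = [[b - x_i for b in range(width)] for a in range(height)]
    # Fifth formula: M_y[a][b] = a - y_i (offset along y depends on the row a).
    M_y = [[a - y_i for b in range(width)] for a in range(height)]
    return M_x, M_y
```

With these formulas every entry of M_x in column x_i is 0 and every entry of M_y in row y_i is 0, so the zero lines of the two maps cross at the mapped key point.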

In a second aspect, an embodiment of the present invention provides a human body key point identification device, including:

the image acquisition module is used for acquiring a target image of a key point of a human body to be identified;

the information identification module is used for inputting the target image into a pre-trained neural network model to obtain a heat map and a displacement map of each human body key point in the target image; each point in the displacement map of any human body key point represents the offset distance of the position of that point relative to the position of a target point, and the target point is the mapping point of the human body key point in the displacement map; the neural network model is a model trained on sample images together with a true-value heat map and a true-value displacement map of each human body key point in each sample image;

the coordinate determination module is used for determining the coordinates of each human body key point in the target image based on the heat map and the displacement map of each human body key point in the target image according to a preset identification rule; wherein the preset identification rule is: for each human body key point, determining a candidate region based on the heat map of the human body key point, and determining the coordinates of the human body key point from the candidate region based on the displacement map of the human body key point.

the coordinate determination module is specifically configured to:

wherein the predetermined first calculation formula includes:

I_x＝h_x×s1+ox×t1；

I_y＝h_y×s2+oy×t2；

wherein (I_x, I_y) are the coordinates of human body key point I; (h_x, h_y) are the coordinates of the pixel point with the largest value in the heat map of human body key point I; ox is the serial number of the row with the value 0 in the x-axis-direction displacement map of human body key point I, and oy is the serial number of the row with the value 0 in the y-axis-direction displacement map of human body key point I; s1 is the reduction coefficient of the heat map output by the neural network model relative to the input image in the x-axis direction, and s2 is the reduction coefficient of the heat map output by the neural network model relative to the input image in the y-axis direction; t1 is the reduction coefficient of the displacement map output by the neural network model relative to the output heat map in the x-axis direction, and t2 is the reduction coefficient of the displacement map output by the neural network model relative to the output heat map in the y-axis direction.

Optionally, the neural network model is trained by a model training module; wherein the model training module comprises:

the human body image obtaining sub-module is used for obtaining a plurality of sample images and coordinates of each human body key point in each sample image;

the truth map generation submodule is used for generating a truth value heat map and a truth value displacement map of each human body key point in each sample image by utilizing the coordinates of each human body key point in the sample image;

the prediction graph generation submodule is used for respectively inputting each sample image into the neural network model in training to obtain a prediction heat graph and a prediction displacement graph of each human body key point in each sample image;

the loss value calculation submodule is used for calculating a comprehensive loss value based on the difference between the true-value heat map and the predicted heat map of each human body key point in each sample image and the difference between the true-value displacement map and the predicted displacement map;

the judgment submodule is used for judging whether the neural network model in training converges or not based on the comprehensive loss value, and if so, finishing the training to obtain the trained neural network model; otherwise, adjusting the network parameters of the neural network model, and continuing to train the neural network model.

Optionally, the manner in which the true value map generation submodule generates, for each sample image, a true value heat map of each human body key point in the sample image by using the coordinates of each human body key point in the sample image includes:

the generation mode of the truth value heat map comprises the following steps:

the second calculation formula includes:

the third calculation formula includes:

wherein M[a][b] is the value of the element P.

Optionally, the mode that the true value map generation submodule generates, for each sample image, a true value displacement map of each human body key point in the sample image by using the coordinates of each human body key point in the sample image includes:

for each sample image, a mode of generating a true value displacement map of each human body key point in the sample image by using the coordinates of each human body key point in the sample image includes:

traversing each element in the matrix M_x, and, on reaching each element, calculating the value of the element using a preset fourth calculation formula and setting the value of the element in the matrix M_x to the calculated value; after all elements in the matrix M_x have been traversed, taking the current matrix M_x as the displacement map in the x-axis direction of the human body key point;

wherein the fourth calculation formula includes:

M_x[a][b] = b - x_i;

the fifth calculation formula includes:

M_y[a][b] = a - y_i;

(x_i′, y_i′) are the coordinates of human body key point i in the sample image, round() is a rounding function, β1 is the reduction coefficient of the displacement map output by the neural network model relative to the input image in the x-axis direction, β2 is the reduction coefficient of the displacement map output by the neural network model relative to the input image in the y-axis direction, and human body key point i is the key point whose true-value displacement map is to be generated.

In a third aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;

a memory for storing a computer program;

and the processor is used for realizing the steps of any human body key point identification method when executing the program stored in the memory.

In a fourth aspect, the present invention further provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any of the above human body key point identification methods.

In a fifth aspect, embodiments of the present invention further provide a computer program product containing instructions, which when run on a computer, causes the computer to perform the steps of any of the above human keypoint identification methods.

In the scheme provided by the embodiment of the invention, when the key points of the human body are identified, the target image of the key points of the human body to be identified is obtained, and the heat map and the displacement map of each key point of the human body in the target image are generated through a pre-trained neural network model; and determining the coordinates of each human body key point in the target image based on the heat map and the displacement map of each human body key point in the target image according to a preset identification rule. According to the scheme, the mode of combining the heat map and the displacement map is adopted, so that under the condition that a larger candidate area is determined due to the smaller size of the heat map, key points can be further positioned in the candidate area through the displacement map. Therefore, the aim of ensuring the recognition accuracy of the human body key point recognition under the condition of low model complexity can be fulfilled by the scheme.

Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a method for identifying key points of a human body according to an embodiment of the present invention;

FIG. 2 is a flow chart of a training process of a neural network model provided by an embodiment of the present invention;

FIG. 3(a) is an exemplary schematic diagram illustrating the determination of keypoints based on a heat map and a displacement map;

FIG. 3(b) is a block diagram of an exemplary neural network model;

FIG. 3(c) is a schematic diagram of a training process of a neural network model;

FIG. 3(d) is a schematic diagram showing an exemplary heat map, a displacement map in the x-axis direction, and a displacement map in the y-axis direction;

FIG. 4 is a schematic diagram of a human body key point identification device according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In order to achieve the purpose of ensuring the identification accuracy of human body key point identification under the condition of low model complexity, the embodiment of the invention provides a human body key point identification method, a human body key point identification device and electronic equipment.

First, a method for identifying key points of a human body according to an embodiment of the present invention will be described.

The execution subject of the human body key point identification method provided by the embodiment of the invention may be a human body key point identification device, which can be applied to electronic equipment. In a specific application, the electronic device may be either a terminal device or a server.

In addition, the human body key points according to the embodiments of the present invention are used to locate the body key parts of the human body, for example: head, neck, shoulders, hands, legs, and/or feet; and, for any body key part, when the body key part is located, the required key point can be one or more. In different scenes, because the key parts of the body to be positioned are different, the specific positions and the number of the key points of the human body can be different, and the embodiment of the invention does not limit the specific positions and the number.

For ease of understanding, the processing idea of the scheme provided by the embodiment of the present invention is described first. To solve the problems of the prior art, a heat map and a displacement map of each human body key point in the target image are generated by a pre-trained neural network model, and the human body key points are then identified by combining the heat map with the displacement map. Specifically, a candidate region is determined in the target image according to the probability distribution in the heat map of a human body key point, and the key point is then located within the candidate region based on the displacement map of the same key point. With this scheme, even if the heat map is small, the displacement map allows the key points to be located with high precision, so recognition accuracy can be ensured while model complexity remains low. For clarity, FIG. 3(a) shows an exemplary schematic diagram of identifying a human body key point by combining a heat map with a displacement map, in which the light gray area is the candidate region, the dark gray area is the finally determined key point, and the arrows represent the direction of position offset.

As shown in fig. 1, a method for identifying key points of a human body according to an embodiment of the present invention may include the following steps:

s101, acquiring a target image of a key point of a human body to be identified;

The target image in which human body key points are to be identified is an image containing a human body region. The size of the target image may be the input image size of the pre-trained neural network model, so that the target image does not need to be resized when it is input into the neural network model, and the key point coordinates obtained by combining the displacement map and the heat map are the coordinates of the human body key points in the target image.

Various modes exist for acquiring the target images of the key points of the human body to be identified. For example, the manner of acquiring the target image of the key point of the human body to be recognized may include:

acquiring an original image; the original image may be a video frame of a video, or an image acquired by a device, or an image downloaded through a network, etc.;

detecting a human body region of the original image;

and extracting the detected human body region from the original image, and performing size adjustment processing on the extracted human body region to obtain a target image of the human body key point to be identified, wherein the size adjustment processing is used for adjusting the size to be the size of the input image of the neural network model. The specific implementation manner adopted for detecting the human body region of the original image can be any manner capable of detecting the human body region from the image. For example: the body region is detected from the original image by using a pre-trained body region detection model, but is not limited thereto.

In addition, the size of the human body region detected from the original image is not fixed, whereas the input image of the neural network model has a fixed size. Therefore, so that it can serve as valid input to the neural network model, the human body region extracted from the original image may be resized to obtain the target image in which human body key points are to be identified. The resizing may use an interpolation algorithm, such as bilinear interpolation or nearest-neighbor interpolation, but is not limited thereto.
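The extraction and resizing step can be illustrated with a minimal Python sketch of nearest-neighbor interpolation, one of the two algorithms named above. The function name and the box convention (left, top, right, bottom, with right and bottom exclusive) are assumptions made for this example; the human body region itself is assumed to have been found already by some detector.

```python
def crop_and_resize_nearest(image, box, out_w, out_h):
    """Crop `box` from a 2-D list of pixels and resize it to out_h x out_w.

    box = (left, top, right, bottom); right and bottom are exclusive.
    Each output pixel copies the source pixel that its position maps onto.
    """
    left, top, right, bottom = box
    box_w, box_h = right - left, bottom - top
    resized = []
    for r in range(out_h):
        src_r = top + (r * box_h) // out_h      # source row for output row r
        row = []
        for c in range(out_w):
            src_c = left + (c * box_w) // out_w  # source column for output column c
            row.append(image[src_r][src_c])
        resized.append(row)
    return resized
```

Bilinear interpolation would instead blend the four nearest source pixels, which generally gives smoother results at a somewhat higher cost.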

S102, inputting the target image into a pre-trained neural network model to obtain a heat map and a displacement map of each human body key point in the target image;

each point in the displacement graph of any human body key point is used for representing the offset distance of the position of the point relative to the position of a target point, and the target point is a mapping point of the human body key point in the displacement graph; the neural network model is a model trained on a sample image and a truth heat map and a truth displacement map of each key point in the sample image.

The displacement map of a human body key point is a distribution map of the offset distances related to that key point. Specifically, the displacement map of any human body key point comprises a displacement map in the x-axis direction and a displacement map in the y-axis direction. Each point in the x-axis-direction displacement map of a human body key point represents the offset distance, in the x-axis direction, of the position of that point relative to the position of the target point; and each point in the y-axis-direction displacement map represents the offset distance, in the y-axis direction, of the position of that point relative to the position of the target point.

In addition, in a specific application, to keep the amount of computation low, each true-value heat map is smaller than the sample image. In the x-axis direction, the reduction coefficient of each true-value heat map relative to the sample image is a first reduction coefficient, and in the y-axis direction it is a second reduction coefficient, where the first reduction coefficient and the second reduction coefficient may be the same or different. For example, assuming the sample image size is w × h, the size of each true-value heat map may be (w/u1) × (h/u2),

where the values of u1 and u2 may be the same or different. The first reduction coefficient and the second reduction coefficient may also be referred to as step sizes; when they are the same, the true-value heat map may be regarded as having a single reduction coefficient or step size relative to the sample image, without distinguishing between the x-axis and y-axis directions.

Similarly, each true-value displacement map is smaller than the sample image. In the x-axis direction, the reduction coefficient of each true-value displacement map relative to the sample image is a third reduction coefficient, and in the y-axis direction it is a fourth reduction coefficient, where the third reduction coefficient and the fourth reduction coefficient may be the same or different. For example, assuming the sample image size is w × h, the size of each true-value displacement map may be (w/u3) × (h/u4),

where the values of u3 and u4 may be the same or different. The third and fourth reduction coefficients may also be referred to as step sizes; when they are the same, the true-value displacement map may be regarded as having a single reduction coefficient or step size relative to the sample image, without distinguishing between the x-axis and y-axis directions.

In a specific application, the neural network model used in the embodiment of the present invention may have various model structures. Illustratively, in one implementation, referring to fig. 3(b), the neural network model may include a feature extraction network and two convolution groups, namely convolution group 1 and convolution group 2. A target image is input to the feature extraction network to obtain a feature matrix, that is, the image features; the feature matrix is then input to each of the two convolution groups to obtain the heat map and the displacement map of each human body key point in the input image. The feature extraction network may include, but is not limited to, networks such as LeNet, AlexNet, VGG, GoogLeNet, ResNet, and MobileNet. Convolution group 1 and convolution group 2 may each consist of a plurality of convolutions, and the specific number may be set according to actual conditions.

For clarity, taking the neural network model shown in fig. 3(b) as an example, the processing flow of the neural network model on the image is exemplarily described:

an image of size w_f × h_f × 3 is input to the feature extraction network, which outputs a feature matrix of size (w_f/α) × (h_f/α) × 512, wherein 3 is the number of channels of the image, 512 is the number of feature matrices, and α is a preset reduction coefficient; the smaller α is, the higher the network accuracy is;

the feature matrix of size (w_f/α) × (h_f/α) × 512 is input to convolution group 1, which outputs a matrix of size (w_f/α) × (h_f/α) × n, namely n heat maps of size (w_f/α) × (h_f/α), wherein n is the number of key points;

the feature matrix of size (w_f/α) × (h_f/α) × 512 is input to convolution group 2, which outputs a matrix of size (w_f/α) × (h_f/α) × 2n, namely n displacement maps of size (w_f/α) × (h_f/α) in the x-axis direction and n displacement maps of size (w_f/α) × (h_f/α) in the y-axis direction.

In this example, the reduction coefficient of the heat map relative to the input image is α in both the x-axis direction and the y-axis direction, and the heat map and the displacement map have the same size.
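The shape bookkeeping of this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `output_shapes` is hypothetical, the feature matrix is assumed to have size (w_f/α) × (h_f/α) × 512, w_f and h_f are assumed to be exact multiples of α, and n = 17 is an arbitrary key point count chosen for the demonstration.

```python
def output_shapes(w_f, h_f, alpha, n, feature_channels=512):
    """Return (width, height, channels) of each tensor in the fig. 3(b) example.

    Assumption of this sketch: w_f and h_f are exact multiples of alpha.
    """
    if w_f % alpha or h_f % alpha:
        raise ValueError("w_f and h_f must be multiples of alpha in this sketch")
    w, h = w_f // alpha, h_f // alpha
    return {
        # Output of the feature extraction network.
        "features": (w, h, feature_channels),
        # Convolution group 1: one heat map per key point.
        "heat_maps": (w, h, n),
        # Convolution group 2: n x-direction plus n y-direction displacement maps.
        "displacement_maps": (w, h, 2 * n),
    }


# n = 17 is a hypothetical number of key points, not a value from the description.
shapes = output_shapes(w_f=125, h_f=125, alpha=25, n=17)
print(shapes["heat_maps"])          # (5, 5, 17)
print(shapes["displacement_maps"])  # (5, 5, 34)
```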

The structure of the neural network model and the processing flow of the image described above are merely examples, and should not be construed as limiting the embodiments of the present invention.

For clarity of the scheme and of the layout, the training process of the neural network model is described later by way of example.

S103, determining the coordinates of each human body key point in the target image based on the heat map and the displacement map of each human body key point in the target image according to a preset identification rule.

The predetermined identification rule is: for each human body key point, determine a candidate region based on the heat map of the human body key point, and determine the coordinates of the human body key point from the candidate region based on the displacement map of the human body key point.

After the heat map and the displacement map of each human body key point in the target image are obtained, the candidate region can be determined from the heat map, and the key point can be located within the candidate region from the displacement map. Therefore, the coordinates of each human body key point in the target image can be determined, according to the predetermined identification rule, based on the heat map and the displacement map of each human body key point in the target image. It can be understood that, because the heat map of any human body key point is a probability distribution map of the possible positions of that key point, a candidate region of the key point in the target image can be inferred from the pixel point with the highest probability in the heat map. Each point in the displacement map of any human body key point represents the offset distance of the position of that point relative to the position of the target point, so once the candidate region is determined, the human body key point can be further located using the position offset information in the displacement map.

Illustratively, in one implementation, the displacement map of any human body key point comprises a displacement map in the x-axis direction and a displacement map in the y-axis direction;

wherein the predetermined first calculation formula includes:

I_x＝h_x×s1+ox×t1；

I_y＝h_y×s2+oy×t2；

wherein (I_x, I_y) are the coordinates of a human body key point I; (h_x, h_y) are the coordinates of the pixel point with the maximum value in the heat map of the human body key point I; ox is the serial number of the column with the value 0 in the displacement map of the human body key point I in the x-axis direction, and oy is the serial number of the row with the value 0 in the displacement map of the human body key point I in the y-axis direction; s1 is the reduction coefficient of the heat map output by the neural network model relative to the input image in the x-axis direction, and s2 is the reduction coefficient of the heat map output by the neural network model relative to the input image in the y-axis direction; t1 is the reduction coefficient of the displacement map output by the neural network model relative to the output heat map in the x-axis direction, and t2 is the reduction coefficient of the displacement map output by the neural network model relative to the output heat map in the y-axis direction.
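The first calculation formula can be illustrated with a small Python sketch. It is not the patented implementation: the function name `locate_keypoint` is hypothetical, maps are plain nested lists indexed as `map[row][column]`, and, as an assumption of this sketch, ox and oy are taken as the column and row whose summed absolute values are smallest, so that a predicted map containing no exact 0 still yields an index.

```python
def locate_keypoint(heat_map, disp_x, disp_y, s1, s2, t1, t2):
    """Apply I_x = h_x*s1 + ox*t1 and I_y = h_y*s2 + oy*t2 to one key point."""
    rows = range(len(heat_map))
    cols = range(len(heat_map[0]))
    # (h_x, h_y): column and row of the largest heat-map value.
    h_y, h_x = max(((a, b) for a in rows for b in cols),
                   key=lambda ab: heat_map[ab[0]][ab[1]])
    # ox: column of the x-direction displacement map closest to 0 (assumed rule).
    ox = min(range(len(disp_x[0])),
             key=lambda b: sum(abs(row[b]) for row in disp_x))
    # oy: row of the y-direction displacement map closest to 0 (assumed rule).
    oy = min(range(len(disp_y)),
             key=lambda a: sum(abs(v) for v in disp_y[a]))
    return h_x * s1 + ox * t1, h_y * s2 + oy * t2


# Toy 3 x 3 maps (hypothetical values): heat-map peak at row 1, column 2.
heat = [[0.0, 0.1, 0.2],
        [0.1, 0.3, 0.9],
        [0.0, 0.1, 0.2]]
dx = [[b - 2 for b in range(3)] for _ in range(3)]  # zero in column 2
dy = [[a - 1 for _ in range(3)] for a in range(3)]  # zero in row 1
result = locate_keypoint(heat, dx, dy, s1=10, s2=10, t1=1, t2=1)
print(result)  # (22, 11)
```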

In the scheme provided by the embodiment of the invention, the heat map and the displacement map are used in combination. Even when the heat map is small and the candidate region it determines is therefore large, the key point can be further located within the candidate region through the displacement map. Therefore, the scheme can ensure the recognition accuracy of human body key point recognition while keeping model complexity low.

For clarity of the scheme and clarity of the layout, the following describes an exemplary training process of the neural network model.

Optionally, as shown in fig. 2, the training process of the neural network model may include:

S201, obtaining a plurality of sample images and the coordinates of each human body key point in each sample image;

wherein the size of the sample image is the size of the input image of the neural network model. The sample image may be an image obtained by resizing a human body image, where the human body image is a human body region extracted from an image.

It can be understood that the coordinates of each human body key point in the sample image can be determined by manual labeling, that is, each human body key point is labeled manually in the sample image to obtain its coordinates in the sample image. Alternatively, since the sample image may be an image obtained by resizing a human body image, in one implementation, the coordinates of each human body key point in the sample image may be determined as follows:

step one, determining the coordinates of each human body key point of a reference image in an image library: let P = {P₁, P₂, P₃, ..., P_n} be the set of human body key points of the reference image, where n is the number of human body key points and P_i = (x_i, y_i) is the coordinate of the i-th human body key point;

the reference image in the image library may be a video frame in a video, or may be a pre-acquired or downloaded image.

Step two, determining the region information of the human body region in the reference image: let the region information be (x_bbox, y_bbox, w_bbox, h_bbox);

Step three, mapping each human body key point in the reference image into the human body region to obtain the coordinates of each human body key point in the human body region: let P' = {P'₁, P'₂, P'₃, ..., P'_n} be the set of human body key points of the human body region, where P'_i = (x'_i, y'_i) is the coordinate of the i-th human body key point, x'_i = x_i − x_bbox, and y'_i = y_i − y_bbox;

And step four, mapping each human body key point in the human body area to the sample image corresponding to the human body area to obtain the coordinate of each human body key point in the sample image.

The coordinates of each human body key point in the human body region may be mapped to the sample image corresponding to the human body region in the same way that coordinate points are mapped between images of two sizes. For example, if the size of the sample image corresponding to the human body region is m × n and the size of the human body region is (m/d1) × (n/d1), then a point k1 (x1, y1) in the human body region is mapped to the point (x1 × d1, y1 × d1) in the sample image corresponding to the human body region.
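Steps three and four amount to a translation followed by a scaling. A minimal Python sketch, assuming a single scale factor d1 for both axes as in the example above; the function names and all numeric values are hypothetical and chosen only for illustration:

```python
def to_region(point, bbox):
    """Step three: x'_i = x_i - x_bbox, y'_i = y_i - y_bbox."""
    x, y = point
    x_bbox, y_bbox, _w_bbox, _h_bbox = bbox
    return x - x_bbox, y - y_bbox


def to_sample(point, d1):
    """Step four: scale a region coordinate to the sample image by d1."""
    x, y = point
    return x * d1, y * d1


# Hypothetical key point in the reference image and hypothetical region info.
reference_point = (130, 90)
bbox = (100, 50, 60, 120)  # (x_bbox, y_bbox, w_bbox, h_bbox)
region_point = to_region(reference_point, bbox)
sample_point = to_sample(region_point, d1=2)
print(region_point)  # (30, 40)
print(sample_point)  # (60, 80)
```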

It can be understood that the coordinates of each human key point in the reference image can be obtained by a manual calibration method.

S202, aiming at each sample image, generating a true value heat map and a true value displacement map of each human body key point in the sample image by using the coordinates of each human body key point in the sample image;

for clarity of the scheme and of the layout, the manner of generating the true-value heat map and the true-value displacement map of a human body key point is described later by way of example.

S203, inputting each sample image into the neural network model in training respectively to obtain a predicted heat map and a predicted displacement map of each human body key point in each sample image;

after each sample image is input to the neural network model under training, the neural network model performs key point information identification for each sample image. Specifically, the neural network model extracts a feature matrix from the received sample image, and then generates a prediction heat map and a prediction displacement map of each human body key point in the sample image based on the extracted feature matrix.

Taking the model structure shown in fig. 3(b) as an example, the process of processing the received sample image by the neural network model is described:

The feature extraction network in the neural network model extracts features from the sample image to obtain a feature matrix, and the feature matrix is input to convolution group 1 and convolution group 2 respectively. Convolution group 1 performs convolution processing on the feature matrix to obtain the predicted heat map of each human body key point in the sample image, while convolution group 2 performs convolution processing on the feature matrix to obtain the predicted displacement map of each human body key point in the sample image.

S204, calculating a comprehensive loss value based on the difference between a true value heat map and a predicted heat map of each human body key point in each sample image and the difference between a true value displacement map and a predicted displacement map;

There are various ways to calculate the comprehensive loss value based on the difference between the true-value heat map and the predicted heat map of each human body key point in each sample image and the difference between the true-value displacement map and the predicted displacement map. One way of calculating the comprehensive loss value is introduced below in connection with a specific implementation.

Optionally, in a first implementation manner, the step of calculating the comprehensive loss value based on the difference between the true-value heat map and the predicted heat map and the difference between the true-value displacement map and the predicted displacement map of each human body key point in each sample image may include:

step A1, for each sample image, obtaining a first type loss value of each human body key point in the sample image based on the difference between the true value heat map and the predicted heat map of each human body key point in the sample image, and obtaining a second type loss value of each human body key point in the sample image based on the difference between the true value displacement map and the predicted displacement map of each human body key point in the sample image;

step A2, for each human body key point, determining the loss value of the human body key point with respect to the heat map based on the first-type loss values of that key point, and determining the loss value of the human body key point with respect to the displacement map based on the second-type loss values of that key point;

Since the number and positions of the human body key points in each sample image are the same, the human body key points in the sample images can be regarded as the same group of human body key points. After step A1 is completed, each human body key point in this group therefore corresponds to a plurality of first-type loss values and a plurality of second-type loss values. For each human body key point, a loss value with respect to the heat map and a loss value with respect to the displacement map may then be calculated from the corresponding first-type loss values and second-type loss values.

And step A3, weighting the loss value of each human body key point relative to the heat map and the loss value relative to the displacement map to obtain a comprehensive loss value.

The loss function used to calculate the first-type loss values and the second-type loss values may include, but is not limited to, a square loss function, a logarithmic loss function, or an exponential loss function. In addition, for each human body key point, the loss value with respect to the heat map and the loss value with respect to the displacement map may be calculated by averaging, summing, weighting, or the like, which is not limited herein. The weights used to weight the heat-map loss value and the displacement-map loss value of each human body key point may be set according to actual conditions and are not limited herein.

For this implementation, the training process of the neural network model is illustrated schematically in fig. 3(c). As shown in fig. 3(c), after the neural network model outputs the predicted heat map and the predicted displacement map, the loss value with respect to the heat map is obtained based on the difference between the predicted heat map and the true-value heat map of the human body key points; the loss value with respect to the displacement map is obtained based on the difference between the predicted displacement map and the true-value displacement map of the human body key points; the comprehensive loss value is then obtained from the loss value with respect to the heat map and the loss value with respect to the displacement map.
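Steps A1 to A3 can be sketched as follows. This is one possible instantiation only: it assumes a square loss for step A1, averaging over sample images for step A2, and a weighted sum over key points for step A3, with hypothetical weights `w_heat` and `w_disp`. For brevity, each key point's displacement map is treated as a single 2-D map rather than separate x and y maps.

```python
def squared_loss(truth, pred):
    """Sum of squared differences between two 2-D maps (lists of rows)."""
    return sum((t - p) ** 2
               for t_row, p_row in zip(truth, pred)
               for t, p in zip(t_row, p_row))


def comprehensive_loss(truth_heat, pred_heat, truth_disp, pred_disp,
                       w_heat=1.0, w_disp=1.0):
    """All four inputs are indexed as [sample][key point] -> 2-D map."""
    n_samples = len(truth_heat)
    n_keypoints = len(truth_heat[0])
    total = 0.0
    for k in range(n_keypoints):
        # Step A1: first-type and second-type loss values of key point k.
        first = [squared_loss(truth_heat[s][k], pred_heat[s][k])
                 for s in range(n_samples)]
        second = [squared_loss(truth_disp[s][k], pred_disp[s][k])
                  for s in range(n_samples)]
        # Step A2: per-key-point losses (averaging is an assumed choice).
        heat_loss = sum(first) / n_samples
        disp_loss = sum(second) / n_samples
        # Step A3: weighted combination (the weights are assumed values).
        total += w_heat * heat_loss + w_disp * disp_loss
    return total


# One sample, one key point, 1 x 2 maps (toy values).
loss = comprehensive_loss(
    truth_heat=[[[[1.0, 0.0]]]], pred_heat=[[[[0.5, 0.0]]]],
    truth_disp=[[[[0.0, 1.0]]]], pred_disp=[[[[0.0, 0.0]]]],
    w_heat=1.0, w_disp=0.5)
print(loss)  # 0.75
```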

S205, judging whether the neural network model in training converges or not based on the comprehensive loss value, and if so, finishing the training to obtain the trained neural network model; otherwise, adjusting the network parameters of the neural network model, and continuing to train the neural network model.

Judging whether the neural network model in training has converged based on the comprehensive loss value may specifically be: judging whether the comprehensive loss value is smaller than a preset threshold; if so, the neural network model in training is judged to have converged; otherwise, it is judged not to have converged. When the neural network model in training is judged not to have converged, the network parameters of the neural network model can be adjusted and training continues. Continuing to train the neural network model means returning to the step of inputting each sample image into the neural network model in training to obtain a predicted heat map and a predicted displacement map of each human body key point in each sample image.
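The convergence check of S205 can be written as a small loop. In this sketch the callbacks `compute_loss` and `adjust_parameters` are hypothetical stand-ins for steps S203 to S204 and for the parameter update, and `max_rounds` is a safety cap added here that the description does not mention. The toy "model" is a single number whose loss is its square.

```python
def train_until_converged(compute_loss, adjust_parameters, threshold,
                          max_rounds=1000):
    """Repeat: compute the comprehensive loss, stop once it is below threshold."""
    for round_index in range(max_rounds):
        loss = compute_loss()
        if loss < threshold:
            # Converged: training ends.
            return round_index, loss
        # Not converged: adjust the parameters and train again.
        adjust_parameters(loss)
    raise RuntimeError("no convergence within max_rounds")


# Toy stand-in for a model: one parameter p with loss p**2.
state = {"p": 1.0}


def toy_loss():
    return state["p"] ** 2


def toy_adjust(_loss):
    state["p"] *= 0.5  # halving p plays the role of a parameter update


rounds, final_loss = train_until_converged(toy_loss, toy_adjust, threshold=0.01)
print(rounds, final_loss)  # 4 0.00390625
```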

The following describes a specific implementation manner of generating a true-value heat map and a true-value displacement map of each human body key point in the sample image by using the coordinates of each human body key point in the sample image, by way of example.

Optionally, in an implementation manner, for each sample image, a manner of generating a true-value heat map of each human body keypoint in the sample image by using the coordinate of each human body keypoint in the sample image includes:

the generation mode of the truth value heat map comprises the following steps:

traversing each element in the matrix M, when traversing each element, calculating a value reference value of the element according to a predetermined second calculation formula, if the value reference value of the element is larger than a predetermined threshold value, setting the value of the element in the matrix M to be 0, otherwise, calculating the value of the element according to a predetermined third calculation formula, and setting the value of the element in the matrix M to be the calculated value;

after traversing all elements in the matrix M, taking the current matrix M as a true value heat map of the key points of the human body;

the second calculation formula includes:

wherein d_ab is the value reference value of the element P(b, a) in the matrix M, a is the serial number of the row where the element P is located, b is the serial number of the column where the element P is located, (x_i′, y_i′) are the coordinates of the human body key point i in the sample image, round() is a rounding function, α1 is the reduction coefficient of the heat map output by the neural network model relative to the input image in the x-axis direction, and α2 is the reduction coefficient of the heat map output by the neural network model relative to the input image in the y-axis direction; the human body key point i is the human body key point whose true-value heat map is to be generated;

the third calculation formula includes:

wherein M[a][b] is the value of the element P.

The size of the matrix M is the same as the size of the true-value heat map. The initial value of each point in the matrix M may be 0, but is not limited to this; for example, an initial value of 1, 10, 100, etc. is also reasonable.
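The traversal described above can be sketched in Python. The second and third calculation formulas themselves are not reproduced in this text, so the sketch substitutes two labeled assumptions: d_ab is taken as the Euclidean distance from element (a, b) to the mapped key point (round(x_i′/α1), round(y_i′/α2)), and the element value is taken as a Gaussian of that distance with a hypothetical width `sigma`. The function name and the threshold value are likewise illustrative; the key point (101, 79) and the 5 × 5 map are borrowed from the worked example later in the description.

```python
import math


def truth_heat_map(width, height, keypoint, alpha1, alpha2, threshold,
                   sigma=1.0):
    """Build a true-value heat map M indexed as M[row a][column b]."""
    # Mapped position of the key point in the heat map.
    x = round(keypoint[0] / alpha1)
    y = round(keypoint[1] / alpha2)
    M = [[0.0] * width for _ in range(height)]
    for a in range(height):
        for b in range(width):
            # ASSUMED stand-in for the second formula: Euclidean distance.
            d_ab = math.hypot(a - y, b - x)
            if d_ab > threshold:
                M[a][b] = 0.0
            else:
                # ASSUMED stand-in for the third formula: Gaussian falloff.
                M[a][b] = math.exp(-d_ab ** 2 / (2 * sigma ** 2))
    return M


heat = truth_heat_map(width=5, height=5, keypoint=(101, 79),
                      alpha1=25, alpha2=25, threshold=1.5)
print(heat[3][4])  # 1.0, the peak at row 3, column 4
print(heat[0][0])  # 0.0, farther away than the threshold
```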

Optionally, in an implementation manner, for each sample image, a manner of generating a true value displacement map of each human body keypoint in the sample image by using the coordinate of each human body keypoint in the sample image includes:

for each human body key point whose true-value displacement map is to be generated, generating two matrices M_x and M_y of the same size, where the size of the matrices M_x and M_y is the same as the size of the true-value displacement map to be generated;

traversing each element in the matrix M_x; when each element is traversed, calculating the value of the element using a predetermined fourth calculation formula, and setting the value of the element in the matrix M_x to the calculated value; after all elements in the matrix M_x have been traversed, taking the current matrix M_x as the true-value displacement map of the human body key point in the x-axis direction;

traversing each element in the matrix M_y; when each element is traversed, calculating the value of the element using a predetermined fifth calculation formula, and setting the value of the element in the matrix M_y to the calculated value; after all elements in the matrix M_y have been traversed, taking the current matrix M_y as the true-value displacement map of the human body key point in the y-axis direction;

wherein the fourth calculation formula includes:

M_x[a][b]＝b-xi；

the fifth calculation formula includes:

M_y[a][b]＝a-yi；

The size of the matrices M_x and M_y is the same as the size of the true-value displacement map. The initial value of each point in the matrices M_x and M_y may be 0, but is not limited to this; for example, an initial value of 1, 10, 100, etc. is also reasonable.
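The fourth and fifth calculation formulas translate directly into code. In the sketch below, the definitions xi = round(x_i′/β1) and yi = round(y_i′/β2) are an assumption, inferred from the claim wording that β1 and β2 are the reduction coefficients of the displacement map relative to the input image; the function name is hypothetical, and the numeric values are taken from the worked example later in the description.

```python
def truth_displacement_maps(width, height, keypoint, beta1, beta2):
    """Return (M_x, M_y), each indexed as M[row a][column b]."""
    # ASSUMPTION: xi and yi are the rounded, scaled key point coordinates.
    xi = round(keypoint[0] / beta1)
    yi = round(keypoint[1] / beta2)
    # Fourth formula: M_x[a][b] = b - xi (depends only on the column).
    M_x = [[b - xi for b in range(width)] for _ in range(height)]
    # Fifth formula: M_y[a][b] = a - yi (depends only on the row).
    M_y = [[a - yi for _ in range(width)] for a in range(height)]
    return M_x, M_y


M_x, M_y = truth_displacement_maps(width=5, height=5, keypoint=(101, 79),
                                   beta1=25, beta2=25)
print(M_x[0])                    # [-4, -3, -2, -1, 0]
print([row[0] for row in M_y])   # [-3, -2, -1, 0, 1]
```

In this toy case the value 0 appears in column 4 of M_x and in row 3 of M_y, which is the column and row that the first calculation formula reads back as ox and oy.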

The method for generating the truth-value heat map and the truth-value displacement map of any human body key point is given only as an example and should not be construed as limiting the embodiment of the invention.

The following describes a human body key point identification method provided by the embodiment of the present invention with reference to a specific example.

Assuming that a target image of a human body key point to be identified is an image K, the size of the image K is 125 x 125, the size of an input image of a pre-trained neural network model is 125 x 125, the size of an output heat map is 5 x 5, and the size of a displacement map is 5 x 5.

Then, the process of identifying by using the method for identifying key points of a human body provided by the embodiment of the invention comprises the following steps:

obtaining an image K of a key point of a human body to be identified;

inputting the image K into a pre-trained neural network model to obtain a 5 x 5 heat map and a 5 x 5 displacement map of each human body key point in the image K;

and determining the coordinates of each human body key point in the image K based on the heat map and the displacement map of each human body key point in the image K according to a preset first calculation formula.

The following describes the principle of calculating the coordinates of a human body key point in the image K based on the heat map and the displacement map, taking one human body key point as an example:

Suppose that the real coordinates of a human body key point P in the image K are (101, 79). Based on the manner in which heat maps and displacement maps are generated, the heat map and the displacement maps of the human body key point P generated by the neural network model are as shown in fig. 3(d), where the value of each point in the heat map is denoted Pxy and the displacement maps in the x-axis and y-axis directions are given as specific numerical values. Because the point to which the human body key point P is mapped in the heat map is (4, 3), the point with the largest pixel value in the heat map of P is (4, 3), the column whose values have the smallest absolute value in the displacement map in the x-axis direction is column 4, and the row whose values have the smallest absolute value in the displacement map in the y-axis direction is row 3, as shown in fig. 3(d). Then, according to the first calculation formula, the coordinates calculated based on the heat map and the displacement map of the human body key point P are as follows:

P_x = 4 × 25 + 4 × 1 = 104;

P_y = 3 × 25 + 3 × 1 = 78.

As can be seen from the above, the calculated coordinates of the human body key point P are (104, 78). These are very close to the real coordinates (101, 79) and can be taken as the recognized coordinates of the human body key point P in the image K.
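The arithmetic of this example can be checked with a few lines of Python. The sketch only re-evaluates the numbers given above; deriving s1, s2, t1, and t2 as ratios of the stated map sizes (125/5 and 5/5) and the variable names are assumptions of the sketch.

```python
real = (101, 79)

# Heat map is 5 x 5 for a 125 x 125 input: reduction coefficient 25.
s1 = s2 = 125 // 5
# Displacement map is 5 x 5, the same size as the heat map: coefficient 1.
t1 = t2 = 5 // 5

# Heat-map peak (h_x, h_y) and zero column/row (ox, oy) as read from fig. 3(d).
h_x, h_y = round(real[0] / s1), round(real[1] / s2)  # (4, 3)
ox, oy = 4, 3

P_x = h_x * s1 + ox * t1
P_y = h_y * s2 + oy * t2
error = (P_x - real[0], P_y - real[1])
print((P_x, P_y))  # (104, 78)
print(error)       # (3, -1)
```

The residual of a few pixels is small compared with the 25-pixel cell that the 5 × 5 heat map alone could resolve.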

Corresponding to the above method embodiment, the embodiment of the invention also provides a human body key point identification device. As shown in fig. 4, a human body key point identification device provided in an embodiment of the present invention may include:

an image obtaining module 410, configured to obtain a target image of a key point of a human body to be identified;

the information identification module 420 is configured to input the target image to a pre-trained neural network model to obtain a heat map and a displacement map of each human body key point in the target image; each point in the displacement graph of any human body key point is used for representing the offset distance of the position of the point relative to the position of a target point, and the target point is a mapping point of the human body key point in the displacement graph; the neural network model is a model obtained by training based on a sample image, and a true-value heat map and a true-value displacement map of each human body key point in the sample image;

a coordinate determination module 430, configured to determine coordinates of each human body key point in the target image based on the heat map and the displacement map of each human body key point in the target image according to a predetermined identification rule; wherein the predetermined identification rule is: for each human body key point, determining a candidate region based on the heat map of the human body key point, and determining the coordinates of the human body key point from the candidate region based on the displacement map of the human body key point.

Optionally, in an implementation, the displacement map of any human body key point includes a displacement map in an x-axis direction and a displacement map in a y-axis direction;

the coordinate determination module 430 is specifically configured to:

wherein the predetermined first calculation formula includes:

I_x＝h_x×s1+ox×t1；

I_y＝h_y×s2+oy×t2；

Optionally, in one implementation, the neural network model is trained by a model training module; wherein the model training module comprises:

the generation mode of the truth value heat map comprises the following steps:

the second calculation formula includes:

wherein d_ab is the value reference value of the element P(b, a) in the matrix M, a is the serial number of the row where the element P is located, and b is the serial number of the column where the element P is located,

the third calculation formula includes:

wherein M[a][b] is the value of the element P.

traversing each element in the matrix M_x; when each element is traversed, calculating the value of the element using a predetermined fourth calculation formula, and setting the value of the element in the matrix M_x to the calculated value; after all elements in the matrix M_x have been traversed, taking the current matrix M_x as the displacement map of the human body key point in the x-axis direction;

wherein the fourth calculation formula includes:

M_x[a][b]＝b-xi；

the fifth calculation formula includes:

M_y[a][b]＝a-yi；

Corresponding to the above method embodiment, an electronic device is further provided in the embodiment of the present invention, as shown in fig. 5, including a processor 501, a communication interface 502, a memory 503, and a communication bus 504, where the processor 501, the communication interface 502, and the memory 503 complete mutual communication through the communication bus 504,

a memory 503 for storing a computer program;

the processor 501 is configured to implement the steps of any of the human body key point identification methods described above in the embodiments of the present invention when executing the program stored in the memory 503.

The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the electronic equipment and other equipment.

The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.

The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component.

In another embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any of the above human body key point identification methods.

In yet another embodiment of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the steps of any of the above-mentioned human keypoint identification methods.

In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

Each embodiment in this specification is described in a related manner, and the same and similar parts in each embodiment may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, the electronic device, the storage medium, and the computer program product embodiment, since they are substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to part of the description of the method embodiment.

The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A human body key point identification method, characterized by comprising:

acquiring a target image of a key point of a human body to be identified;

2. The method according to claim 1, wherein the displacement map of any human body key point comprises a displacement map in an x-axis direction and a displacement map in a y-axis direction;

wherein the predetermined first calculation formula includes:

I_x = h_x × s1 + o_x × t1;

I_y = h_y × s2 + o_y × t2;
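
As an illustration only, the first calculation formula can be sketched in Python as follows. The sketch assumes that (h_x, h_y) is the column and row of the peak of a key point's heat map, that o_x and o_y are the values read from the x-axis and y-axis displacement maps at that position, and that s1, s2, t1, and t2 are preset coefficients. These readings of the symbols, and the helper names `heatmap_peak` and `keypoint_coordinates`, are assumptions made for the example and are not taken from the claims.

```python
def heatmap_peak(heat_map):
    """Return (h_x, h_y): column and row of the largest value in a 2-D list."""
    best = None
    for row_idx, row in enumerate(heat_map):
        for col_idx, value in enumerate(row):
            # Keep the first maximum encountered (strict comparison).
            if best is None or value > best[0]:
                best = (value, col_idx, row_idx)
    return best[1], best[2]


def keypoint_coordinates(heat_map, disp_x, disp_y, s1, s2, t1, t2):
    """Apply I_x = h_x*s1 + o_x*t1 and I_y = h_y*s2 + o_y*t2.

    Assumption: o_x and o_y are sampled from the displacement maps
    at the heat-map peak position (h_x, h_y).
    """
    h_x, h_y = heatmap_peak(heat_map)
    o_x = disp_x[h_y][h_x]
    o_y = disp_y[h_y][h_x]
    i_x = h_x * s1 + o_x * t1
    i_y = h_y * s2 + o_y * t2
    return i_x, i_y


# Hypothetical 2x3 maps for one key point.
heat = [[0.1, 0.2, 0.1],
        [0.1, 0.9, 0.3]]
dx = [[0.0, 0.0, 0.0],
      [0.0, 0.5, 0.0]]
dy = [[0.0, 0.0, 0.0],
      [0.0, -0.25, 0.0]]
coords = keypoint_coordinates(heat, dx, dy, s1=4, s2=4, t1=4, t2=4)
```

With these hypothetical values the peak lies at column 1, row 1, so the sketch yields I_x = 1 × 4 + 0.5 × 4 = 6.0 and I_y = 1 × 4 + (-0.25) × 4 = 3.0.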

3. The method of claim 1 or 2, wherein the training process of the neural network model comprises:

4. The method of claim 3, wherein generating, for each sample image, a true-value heat map for each human body key point in the sample image using the coordinates of each human body key point in the sample image comprises:

the true-value heat map is generated in the following manner:

the second calculation formula includes:

the third calculation formula includes:

wherein M[a][b] is the value of the element P.

5. The method of claim 3,

wherein the fourth calculation formula includes:

M_x[a][b] = b - x_i;

the fifth calculation formula includes:

M_y[a][b] = a - y_i;

wherein (x_i′, y_i′) are the coordinates of the human body key point i in the sample image, round() is a rounding function, β1 is the reduction coefficient, in the x-axis direction, of the displacement map output by the neural network model relative to the input image, β2 is the reduction coefficient, in the y-axis direction, of the displacement map output by the neural network model relative to the input image, and the human body key point i is the human body key point for which the true-value displacement map is to be generated.
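
As an illustration only, the fourth and fifth calculation formulas can be sketched in Python as follows. The sketch assumes that a is the row index and b is the column index of a map element, that (x_i, y_i) is the position of key point i already expressed at the resolution of the displacement map, and that the formulas are applied to every element of the map. The mapping from (x_i′, y_i′) to (x_i, y_i) is not reproduced because its formula is not stated in this section. The function name `truth_displacement_maps` is hypothetical.

```python
def truth_displacement_maps(height, width, x_i, y_i):
    """Build true-value displacement maps for one key point.

    M_x[a][b] = b - x_i  (fourth calculation formula)
    M_y[a][b] = a - y_i  (fifth calculation formula)

    Assumptions: a indexes rows, b indexes columns, and the formulas
    are applied to the whole map rather than to a restricted region.
    """
    m_x = [[b - x_i for b in range(width)] for a in range(height)]
    m_y = [[a - y_i for b in range(width)] for a in range(height)]
    return m_x, m_y


# Hypothetical 2x3 map with the key point at column 1, row 0.
m_x, m_y = truth_displacement_maps(height=2, width=3, x_i=1, y_i=0)
```

Under these assumptions every column of M_x holds a constant horizontal offset from the key point, every row of M_y holds a constant vertical offset, and both maps are zero at the key point position itself.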

6. A human body key point recognition device, characterized by comprising:

7. The apparatus of claim 6,

the displacement map of any human body key point comprises a displacement map in the x-axis direction and a displacement map in the y-axis direction;

the coordinate determination module is specifically configured to:

wherein the predetermined first calculation formula includes:

I_x = h_x × s1 + o_x × t1;

I_y = h_y × s2 + o_y × t2;

8. The apparatus of claim 6 or 7, wherein the neural network model is trained by a model training module; wherein the model training module comprises:

9. The apparatus of claim 8, wherein the true value map generation submodule generating, for each sample image, a true-value heat map for each human body key point in the sample image using the coordinates of each human body key point in the sample image comprises:

the true-value heat map is generated in the following manner:

the second calculation formula includes:

the third calculation formula includes:

wherein M[a][b] is the value of the element P.

10. The apparatus of claim 8, wherein the true value map generation submodule generating, for each sample image, a true-value displacement map for each human body key point in the sample image using the coordinates of each human body key point in the sample image comprises:

wherein the fourth calculation formula includes:

M_x[a][b] = b - x_i;

the fifth calculation formula includes:

M_y[a][b] = a - y_i;

11. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;

a memory for storing a computer program;

a processor for implementing the method steps of any one of claims 1 to 5 when executing a program stored in the memory.