CN111274852A - Target object key point detection method and device - Google Patents
- Publication number
- CN111274852A CN111274852A CN201811480075.9A CN201811480075A CN111274852A CN 111274852 A CN111274852 A CN 111274852A CN 201811480075 A CN201811480075 A CN 201811480075A CN 111274852 A CN111274852 A CN 111274852A
- Authority
- CN
- China
- Prior art keywords
- target
- key point
- target object
- image
- detected
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
Abstract
The application provides a method and a device for detecting key points of a target object. The method includes: determining the predicted position of a target key point in an image to be detected according to the position of the target key point of a target object in the previous N frames of images of the image to be detected, wherein N is greater than 1; determining a target area corresponding to the target object from the image to be detected according to the predicted position of the target key point; and performing key point detection on the target area. Because the position of the target key point in the image to be detected is predicted from its positions in the preceding multi-frame images, and the target area corresponding to the target object is then determined from the predicted position, the step of detecting the position of the target object with a separate detector is omitted, and the detection speed of the key points is greatly improved.
Description
Technical Field
The application relates to the technical field of computer vision, in particular to a method and a device for detecting key points of a target object.
Background
With the development of computer technology, key points of a human body can currently be detected from an image containing the human body. When performing human body key point detection, the position of the human body can first be detected by an independent human body detector, the human body is then segmented from the background, and key point detection is performed on the segmented region.
In the related art, the position of the human body can be detected in one of the following three ways: (1) calling a human body detector on each frame of image to detect the position of the human body; (2) calling the human body detector once every several frames; (3) detecting the first frame with a human body detector, and determining the position of the human body in each subsequent frame of image with a tracking algorithm.
However, an independent human body detector consumes a long operation time, and calling it on every frame seriously affects the detection speed of the key points. Although frame-skipping detection (once every n frames) can reduce the average detection time, the reduction is limited, and when the human body moves substantially within those n frames, the detected human body position becomes inaccurate, which in turn makes the human body key point detection inaccurate. A tracking algorithm is insensitive to small changes in the position and size of the tracked object, so the detection accuracy of human body key points is low; moreover, a tracking algorithm accumulates errors and cannot track continuously.
Therefore, in the related art, the human body key point detection method has the problems of low detection speed and low detection precision.
Disclosure of Invention
The application provides a method and a device for detecting key points of a target object, which are used for solving the problems of low detection speed and low detection precision of the method for detecting the key points of the target object in the related technology.
An embodiment of one aspect of the present application provides a method for detecting a key point of a target object, including:
determining the predicted position of the target key point in the image to be detected according to the position of the target key point of the target object in the previous N frames of images to be detected; wherein N is greater than 1;
determining a target area corresponding to the target object from the image to be detected according to the predicted position of the target key point;
and detecting key points of the target area.
According to the target object key point detection method, the position of the target key point in the image to be detected is predicted according to the positions of the target key point of the target object in the previous multi-frame images of the image to be detected, and the target area of the target object in the image to be detected is then determined according to the predicted position. The step of detecting the position of the target object with a detector is thereby omitted, which greatly improves the detection speed of the key points, improves the detection accuracy of the position of the target object, and further improves the detection precision of the key points.
As a possible implementation manner of an embodiment of an aspect of the present application, the determining the predicted position of the target key point in the image to be detected according to the position of the target key point of the target object in the previous N frames of images of the image to be detected includes:
determining the moving speed information of the target key point according to the position of the target key point in the previous N frames of images; wherein the moving speed information comprises moving speed and/or moving acceleration;
and determining the predicted position of the target key point in the image to be detected according to the moving speed information of the target key point and the position of the target key point in the image of the previous frame of the image to be detected.
As a possible implementation manner in an embodiment of an aspect of the present application, before determining a predicted position of a target keypoint in an image to be detected according to a position of the target keypoint of a target object in an image of previous N frames of the image to be detected, the method further includes:
determining the identification accuracy of each key point of the target object in the previous N frames of images;
and screening the target key points of the target object from the key points according to the identification accuracy of the key points.
As a possible implementation manner of an embodiment of an aspect of the present application, the determining the recognition accuracy of each keypoint of the target object in the previous N frames of images includes:
obtaining the confidence of each key point of the target object in the previous N frames of images; carrying out weighted summation on the confidence degrees of the same key point in the previous N frames of images to obtain the identification accuracy of the corresponding key point;
and/or acquiring the positions of all key points of the target object in the previous N frames of images; determining the relative position of each key point according to the position of each key point in the same frame of image; determining the recognition accuracy of each key point according to the difference degree between the relative position of each key point and the standard relative position of each key point of the target object;
and/or acquiring the positions of all key points of the target object in the previous N frames of images; determining the change of the moving speed information of each key point according to the position of the same key point in different frame images; determining the identification accuracy of each key point according to whether the movement speed information change of each key point conforms to the motion continuity rule or not; wherein the movement speed information comprises a movement speed and/or a movement acceleration.
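The first of the three measures above (weighted summation of per-frame confidences) and the subsequent screening step can be sketched as follows; the weights, the threshold, and the dictionary layout are illustrative assumptions, not taken from the application:

```python
# Sketch of screening target key points by recognition accuracy, using the
# weighted-confidence-sum measure described above. Weights, threshold, and
# the dictionary layout are illustrative assumptions.

def recognition_accuracy(confidences, weights):
    """Weighted sum of one key point's confidences over the previous N frames."""
    return sum(c * w for c, w in zip(confidences, weights))

def screen_target_keypoints(frame_confidences, weights, threshold=0.6):
    """frame_confidences maps a key point name to its confidences in the
    previous N frames (oldest first). Key points whose weighted accuracy
    reaches the threshold are kept as target key points."""
    return [name for name, confs in frame_confidences.items()
            if recognition_accuracy(confs, weights) >= threshold]

# Example with N = 2 previous frames, the more recent frame weighted higher.
confs = {"nose": [0.9, 0.95], "elbow": [0.3, 0.4]}
print(screen_target_keypoints(confs, weights=[0.4, 0.6]))  # ['nose']
```

With these assumed weights, the nose (accuracy 0.93) is retained as a target key point while the low-confidence elbow (0.36) is screened out.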
As a possible implementation manner in an embodiment of an aspect of the present application, the determining, according to the predicted position of the target key point, a target area corresponding to the target object from the image to be detected includes:
determining the predicted positions of all key points of the target object in the image to be detected according to the predicted positions of the target key points and the standard relative positions of all key points of the target object;
and determining a target area corresponding to the target object in the image to be detected according to the predicted position of each key point of the target object in the image to be detected.
As a possible implementation manner of an embodiment of an aspect of the present application, after the detecting the key point of the target area, the method further includes:
obtaining a detection result obtained by detecting key points in the target region, wherein the detection result comprises the position and/or confidence of each key point in the target region;
checking whether the target area contains the target object or not according to the detection result;
and if the target area contains the target object, determining the position of each key point of the target object in the image to be detected according to the position of each key point in the target area.
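Mapping the positions of the key points detected inside the target area back to coordinates in the image to be detected can be as simple as adding the offset of the area's top-left corner, assuming the area is cropped without rescaling; the function name and data layout below are illustrative:

```python
def region_to_image_coords(region_box, region_keypoints):
    """region_box: (x_min, y_min, x_max, y_max) of the target area within the
    image to be detected; region_keypoints: {name: (x, y)} positions relative
    to the area's top-left corner. Returns positions in image coordinates,
    assuming the area was cropped without rescaling."""
    x_min, y_min = region_box[0], region_box[1]
    return {name: (x + x_min, y + y_min)
            for name, (x, y) in region_keypoints.items()}

# A key point at (20, 30) inside an area whose top-left corner is (80, 40)
# lies at (100, 70) in the full image.
print(region_to_image_coords((80, 40, 300, 500), {"nose": (20, 30)}))
# {'nose': (100, 70)}
```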
As a possible implementation manner in an embodiment of an aspect of the present application, the verifying whether the target area includes the target object according to the detection result includes:
determining the relative position of each key point in the target area according to the position of each key point in the target area;
and checking whether the target object is contained in the target region or not according to the difference degree between the relative position of each key point in the target region and the standard relative position of each key point of the target object and the confidence degree of each key point in the target region.
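A minimal sketch of this verification, assuming a mean Euclidean distance as the difference degree and fixed thresholds (both are assumptions not specified in the application):

```python
import math

def verify_target_region(detected, standard, confidences,
                         max_mean_diff=0.15, min_mean_conf=0.5):
    """detected / standard: {name: (x, y)} relative key point positions
    (e.g. normalized to the region size); confidences: {name: float}.
    Returns True when the target area is judged to contain the target object.
    The mean-distance measure and both thresholds are assumptions."""
    names = sorted(set(detected) & set(standard))
    if not names:
        return False
    mean_diff = sum(math.dist(detected[n], standard[n]) for n in names) / len(names)
    mean_conf = sum(confidences.get(n, 0.0) for n in names) / len(names)
    return mean_diff <= max_mean_diff and mean_conf >= min_mean_conf
```

A region whose key points sit close to the standard layout but all with near-zero confidence is rejected, matching the intent that both the relative positions and the confidences are checked.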
As a possible implementation manner of an embodiment of an aspect of the present application, the method further includes:
if the target area does not contain the target object, determining a target area corresponding to the target object directly from the image to be detected itself;
and detecting key points of the target area determined according to the image to be detected to obtain the positions of all key points of the target object in the image to be detected.
Another embodiment of the present application provides a target object key point detection apparatus, including:
the first determination module is used for determining the predicted position of a target key point in an image to be detected according to the position of the target key point of a target object in the previous N frames of images to be detected; wherein N is greater than 1;
the second determining module is used for determining a target area corresponding to the target object from the image to be detected according to the predicted position of the target key point;
and the detection module is used for detecting key points of the target area.
As a possible implementation manner of another aspect of this application, the first determining module may include:
a first determining unit, configured to determine moving speed information of the target keypoint according to the position of the target keypoint in the previous N frames of images; wherein the moving speed information comprises moving speed and/or moving acceleration;
and the second determining unit is used for determining the predicted position of the target key point in the image to be detected according to the moving speed information of the target key point and the position of the target key point in the previous frame image of the image to be detected.
As a possible implementation manner of another aspect of the embodiment of the present application, the apparatus further includes:
a third determining module, configured to determine recognition accuracy of each key point of the target object in the previous N frames of images;
and the screening module is used for screening the target key points of the target object from the key points according to the identification accuracy of the key points.
As a possible implementation manner of another aspect of the embodiment of the present application, the third determining module is specifically configured to:
obtaining the confidence of each key point of the target object in the previous N frames of images; carrying out weighted summation on the confidence degrees of the same key point in the previous N frames of images to obtain the identification accuracy of the corresponding key point;
and/or acquiring the positions of all key points of the target object in the previous N frames of images; determining the relative position of each key point according to the position of each key point in the same frame of image; determining the recognition accuracy of each key point according to the difference degree between the relative position of each key point and the standard relative position of each key point of the target object;
and/or acquiring the positions of all key points of the target object in the previous N frames of images; determining the change of the moving speed information of each key point according to the position of the same key point in different frame images; determining the identification accuracy of each key point according to whether the movement speed information change of each key point conforms to the motion continuity rule or not; wherein the movement speed information comprises a movement speed and/or a movement acceleration.
As a possible implementation manner of another aspect of the embodiment of the present application, the second determining module is specifically configured to:
determining the predicted positions of all key points of the target object in the image to be detected according to the predicted positions of the target key points and the standard relative positions of all key points of the target object;
and determining a target area corresponding to the target object in the image to be detected according to the predicted position of each key point of the target object in the image to be detected.
As a possible implementation manner of another aspect of the embodiment of the present application, the apparatus may further include:
an obtaining module, configured to obtain a detection result obtained by performing keypoint detection on the target region, where the detection result includes a position and/or a confidence of each keypoint in the target region;
the checking module is used for checking whether the target area contains the target object or not according to the detection result;
and a fourth determining module, configured to determine, when the target region includes the target object, positions of the key points of the target object in the image to be detected according to the positions of the key points in the target region.
As a possible implementation manner of another embodiment of the present application, the checking module is specifically configured to:
determining the relative position of each key point in the target area according to the position of each key point in the target area;
and checking whether the target object is contained in the target region or not according to the difference degree between the relative position of each key point in the target region and the standard relative position of each key point of the target object and the confidence degree of each key point in the target region.
As a possible implementation manner of another aspect of the embodiment of the present application, the apparatus may further include:
a fifth determining module, configured to determine a target region corresponding to the target object directly from the image to be detected when the target region does not contain the target object;
and the detection module is also used for detecting key points of the target area determined according to the image to be detected to obtain the positions of all the key points of the target object in the image to be detected.
Another embodiment of the present application provides an electronic device, including a processor and a memory;
wherein the processor implements the target object key point detection method according to the embodiment of the above aspect by reading the executable program code stored in the memory and running a program corresponding to the executable program code.
Another embodiment of the present application provides a non-transitory computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the method for detecting the key point of the target object according to the embodiment of the above aspect.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic flowchart of a method for detecting key points of a target object according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of another method for detecting key points of a target object according to an embodiment of the present disclosure;
fig. 3 is a schematic flowchart of another method for detecting key points of a target object according to an embodiment of the present application;
fig. 4 is a schematic flowchart of another method for detecting a key point of a target object according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a target object keypoint detection apparatus according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of another target object keypoint detection apparatus provided in the embodiment of the present application;
FIG. 7 illustrates a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
A target object keypoint detection method and apparatus according to an embodiment of the present application are described below with reference to the drawings.
Fig. 1 is a schematic flowchart of a method for detecting a key point of a target object according to an embodiment of the present application.
The method for detecting the key points of the target object provided by the embodiment of the application can be configured in an electronic device to determine the target area corresponding to the target object in the image to be detected according to the positions of the target key points of the target object in the previous N frames of images. The step of detecting the position of the target object with a detector is omitted, so the detection speed of the key points is increased, the detection accuracy of the position of the target object is improved, and the detection precision of the key points is improved.
In this embodiment, the electronic device may be a device having an operating system, such as a personal computer or a server.
As shown in fig. 1, the method for detecting key points of a target object includes:
Step 101, determining the predicted position of a target key point in the image to be detected according to the position of the target key point of the target object in the previous N frames of images.
Wherein N is an integer greater than 1.
In this embodiment, the key point detection may be performed on the target object in each frame of image in the video, and then the image to be detected may be one frame of image in the video. The target object may be a human, an animal, or the like.
A video is composed of successive frames of images, and when detecting the key points of a target object in a video, the detection can be performed frame by frame. Specifically, according to the playing sequence of the video, after the current frame of image has been detected, the next frame of image is detected; that is, the next frame of image becomes the image to be detected.
In this embodiment, the target key points may be part of the key points of the target object, or all of the key points of the target object.
The motion of a human body is continuous, so the positions of its key points do not change greatly between adjacent frames of images. In the embodiment of the application, therefore, the position of the target key point in the image to be detected is predicted according to the position of the target key point of the target object in the previous N frames of images of the image to be detected.
Specifically, the position of the target key point in each frame of image in the first N frames of image to be detected may be obtained first, and then the position of the target key point in the image to be detected may be predicted by using the position of the target key point in the first N frames of image, where the predicted position is referred to as a predicted position.
For example, the predicted position of the target key point of the target object in the image to be detected is determined by using the position of the target key point of the target object in the previous 2 frames of the image to be detected.
When the predicted position of the target key point in the image to be detected is determined, the motion information of the target key point can be used for prediction, and the following embodiments will be described in detail.
Step 102, determining a target area corresponding to the target object from the image to be detected according to the predicted position of the target key point.
In this embodiment, the target area may refer to an area where a target object in the image is located. For example, when the target object is a human, the target region is a region in the image where the human is located.
After the predicted position of the target key point in the image to be detected is determined, the target area can be determined from the image to be detected according to the predicted position of the target key point. For example, the body region in the image to be detected can be predicted from the predicted positions of key points at the nose, elbow joint, shoulder, knee joint of the person.
As one implementation manner, the predicted positions of the key points of the target object in the image to be detected can be determined according to the predicted positions of the target key points and the standard relative positions of the key points of the target object; the target area corresponding to the target object in the image to be detected is then determined according to those predicted positions. For example, the positions of the key points of the human body in the image to be detected can be predicted from the predicted positions of the target key points and the standard relative positions of the key points of the human body, yielding the distribution range of the key points; the region covered by these predicted positions then constitutes the target area corresponding to the target object in the image to be detected.
As another possible implementation manner, the target area of the image to be detected can be determined according to the predicted position of the target key point in the image to be detected and the relative position of the target key point and the edge of the target object. For example, from the predicted position of the person's forehead and the relative position of the forehead to the edge of the head, the region in which the person's head is located can be determined.
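Under the first implementation above, once the predicted positions of all key points are known, the target area can be taken as their bounding box expanded by a margin so that the object's outline falls inside; the margin value below is an assumption:

```python
def target_region_from_keypoints(predicted_positions, margin=0.25):
    """predicted_positions: iterable of (x, y) pixel coordinates of the
    predicted key points. Returns (x_min, y_min, x_max, y_max): the bounding
    box of the key points, expanded on each side by `margin` (an assumed
    fraction of the box size) so the object's outline falls inside."""
    xs = [p[0] for p in predicted_positions]
    ys = [p[1] for p in predicted_positions]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    return (min(xs) - margin * w, min(ys) - margin * h,
            max(xs) + margin * w, max(ys) + margin * h)

# Key points spanning x in [100, 200] and y in [50, 350], expanded by 25%.
print(target_region_from_keypoints([(100, 50), (200, 350), (150, 200)]))
# (75.0, -25.0, 225.0, 425.0)
```

The resulting box may extend beyond the image borders (as the negative y above shows) and would be clipped to the image in practice.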
In the related art, the detector only detects the overall position of the target object, whereas the present application detects the positions of a plurality of key points of the target object; therefore, the method of predicting the position of the target object from the target key points in the previous N frames of images has higher detection accuracy.
Step 103, performing key point detection on the target area.
In this embodiment, the target area may be input into the keypoint identification model for keypoint detection. The key point identification model can be obtained by utilizing neural network training and is used for detecting each key point of the target object in the image.
Or, determining the positions of the key points in the target area in the image to be detected by using the positions of the key points in the target area in the previous frame or multiple frames of images.
Taking the target area as the human body area as an example, determining the human body area in the image to be detected according to the position of the key point of the target human body, and then detecting the key point of the human body area.
It should be noted that, in this embodiment, when the image to be detected is the first frame of image or the second frame of image, an existing detection method may be used to perform key point detection on the target object, so as to obtain the positions of the key points of the target object.
In the embodiment of the application, the position of the target key point in the image to be detected is predicted according to the position of the target key point of the target object in the previous N (N > 1) frames of images, the target area corresponding to the target object in the image to be detected is then determined according to the predicted position, and key point detection is performed on the target area. This not only omits the step of detecting the position of the target object with a detector, greatly improving the detection speed of the key points, but also improves the detection precision of the key points.
In addition, in the related art, if the target object is a human body, when a plurality of human bodies appear in the image to be detected, the positions of the plurality of human bodies detected by the detector cannot be matched with the key points of the plurality of human bodies in the previous frame image. In the embodiment of the application, since the positions of the human bodies in each frame of image are predicted from the corresponding key points of the human bodies in the previous N frames of images, the positions of the human bodies detected in the multi-person scene can be matched with the key points of the human bodies in the previous frame.
In an embodiment of the present application, when determining the predicted position of the target key point in the image to be detected, the position of the target key point in the previous N frames of images may be used to obtain the motion information of the target key point, and then the position of the target key point in the image to be detected is predicted according to the motion information. Fig. 2 is a schematic flow chart of another target object keypoint detection method according to an embodiment of the present disclosure.
As shown in fig. 2, the determining the predicted position of the target key point in the image to be detected according to the position of the target key point in the previous N frames of images of the image to be detected includes:
Step 201, determining the moving speed information of the target key point according to the position of the target key point in the previous N frames of images.
Step 202, determining the predicted position of the target key point in the image to be detected according to the moving speed information of the target key point and the position of the target key point in the previous frame of image.
In this embodiment, in the first N frames of images to be detected, the moving speed of the target key point may be determined according to the positions of the same target key point in the two adjacent frames of images, and then the moving acceleration of the target key point may be determined according to the positions of the same key point in the three adjacent frames of images.
Specifically, for each target key point, the moving distance of the target key point between every two adjacent frames of images is calculated according to the positions of that target key point in the previous N frames of images; the moving speed between the two frames is then obtained by dividing the moving distance by the frame interval, i.e. the reciprocal of the frame rate, where the frame rate is the number of images the video displays per second. Further, the average of the moving speeds calculated from every two adjacent frames of images is taken as the moving speed of the target key point.
Then, the moving acceleration of the target key point can be determined according to the moving speed of the target key point in the two adjacent images.
For example, when N is 2, the moving speed of the target key point may be determined according to the positions of the target key point in the 2 frames of images preceding the image to be detected. When N is 3, assuming that the 3 images preceding the image to be detected are A, B and C respectively, the moving speed x1 of the target key point can be determined according to the positions of the target key point in image A and image B, and the moving speed x2 can be determined according to image B and image C; the moving acceleration of the target key point can then be determined according to the moving speeds x1 and x2.
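As an illustrative sketch of the three-frame case described above (not part of the claimed method — the frame interval `dt` and the 2-D keypoint coordinates are assumed values for demonstration):

```python
# Illustrative sketch: estimating a keypoint's moving speed and
# acceleration from three consecutive frames A, B, C.
# Coordinates and the frame interval dt are hypothetical values.

def velocity(p_prev, p_next, dt):
    """Per-axis moving speed between two adjacent frames."""
    return tuple((n - p) / dt for p, n in zip(p_prev, p_next))

def acceleration(v1, v2, dt):
    """Per-axis moving acceleration from two adjacent speeds."""
    return tuple((b - a) / dt for a, b in zip(v1, v2))

dt = 1.0 / 30.0                # assumed 30 fps video => frame interval
kp_A, kp_B, kp_C = (100.0, 200.0), (103.0, 202.0), (109.0, 206.0)

x1 = velocity(kp_A, kp_B, dt)   # moving speed between images A and B
x2 = velocity(kp_B, kp_C, dt)   # moving speed between images B and C
acc = acceleration(x1, x2, dt)  # moving acceleration of the key point
```

Averaging x1 and x2, as described above, would give the key point's overall moving speed over the three frames.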
In this embodiment, the predicted position of the target key point in the image to be detected can be determined according to the position of the target key point in the image of the previous frame of the image to be detected and the moving speed of the target key point. Specifically, the product of the speed of the target key point and the time interval between the previous frame of image and the image to be detected may be calculated, and then the sum of the product result and the coordinates of the target key point in the previous frame of image may be calculated, that is, the coordinates of the predicted position of the target key point in the image to be detected.
Or calculating the moving speed according to the moving acceleration of the target key point, and further determining the predicted position of the target key point in the image to be detected according to the moving speed and the position of the target key point in the previous frame of image.
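The prediction step described above can be sketched as follows — a constant-velocity model with an optional acceleration term. This is an illustrative sketch only; the function name and all numeric values are assumptions, not taken from the disclosure:

```python
def predict_position(prev_pos, speed, dt, accel=None):
    """Predict the keypoint position in the image to be detected from
    its position in the previous frame and its moving speed; when an
    acceleration estimate is available, a second-order term is added."""
    pred = []
    for i, p in enumerate(prev_pos):
        d = speed[i] * dt                 # speed x time interval
        if accel is not None:
            d += 0.5 * accel[i] * dt * dt  # optional acceleration term
        pred.append(p + d)                 # add to previous-frame coords
    return tuple(pred)

# previous-frame position (px), estimated speed (px/s), frame interval (s)
pred = predict_position((120.0, 80.0), (30.0, -15.0), 0.04)

# with an assumed acceleration estimate
pred2 = predict_position((120.0, 80.0), (30.0, -15.0), 0.04,
                         accel=(5.0, 0.0))
```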
If the target object is a human body, after the predicted position of the key point of the target human body in the image to be detected is determined, the human body area in the image to be detected can be determined according to the predicted position, and then key point detection is carried out on the human body area.
In the embodiment of the application, the moving speed information of the target key point is determined according to the position of the target key point in the previous N frames of images, and then the position of the target key point in the image to be detected is predicted according to the moving speed information and the position of the target key point in the previous frame of image.
Further, in order to improve the detection accuracy of the target region in the image to be detected, before the predicted position of the target key point is determined, each key point of the target object may be screened according to the identification accuracy to obtain reliable key points as the target key points, and then the predicted position of the target key point in the image to be detected is determined according to the position of the target key point in the previous N frames of images. Fig. 3 is a schematic flow chart of another target object keypoint detection method according to an embodiment of the present application.
As shown in fig. 3, before determining the predicted position of the target key point in the image to be detected according to the position of the target key point of the target object in the previous N frames of images of the image to be detected, the target object key point detecting method further includes:
In this embodiment, the recognition accuracy of each keypoint can be determined according to the confidence of each keypoint of the target object in the previous N frames of images, the relative position between the keypoints, the motion continuity law, and the like. Wherein the recognition accuracy is used to represent the detection accuracy of the keypoint location.
When the key point detection is carried out, the key point identification model can be adopted to detect the image, and the key point identification model can output the position and the confidence coefficient of each key point in the image. Wherein the confidence level is used to indicate the confidence level of the location of the detected keypoint.
As a possible implementation manner, obtaining confidence coefficients of all key points of the target object obtained by performing key point detection on the previous N frames of images by adopting a key point identification model, and then performing weighted summation on the confidence coefficients of the same key point in the previous N frames of images to obtain the identification accuracy of the corresponding key point. Thus, the recognition accuracy of each key point of the target object can be obtained. The weight value of each frame of image in the previous N frames of images of the same key point can be determined according to actual needs.
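The weighted-summation scheme above can be sketched as follows. This is an illustrative sketch: the weights and confidence values are hypothetical, and the key point identification model is assumed to return one confidence per key point per frame:

```python
def recognition_accuracy(confidences, weights):
    """Weighted sum of one keypoint's confidences over the previous
    N frames. Weights are normalized here so the result stays in
    [0, 1]; the actual weight values are chosen per application."""
    total = sum(weights)
    return sum(c * w for c, w in zip(confidences, weights)) / total

# confidence of the same key point in the previous 3 frames, with more
# recent frames weighted more heavily (assumed weighting scheme)
accuracy = recognition_accuracy([0.9, 0.8, 0.95], [1.0, 2.0, 3.0])
```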
Taking the target object as a human body as an example, the relative positions of the joints of the human body do not change much: for example, the relative positions of the key points at the shoulders are substantially the shoulder-width distance apart, and the relative positions of the forehead and the nose are substantially unchanged.
Therefore, as another possible implementation manner, the recognition accuracy of each key point can also be determined according to the relative position between each key point. Specifically, the positions of all key points obtained by performing key point detection on the previous N frames of images by using a key point identification model are obtained, then, for each frame of image in the previous N frames of images, the relative positions of all key points in the corresponding images are determined according to the positions of all key points in the same frame of image, and then the difference degree between the relative positions of all key points in the corresponding images and the standard relative positions of all key points of the target object is calculated. And for each key point, determining the identification accuracy of the key point according to N difference degrees obtained from the previous N frames of images. Wherein the standard relative position can be obtained according to the structure of the target object. For example, the standard relative positions of key points of the human body can be obtained according to the structure of the human body.
For example, the recognition accuracy of the human key point can be determined according to the difference degree between the relative position of the same human key point in the previous N frames of images and the adjacent key point and the standard relative position, and the mapping relationship between the difference degree and the recognition accuracy. Wherein the greater the degree of difference the lower the corresponding recognition accuracy.
Since the motion of the target object is continuous, the velocity or acceleration of the key points of the target object determined from the adjacent frames of images should also be continuous, and if a sudden increase or decrease occurs, the detected key point position may be considered inaccurate.
Then, as another possible implementation manner, the moving speed information, such as the moving speed and/or the moving acceleration, of each key point may also be determined according to the position of the same key point in different frame images, so as to determine the moving speed information change, such as the acceleration change and/or the speed change, of each key point, and determine the identification accuracy according to whether the moving speed information change of each key point conforms to the motion continuity law. If the change of the moving speed information accords with the motion continuity law, the identification accuracy is high. Otherwise, the identification accuracy of the key points is low.
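A sketch of this continuity check is given below. The threshold `max_jump` is an assumed tuning parameter — the disclosure only states that a sudden increase or decrease in speed suggests an inaccurate position:

```python
def continuity_ok(positions, dt, max_jump=50.0):
    """Check whether a keypoint's speed change across consecutive
    frames stays below an assumed threshold; a sudden jump suggests
    the detected keypoint position is inaccurate."""
    speeds = [
        ((b[0] - a[0]) / dt, (b[1] - a[1]) / dt)
        for a, b in zip(positions, positions[1:])
    ]
    for v1, v2 in zip(speeds, speeds[1:]):
        if abs(v2[0] - v1[0]) > max_jump or abs(v2[1] - v1[1]) > max_jump:
            return False  # speed change violates motion continuity
    return True

# a steadily moving keypoint vs. one with a sudden position jump
smooth = continuity_ok([(0, 0), (1, 1), (2, 2), (3, 3)], dt=1.0)
jumpy = continuity_ok([(0, 0), (1, 1), (80, 90), (81, 91)], dt=1.0)
```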
It should be noted that the recognition accuracy of each key point of the target object may be determined by using any two of the three methods or by using the three methods at the same time.
In the embodiment of the application, the identification accuracy is determined through the confidence coefficient of each key point, the relative position among the key points, the motion continuity law and the like, so that the accuracy of determining the identification accuracy is greatly improved.
And step 302, screening the key points according to the identification accuracy of each key point to obtain the target key points of the target object.
In this embodiment, the key points may be screened according to the identification accuracy of each key point, the key points whose identification accuracy exceeds a preset threshold may be used as target key points, so as to determine a target area in an image to be detected by using the screened target key points, and further perform key point detection on the target area.
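The screening step amounts to a simple threshold filter, sketched below; the threshold value and key point names are assumed for illustration:

```python
def screen_target_keypoints(keypoints, accuracies, threshold=0.7):
    """Keep only keypoints whose recognition accuracy exceeds an
    assumed preset threshold; these become the target keypoints used
    to determine the target region in the image to be detected."""
    return [kp for kp, acc in zip(keypoints, accuracies)
            if acc > threshold]

kps = ["nose", "left_shoulder", "right_shoulder", "left_elbow"]
accs = [0.95, 0.82, 0.40, 0.75]
targets = screen_target_keypoints(kps, accs)  # drops "right_shoulder"
```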
Taking the target object as a human body as an example, the screened target key points can be utilized to determine a human body region in the image to be detected, and then key point detection is carried out on the human body region.
In the embodiment of the application, reliable key points are screened out by determining the identification accuracy of each key point and are used as target key points, so that the target area in the image to be detected is determined by using the reliable key points, the detection accuracy of the target area in the image to be detected is greatly improved, and the detection precision of the key points is improved.
In order to avoid the situation that the determined target area does not contain the target object, which leads to inaccurate detection, after the key point detection is performed on the target area, whether the target object is contained in the target area can be also checked. Fig. 4 is a schematic flow chart of another method for detecting a key point of a target object according to an embodiment of the present application.
As shown in fig. 4, after the key point detection is performed on the target area, the method for detecting the key point of the target object further includes:
In this embodiment, when performing the keypoint detection on the target region in the image to be detected, the keypoint detection may be performed by using the keypoint identification model, and the keypoint identification model may output the position and the confidence of the keypoint, so that the confidence of the keypoint in the target region and the position of each keypoint in the target region may be obtained.
In this embodiment, the relative position of each keypoint in the target region may be determined according to the position of each keypoint in the target region, and then whether the target region includes the target object is verified by using the difference degree between the relative position of each keypoint and the standard relative position of each keypoint of the target object and the confidence of each keypoint in the target region.
For example, if the target object is a human body, it can be checked whether the target area includes the human body using the above method.
Specifically, if the proportion of key points whose degree of difference between their relative position in the target region and the standard relative position is within a preset range exceeds a preset proportion, and the confidence degrees of these key points exceed a preset threshold, the target region can be considered to contain the target object.
For example, if there are more than 90% of human key points in the target region, the difference between the relative position of the human key points and the standard relative position is within a preset range, and the confidence degrees of the human key points exceed a preset threshold, it can be considered that the target region includes a human body.
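This verification can be sketched as follows. The 90% proportion comes from the example above, while the deviation range and confidence threshold are assumed values:

```python
def region_contains_target(deviations, confidences,
                           max_dev=10.0, min_conf=0.5, min_ratio=0.9):
    """Check whether the target region contains the target object: the
    proportion of keypoints whose relative-position deviation from the
    standard relative position is within range AND whose confidence
    exceeds the threshold must itself exceed the preset proportion."""
    ok = sum(1 for d, c in zip(deviations, confidences)
             if d <= max_dev and c > min_conf)
    return ok / len(deviations) >= min_ratio

# per-keypoint deviation from standard relative position (assumed px)
# and per-keypoint confidence from the keypoint identification model
contains = region_contains_target(
    deviations=[2.0, 3.5, 1.0, 4.2, 0.8, 2.2, 1.1, 3.3, 2.9, 1.7],
    confidences=[0.9, 0.8, 0.95, 0.7, 0.85, 0.9, 0.6, 0.75, 0.88, 0.92],
)
```

If the check fails, the detector is applied to the full image to re-determine the target region, as described in step 404 below.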
If the target area contains the target object, the target area in the image to be detected is determined to be more accurate, and then the positions of all key points obtained by detecting the target area can be used as the positions of the key points in the image to be detected.
And step 404, if the target area does not contain the target object, determining a target area corresponding to the target object from the image to be detected according to the image to be detected.
If the target area does not contain the target object, the target area in the image to be detected is determined to be inaccurate, and the image to be detected can be detected by using the detector to determine the target area.
For example, when the target object is a human body and the target area does not contain the human body, the detector may be used to detect the image to be detected, and determine the human body area in the image to be detected.
After the target area is determined by the detector, the target area is extracted from the image to be detected and input into the key point identification model, and key point detection is carried out on the target area by the key point identification model so as to detect the position of each key point of the target object in the target area.
In the above embodiment, whether the target object is included in the target region is verified by using the position and the confidence of each key point in the target region. It is understood that one of the position and the confidence of each key point in the target region may also be used to verify whether the target region contains the target object.
For example, whether the number of key points in the target area, the degree of difference between the relative position and the standard relative position of which is within a preset range, exceeds a preset ratio is judged, and if the number of key points exceeds the preset ratio, the target area can be considered to contain the target object.
For another example, whether the number of key points with the confidence level exceeding the preset threshold in the target area exceeds a preset ratio is judged, and if the number of key points exceeds the preset ratio, the target area can be considered to contain the target object.
In the embodiment of the application, after the key point detection is performed on the target region, whether the target region contains the target object is verified by using the position and/or confidence of the detected key point in the target region, so that the accuracy of the key point detection is further improved.
In order to implement the foregoing embodiments, an apparatus for detecting a key point of a target object is also provided in the embodiments of the present application. Fig. 5 is a schematic structural diagram of a target object keypoint detection apparatus according to an embodiment of the present application.
As shown in fig. 5, the target object key point detecting device includes: a first determination module 510, a second determination module 520, and a detection module 530.
A first determining module 510, configured to determine a predicted position of a target key point in an image to be detected according to a position of the target key point of a target object in the previous N frames of images of the image to be detected; wherein N is greater than 1;
a second determining module 520, configured to determine, according to the predicted position of the target key point, a target region corresponding to the target object from the image to be detected;
a detecting module 530, configured to perform keypoint detection on the target region.
Fig. 6 is a schematic structural diagram of another target object keypoint detection apparatus according to an embodiment of the present application.
In a possible implementation manner of this embodiment of this application, as shown in fig. 6, the first determining module 510 may include:
a first determining unit 511, configured to determine moving speed information of a target key point according to a position of the target key point in the previous N frames of images; the moving speed information comprises moving speed and/or moving acceleration;
the second determining unit 512 is configured to determine a predicted position of the target key point in the image to be detected according to the moving speed information of the target key point and the position of the target key point in the previous frame image of the image to be detected.
In a possible implementation manner of the embodiment of the present application, the apparatus further includes:
the third determining module is used for determining the recognition accuracy of each key point of the target object in the previous N frames of images;
and the screening module is used for screening the target key points of the target object from the key points according to the identification accuracy of the key points.
In a possible implementation manner of the embodiment of the present application, the third determining module is specifically configured to:
obtaining the confidence coefficient of each key point of the target object in the previous N frames of images; carrying out weighted summation on the confidence degrees of the same key point in the previous N frames of images to obtain the identification accuracy of the corresponding key point;
and/or acquiring the positions of all key points of the target object in the previous N frames of images; determining the relative position of each key point according to the position of each key point in the same frame of image; determining the recognition accuracy of each key point according to the difference degree between the relative position of each key point and the standard relative position of each key point of the target object;
and/or acquiring the positions of all key points of the target object in the previous N frames of images; determining the change of the moving speed information of each key point according to the position of the same key point in different frame images; determining the identification accuracy of each key point according to whether the movement speed information change of each key point conforms to the motion continuity rule or not; wherein the moving speed information comprises moving speed and/or moving acceleration.
In a possible implementation manner of the embodiment of the present application, the second determining module 520 is specifically configured to:
determining the predicted positions of all key points of the target object in the image to be detected according to the predicted position of the target key point and the standard relative positions of all key points of the target object;
and determining a target area corresponding to the target object in the image to be detected according to the predicted position of each key point of the target object in the image to be detected.
In a possible implementation manner of the embodiment of the present application, the apparatus may further include:
the system comprises an acquisition module, a detection module and a processing module, wherein the acquisition module is used for acquiring a detection result obtained by detecting key points in a target region, and the detection result comprises the position and/or confidence of each key point in the target region;
the checking module is used for checking whether the target area contains the target object or not according to the detection result;
and the fourth determining module is used for determining the positions of all key points of the target object in the image to be detected according to the positions of all key points in the target area when the target area contains the target object.
In a possible implementation manner of the embodiment of the present application, the check module is specifically configured to:
determining the relative position of each key point in the target area according to the position of each key point in the target area;
and checking whether the target region contains the target object or not according to the difference degree between the relative position of each key point in the target region and the standard relative position of each key point of the target object and the confidence degree of each key point in the target region.
In a possible implementation manner of the embodiment of the present application, the apparatus may further include:
the fifth determining module is used for determining a target area corresponding to the target object from the image to be detected according to the image to be detected when the target area does not contain the target object;
the detecting module 530 is further configured to perform key point detection on the target region determined according to the image to be detected, so as to obtain positions of each key point of the target object in the image to be detected.
It should be noted that the foregoing explanation of the embodiment of the method for detecting key points of a target object is also applicable to the device for detecting key points of a target object of the embodiment, and therefore is not repeated herein.
In order to implement the foregoing embodiments, an electronic device is further provided in an embodiment of the present application, including a processor and a memory;
the processor reads the executable program code stored in the memory to run a program corresponding to the executable program code, so as to implement the target object key point detection method described in the above embodiments.
FIG. 7 illustrates a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present application. The electronic device 12 shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in FIG. 7, electronic device 12 is embodied in the form of a general purpose computing device. The components of electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a memory 28, and a bus 18 that couples various system components including the memory 28 and the processors 16.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally perform the functions and/or methodologies of the embodiments described herein.
The processor 16 executes various functional applications and data processing by executing programs stored in the memory 28, for example, implementing the methods mentioned in the foregoing embodiments.
In order to implement the foregoing embodiments, the present application further proposes a non-transitory computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the target object keypoint detection method as described in the foregoing embodiments.
In the description of the present specification, the terms "first", "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.
Claims (10)
1. A method for detecting key points of a target object is characterized by comprising the following steps:
determining the predicted position of a target key point in an image to be detected according to the position of the target key point of a target object in the previous N frames of images of the image to be detected; wherein N is greater than 1;
determining a target area corresponding to the target object from the image to be detected according to the predicted position of the target key point;
and detecting key points of the target area.
2. The method for detecting the key points of the target object according to claim 1, wherein the determining the predicted positions of the target key points in the image to be detected according to the positions of the target key points of the target object in the previous N frames of images of the image to be detected comprises:
determining the moving speed information of the target key point according to the position of the target key point in the previous N frames of images; wherein the moving speed information comprises moving speed and/or moving acceleration;
and determining the predicted position of the target key point in the image to be detected according to the moving speed information of the target key point and the position of the target key point in the image of the previous frame of the image to be detected.
3. The method for detecting key points of a target object according to claim 1, wherein before determining the predicted position of the target key point in the image to be detected according to the position of the target key point of the target object in the previous N frames of images of the image to be detected, the method further comprises:
determining the identification accuracy of each key point of the target object in the previous N frames of images;
and screening the target key points of the target object from the key points according to the identification accuracy of the key points.
4. The method for detecting key points of a target object according to claim 3, wherein the determining the recognition accuracy of each key point of the target object in the previous N frames of images comprises:
obtaining the confidence of each key point of the target object in the previous N frames of images; carrying out weighted summation on the confidence degrees of the same key point in the previous N frames of images to obtain the identification accuracy of the corresponding key point;
and/or acquiring the positions of all key points of the target object in the previous N frames of images; determining the relative position of each key point according to the position of each key point in the same frame of image; determining the recognition accuracy of each key point according to the difference degree between the relative position of each key point and the standard relative position of each key point of the target object;
and/or acquiring the positions of all key points of the target object in the previous N frames of images; determining the change of the moving speed information of each key point according to the position of the same key point in different frame images; determining the identification accuracy of each key point according to whether the movement speed information change of each key point conforms to the motion continuity rule or not; wherein the movement speed information comprises a movement speed and/or a movement acceleration.
5. The method for detecting key points of a target object according to claim 1, wherein the determining a target area corresponding to the target object from the image to be detected according to the predicted position of the key point of the target object comprises:
determining the predicted positions of all key points of the target object in the image to be detected according to the predicted positions of the target key points and the standard relative positions of all key points of the target object;
and determining a target area corresponding to the target object in the image to be detected according to the predicted position of each key point of the target object in the image to be detected.
6. The method for detecting key points of a target object according to any one of claims 1 to 5, wherein after the key point detection of the target region, the method further comprises:
obtaining a detection result obtained by detecting key points in the target region, wherein the detection result comprises the position and/or confidence of each key point in the target region;
checking whether the target area contains the target object or not according to the detection result;
and if the target area contains the target object, determining the position of each key point of the target object in the image to be detected according to the position of each key point in the target area.
7. The method according to claim 6, wherein the checking whether the target area contains the target object according to the detection result comprises:
determining the relative position of each key point in the target area according to the position of each key point in the target area;
and checking whether the target area contains the target object according to the degree of difference between the relative position of each key point in the target area and the standard relative position of each key point of the target object, and the confidence of each key point in the target area.
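A hypothetical form of the claim-7 check. The claim only says the decision uses the layout difference and the confidences; here the layout comparison is made translation-invariant via centroids, and `max_mean_dev` / `min_mean_conf` are tunable thresholds assumed for the sketch.

```python
import numpy as np

def region_contains_object(positions, confidences, standard_offsets,
                           max_mean_dev=15.0, min_mean_conf=0.5):
    """Accept the target area only if the detected keypoints both match the
    standard relative layout and are detected with enough confidence."""
    positions = np.asarray(positions, dtype=float)
    std = np.asarray(standard_offsets, dtype=float)
    # Compare layouts relative to their centroids (translation-invariant).
    rel = positions - positions.mean(axis=0)
    std = std - std.mean(axis=0)
    mean_dev = float(np.linalg.norm(rel - std, axis=1).mean())
    mean_conf = float(np.mean(confidences))
    return mean_dev <= max_mean_dev and mean_conf >= min_mean_conf
```

If this check fails, claims 8 falls back to detecting the target area from the whole image to be detected rather than from the prediction.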
8. The method of claim 6, further comprising:
if the target area does not contain the target object, determining, from the image to be detected itself, a target area corresponding to the target object;
and detecting key points of the target area determined from the image to be detected, so as to obtain the positions of all key points of the target object in the image to be detected.
9. A target object keypoint detection apparatus, comprising:
the first determining module is used for determining the predicted position of a target key point in an image to be detected according to the positions of the target key point of a target object in the previous N frames of images of the image to be detected; wherein N is greater than 1;
the second determining module is used for determining a target area corresponding to the target object from the image to be detected according to the predicted position of the target key point;
and the detection module is used for detecting key points of the target area.
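The first determining module's prediction step could, for example, be a constant-velocity extrapolation over the previous N frames. This is one of many predictors consistent with claim 9; the claim itself does not fix the prediction model.

```python
import numpy as np

def predict_next_position(history):
    """Extrapolate the target keypoint's position in the image to be
    detected from its positions in the previous N frames (N > 1),
    assuming roughly constant velocity between frames."""
    history = np.asarray(history, dtype=float)       # shape (N, 2)
    velocity = np.diff(history, axis=0).mean(axis=0) # mean per-frame motion
    return tuple(history[-1] + velocity)
```

The second determining module would then feed this predicted position into a region builder such as the claim-5 sketch above the detection step.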
10. An electronic device comprising a processor and a memory;
wherein the processor reads the executable program code stored in the memory and runs a program corresponding to the executable program code, so as to implement the target object key point detection method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811480075.9A CN111274852B (en) | 2018-12-05 | 2018-12-05 | Target object key point detection method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111274852A true CN111274852A (en) | 2020-06-12 |
CN111274852B CN111274852B (en) | 2023-10-31 |
Family
ID=70998551
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811480075.9A Active CN111274852B (en) | 2018-12-05 | 2018-12-05 | Target object key point detection method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111274852B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2014235743A (en) * | 2013-06-03 | 2014-12-15 | Ricoh Co., Ltd. | Method and equipment for determining position of hand on the basis of depth image
CN107257980A (en) * | 2015-03-18 | 2017-10-17 | Intel Corporation | Local change detection in video
CN107622252A (en) * | 2017-09-29 | 2018-01-23 | Baidu Online Network Technology (Beijing) Co., Ltd. | Information generating method and device
CN108229308A (en) * | 2017-11-23 | 2018-06-29 | Beijing SenseTime Technology Development Co., Ltd. | Object recognition method, apparatus, storage medium and electronic device
WO2018137623A1 (en) * | 2017-01-24 | 2018-08-02 | Shenzhen SenseTime Technology Co., Ltd. | Image processing method and apparatus, and electronic device
Non-Patent Citations (1)
Title |
---|
SUN WEIMIN; WANG HUI; GAO TAO; ZHANG KAI; LIU AIMIN: "Accelerating convolutional-neural-network-based insulator image object detection with a key point density algorithm" *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113034580A (en) * | 2021-03-05 | 2021-06-25 | Beijing Zitiao Network Technology Co., Ltd. | Image information detection method and device and electronic equipment
CN113034580B (en) * | 2021-03-05 | 2023-01-17 | Beijing Zitiao Network Technology Co., Ltd. | Image information detection method and device and electronic equipment
CN113362370A (en) * | 2021-08-09 | 2021-09-07 | Suteng Innovation Technology Co., Ltd. | Method, device, medium and terminal for determining motion information of target object
CN113362370B (en) * | 2021-08-09 | 2022-01-11 | Suteng Innovation Technology Co., Ltd. | Method, device, medium and terminal for determining motion information of target object
CN114373157A (en) * | 2022-03-21 | 2022-04-19 | NIO Technology (Anhui) Co., Ltd. | Safety monitoring method, device and medium for power swapping station and power swapping station
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109871909B (en) | Image recognition method and device | |
CN109948542B (en) | Gesture recognition method and device, electronic equipment and storage medium | |
US10255673B2 (en) | Apparatus and method for detecting object in image, and apparatus and method for computer-aided diagnosis | |
KR102169309B1 (en) | Information processing apparatus and method of controlling the same | |
KR20170065573A (en) | Improved calibration for eye tracking systems | |
CN111274852B (en) | Target object key point detection method and device | |
CN111950523A (en) | Ship detection optimization method and device based on aerial photography, electronic equipment and medium | |
CN111126268A (en) | Key point detection model training method and device, electronic equipment and storage medium | |
CN113012200B (en) | Method and device for positioning moving object, electronic equipment and storage medium | |
JP2021089778A (en) | Information processing apparatus, information processing method, and program | |
US8532393B2 (en) | Method and system for line segment extraction | |
CN111382606A (en) | Tumble detection method, tumble detection device and electronic equipment | |
CN114596440A (en) | Semantic segmentation model generation method and device, electronic equipment and storage medium | |
CN113870322A (en) | Event camera-based multi-target tracking method and device and computer equipment | |
US20230326251A1 (en) | Work estimation device, work estimation method, and non-transitory computer readable medium | |
CN115713750B (en) | Lane line detection method and device, electronic equipment and storage medium | |
CN111126101B (en) | Method and device for determining key point position, electronic equipment and storage medium | |
CN110647826A (en) | Method and device for acquiring commodity training picture, computer equipment and storage medium | |
CN114846513A (en) | Motion analysis system and motion analysis program | |
CN110934565A (en) | Method and device for measuring pupil diameter and computer readable storage medium | |
CN109829440B (en) | Method and device for detecting road difference, electronic equipment and storage medium | |
CN112633496B (en) | Processing method and device for detection model | |
CN109583511B (en) | Speed fusion method and device | |
CN111753625B (en) | Pedestrian detection method, device, equipment and medium | |
CN110738082A (en) | Method, device, equipment and medium for positioning key points of human face |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||