CN117690163A - 3D human skeleton key point data enhancement method and system
- Publication number
- CN117690163A (application number CN202311739082.7A)
- Authority
- CN
- China
- Prior art keywords
- key point
- missing
- bone
- point data
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/34—Smoothing or thinning of the pattern; Morphological operations; Skeletonisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/757—Matching configurations of points or features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/766—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Evolutionary Computation (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method and a system for enhancing 3D human skeleton key point data. The method comprises the following steps: acquiring 2D bone key point data; determining the 2D bone key point indexes of the missing limb edge connection and the number of missing bone key points; performing frame screening and smoothing on the 2D bone key point data; constructing a posture prior constraint and data enhancement algorithm of the missing limb; iterating the SMPLify-X model through the posture prior constraint and data enhancement algorithm to obtain an optimal 3D skeleton SMPL-X model; and obtaining the 3D skeleton key point data and the corresponding data annotation file. The system comprises a data acquisition module, a data comparison analysis module, a data smoothing module, an algorithm construction module, an optimal model generation module and an annotation data acquisition module. The method keeps the feature distribution of the body key point data of a person with a missing limb consistent with that of complete human skeleton key point data, and enhances the robustness and generalization capability of algorithms that use such data. The invention can be widely applied in the technical field of computer vision.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to a method and a system for enhancing 3D human skeleton key point data.
Background
Human skeleton key point data is an essential basis for computer vision tasks such as action classification, behavior recognition and person re-identification. Currently, human skeleton key point data is mainly detected from complete human images or video sequences (e.g., the widely studied NTU RGB+D and NTU RGB+D 120 datasets). Existing computer vision algorithms take complete human skeleton key point data as input and do not adapt well to skeleton key point data of a person with a missing limb; for images in which body limbs are missing, existing human skeleton key point detection algorithms cannot effectively complete or enhance the key points of the missing limb parts.
Disclosure of Invention
In order to solve the above technical problems, the invention aims to provide a 3D human skeleton key point data enhancement method and system that keep the feature distribution of the body key point data of a person with a missing limb consistent with that of complete human skeleton key point data, enhance the robustness of the algorithm and improve its generalization capability.
The first technical scheme adopted by the invention is as follows: a 3D human skeleton key point data enhancement method comprises the following steps:
acquiring 2D skeleton key point data of a person in a missing limb video frame image;
determining a 2D bone key point index of the missing limb edge connection and the number of missing bone key points based on the 2D bone key point data of the missing limb video frame image and the complete 2D bone key point data;
constructing a posture prior constraint and data enhancement algorithm of the missing limb based on the number of the missing bone key points, the 2D bone key point index of the missing limb edge connection and the video behavior category;
and taking the 2D skeleton key point data of the missing limb video frame image as input, and iterating the SMPLify-X model through the posture prior constraint and data enhancement algorithm of the missing limb to obtain an optimal 3D skeleton SMPL-X model.
Further, the method also comprises the steps of obtaining the 3D skeleton key point data of the optimal 3D skeleton SMPL-X model and generating a corresponding data annotation file.
Further, after the step of determining the index of the 2D bone keypoints and the number of the missing bone keypoints of the missing limb edge connection based on the 2D bone keypoint data of the missing limb video frame image and the complete 2D bone keypoint data, the method further comprises: and carrying out frame screening and smoothing treatment on the 2D skeleton key point data of the missing limb video frame image to obtain smoothed 2D skeleton key point data.
Further, the step of performing frame screening and smoothing processing on the 2D skeleton key point data of the missing limb video frame image to obtain smoothed 2D skeleton key point data specifically includes:
polynomial regression is carried out on the same 2D bone key points in all frames of the missing limb video, and deviation between the regression value of the 2D bone key points and the initial value of the 2D bone key points is calculated;
determining abnormal bone points based on comparison results of the deviation, the minimum threshold value and the maximum threshold value, and updating values of the 2D bone key points to obtain first 2D bone key point data;
and screening the first 2D bone key point data according to the comparison result of the abnormal bone point number and the maximum abnormal bone point number of the same frame of image to obtain smoothed 2D bone key point data.
Through the preferred step, part of abnormal bone point data is screened out, so that the smoothed 2D bone key point data can keep the complete characteristics of the key points of the body of the missing limb.
Further, the step of constructing a posture prior constraint and data enhancement algorithm of the missing limb based on the number of the missing bone keypoints, the 2D bone keypoint index of the missing limb edge connection and the video behavior category specifically comprises the following steps:
acquiring complete human body posture parameters, and mapping the human body posture parameters according to the indexes of the 2D skeleton key points connected with the edges of the missing limbs to obtain the human body posture parameters corresponding to the key points of the missing bones;
calculating human body posture parameters corresponding to 2D bone key point data of the missing limb video frame image based on the complete human body posture parameters and the human body posture parameters corresponding to the missing bone key points;
constraining the human body posture parameters corresponding to the missing bone key points according to the video behavior category, the number of the missing bone key points and the 2D bone key point indexes of the missing limb edge connection, to obtain the posture prior constraint parameters of the missing bone key points;
constructing the constraint term for missing-limb prior knowledge based on the posture prior constraint parameters of the missing bone key points;
constructing the constraint term for the whole-body posture based on the complete human body posture parameters;
constructing the body part posture constraint term based on the human body posture parameters corresponding to the 2D bone key point data of the missing limb video frame image;
and combining the constraint term for missing-limb prior knowledge, the constraint term for the whole-body posture and the body part posture constraint term to obtain the posture prior constraint and data enhancement algorithm of the missing limb.
Through this preferred step, posture prior constraint completion and data enhancement are applied to the skeleton points of the person with a missing limb, so that the feature distribution of the missing-limb skeleton key point data stays consistent with that of complete human skeleton key points. This enhances the robustness of computer vision algorithms based on human skeleton key points and improves their generalization performance.
Further, the posture prior constraint and data enhancement algorithm of the missing limb is expressed by an objective optimization function $E(\beta_t, \theta_{b,t}, \theta_{d,t}, \theta'_{m,t}, K_t, J'_{est,t})$ together with the following relations:
$\theta'_{m,t} = Z(\theta_{m,t}, N_{m,t}, I_{e,t}, C)$
$\theta_{d,t} = \theta_{b,t} - \theta_{m,t}$
where $E(\beta_t, \theta_{b,t}, \theta_{d,t}, \theta'_{m,t}, K_t, J'_{est,t})$ denotes the objective optimization function based on the posture prior constraint and data enhancement algorithm of the missing limb; it combines the constraint term for the whole-body posture, the body part posture constraint term and the constraint term for missing-limb prior knowledge, each with its own weight value. $\beta_t$ denotes the body shape parameters corresponding to the t-th video frame image, $\theta_{b,t}$ denotes the complete human body posture parameters corresponding to the t-th video frame image, $K_t$ denotes the camera parameters corresponding to the t-th video frame image, $J'_{est,t}$ denotes the smoothed 2D bone key point data, $\theta_{d,t}$ denotes the human body posture parameters corresponding to the 2D bone key point data of the missing limb video frame image, $\theta'_{m,t}$ denotes the posture prior constraint parameters of the missing bone key points corresponding to the t-th video frame image, $\theta_{m,t}$ denotes the human body posture parameters corresponding to the missing bone key points, $N_{m,t}$ denotes the total number of bone key points of the missing limb at frame t, $I_{e,t}$ denotes the set of index values of the bone points edge-connected to all missing limb bones at frame t, and C denotes the behavior category of the video sequence.
Further, the step of taking the 2D skeleton key point data of the missing limb video frame image as input and iterating the SMPLify-X model through the posture prior constraint and data enhancement algorithm of the missing limb to obtain an optimal 3D skeleton SMPL-X model specifically comprises the following steps:
performing 3D human body posture estimation on the 2D bone key point data of the missing limb video frame image based on the SMPLify-X model and the posture prior constraint and data enhancement algorithm of the missing limb, to obtain an SMPL-X model of the 2D bone key point data;
acquiring 3D bone key point data based on the SMPL-X model of the 2D bone key point data, and mapping the 3D bone key point data into 2D bone key point data on the image plane through the camera parameters;
matching the 2D bone key point data on the image plane with the 2D bone key point data of the missing limb video frame image to obtain a matching result;
and iteratively optimizing the weights of the posture prior constraint and data enhancement algorithm of the missing limb based on the matching result to obtain an optimal 3D skeleton SMPL-X model.
Through this preferred step, the features of the original bone points and the missing limb bone points are fused, the two sets of bone points match more closely, and both the posture of the missing limb part and the overall posture of the three-dimensional human body model are more accurate.
The second technical scheme adopted by the invention is as follows: a 3D human skeletal key point data enhancement system, comprising:
the data acquisition module is used for acquiring 2D skeleton key point data of the person in the missing limb video frame image;
the data comparison analysis module is used for determining 2D bone key point indexes of the missing limb edge connection and the number of the missing bone key points based on the 2D bone key point data of the missing limb video frame image and the complete 2D bone key point data;
the data smoothing module is used for carrying out frame screening and smoothing processing on the 2D skeleton key point data of the missing limb video frame image to obtain smoothed 2D skeleton key point data;
the algorithm construction module is used for constructing a posture prior constraint and data enhancement algorithm of the missing limb based on the number of the missing bone key points, the 2D bone key point indexes of the missing limb edge connection and the video behavior category;
the optimal model generation module is used for taking the smoothed 2D skeleton key point data as input and iterating the SMPLify-X model through the posture prior constraint and data enhancement algorithm of the missing limb to obtain an optimal 3D skeleton SMPL-X model;
the annotation data acquisition module is used for acquiring the 3D skeleton key point data of the optimal 3D skeleton SMPL-X model and generating a corresponding data annotation file.
The method and the system have the following beneficial effects. By constructing the posture prior constraint and data enhancement algorithm of the missing limb, posture prior constraint completion and data enhancement are applied to the skeleton points of the person with a missing limb, and the feature distribution of the missing-limb key point data is kept consistent with that of complete human skeleton key point data. This helps computer vision algorithms and models capture the key features of the input data, alleviates problems caused by data shift, lets the algorithm adapt better to unseen data, enhances its robustness, and improves its generalization performance under different data distributions. In addition, the features of the original bone points and the missing limb bone points are fused without losing the bone point features of the original missing limb video frame image. The original bone points and the missing limb bone points match more closely, and both the posture of the missing limb part and the overall posture of the three-dimensional human body model are more accurate.
Drawings
FIG. 1 is a flow chart of the steps of a method for enhancing key point data of a 3D human skeleton according to the present invention;
FIG. 2 is a block diagram of a 3D human skeletal key point data enhancement system of the present invention;
FIG. 3 is a schematic diagram showing the overall steps of a method for enhancing the key point data of 3D human skeleton according to the present invention;
FIG. 4 is a schematic diagram of a complete 2D skeletal keypoint connection relationship of a 3D human skeletal keypoint data enhancement method of the present invention;
FIG. 5 is a schematic diagram of the connection relationship of 2D skeletal key points of a missing limb according to the 3D human skeletal key point data enhancement method of the present invention;
FIG. 6 is a schematic diagram comparing the effect of the 3D human skeleton key point data enhancement method of the present invention with the prior art: FIG. 6(a) shows the original image with a missing limb (missing legs); FIG. 6(b) shows the 2D bone key points and connections detected by the OpenPose detector; FIG. 6(c) shows the SMPL-X human 3D model obtained directly with the SMPLify-X model; FIG. 6(d) shows the SMPL-X human 3D model obtained by the present invention; FIG. 6(e) shows the 3D bone key points extracted from the SMPL-X human 3D model obtained directly with the SMPLify-X model; FIG. 6(f) shows the 3D bone key points extracted from the SMPL-X human 3D model obtained by the present invention.
Detailed Description
The invention will now be described in further detail with reference to the drawings and to specific examples. The step numbers in the following embodiments are set for convenience of illustration only, and the order between the steps is not limited in any way, and the execution order of the steps in the embodiments may be adaptively adjusted according to the understanding of those skilled in the art.
Referring to fig. 1 and 3, the present invention provides a 3D human skeleton key point data enhancement method, which includes the following steps:
S1, acquiring 2D skeleton key point data of a person in a missing limb video frame image;
Specifically, the missing-limb video is processed into T missing-limb video frame images at a resolution of 300×256 and a frame rate of 30 fps. The 2D bone key point coordinates of the person with the missing limb are extracted from each frame using the OpenPose algorithm; the coordinates extracted from the t-th frame are denoted $J_{est,t}$. $J_{est,t}$ is a 25×2 matrix whose rows correspond to the 25 joints extracted by OpenPose (numbered 0 to 24) and whose columns hold the x and y coordinates of each joint. The complete 2D bone key point connection relationship is shown in FIG. 4.
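As a minimal sketch of how $J_{est,t}$ might be assembled, the snippet below converts one flat BODY_25 key point list (x, y, confidence triples, the layout OpenPose writes under `pose_keypoints_2d`) into a 25×2 matrix. The function name `openpose_to_matrix` and the convention of marking undetected key points as NaN are assumptions made for illustration; the patent does not state how missing key points are encoded.

```python
import numpy as np

def openpose_to_matrix(flat_keypoints, min_confidence=0.0):
    """Convert a flat BODY_25 list [x0, y0, c0, x1, y1, c1, ...] to a 25x2 array.

    Key points whose confidence is <= min_confidence are treated as missing
    and stored as NaN (an assumed convention, not taken from the patent).
    """
    triples = np.asarray(flat_keypoints, dtype=float).reshape(25, 3)
    coords = triples[:, :2].copy()
    coords[triples[:, 2] <= min_confidence] = np.nan
    return coords

# Synthetic frame: every joint detected except the two ankles (indices 11 and 14).
flat = []
for k in range(25):
    detected = k not in (11, 14)
    flat += [10.0 * k, 5.0 * k, 0.9] if detected else [0.0, 0.0, 0.0]

J_est_t = openpose_to_matrix(flat)
print(J_est_t.shape)  # (25, 2)
```

Stacking one such matrix per frame gives a T×25×2 array, which is a convenient layout for the per-key-point processing in the later steps.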
S2, determining 2D bone key point indexes of the missing limb edge connection and the number of the missing bone key points based on the 2D bone key point data of the missing limb video frame image and the complete 2D bone key point data;
Specifically, the 2D bone key point data of the missing limb video frame image is compared with the complete 2D bone key point data, and the indexes of the 2D bone key points at the junction between the existing skeleton and the missing limb are obtained through an edge connection index function:
$I_{e,t} = \{\, n_t \mid n_t = \mathrm{Edge}(J_{k,t});\ k = 1, \ldots, N;\ t = 1, \ldots, T \,\}$
where $\mathrm{Edge}(J_{k,t})$ denotes the edge connection index function, $J_{k,t}$ denotes the k-th bone key point detected in the t-th frame image, $n_t$ denotes the index value of an existing bone key point connected to the edge of the missing limb bones, $I_{e,t}$ denotes the set of index values of all such edge-connected bone key points at frame t, N = 25 is the number of joints detected by OpenPose, and T is the total number of frames in the missing-limb video sequence.
The number of missing bone key points is obtained through a counting function:
$N_{m,t} = \mathrm{CountBlank}(J_{k,t})$
$J_{k,t} = J_{est,t} = \{\, (x_{k,t}, y_{k,t}) \mid k = 1, \ldots, N;\ t = 1, \ldots, T \,\}$
where $N_{m,t}$ denotes the number of missing bone key points at frame t, $\mathrm{CountBlank}(J_{k,t})$ denotes the function that counts the missing bone key points in a bone key point sequence, and $x_{k,t}$ and $y_{k,t}$ denote the x and y coordinates of the k-th bone key point at frame t.
Referring to FIG. 5, the black dots represent the bone key points detected by OpenPose, i.e., the 2D bone key points of the missing limb video frame image, and the gray dots represent the bone key points corresponding to the missing limb. Points 8, 9 and 12 are the detected bone key points that are edge-connected to the missing limb; 8, 9 and 12 are their index values.
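The two functions of step S2 can be sketched as follows. This is one possible reading, not the patent's implementation: `edge_indices` is assumed to return every detected key point that shares a skeleton edge with at least one missing key point, and missing key points are assumed to be stored as NaN. The edge list follows the BODY_25 layout and should be checked against the OpenPose documentation. The example uses its own set of missing key points and does not attempt to reproduce FIG. 5.

```python
import numpy as np

# BODY_25 skeleton edges (pairs of key point indices); verify against OpenPose.
BODY_25_EDGES = [
    (1, 8), (1, 2), (1, 5), (2, 3), (3, 4), (5, 6), (6, 7), (8, 9),
    (9, 10), (10, 11), (8, 12), (12, 13), (13, 14), (1, 0), (0, 15),
    (15, 17), (0, 16), (16, 18), (14, 19), (19, 20), (14, 21),
    (11, 22), (22, 23), (11, 24),
]

def count_blank(J_t):
    """Number of missing key points in one frame (rows containing NaN)."""
    return int(np.isnan(J_t).any(axis=1).sum())

def edge_indices(J_t, edges=BODY_25_EDGES):
    """Indices of detected key points adjacent to at least one missing key point."""
    missing = np.isnan(J_t).any(axis=1)
    result = set()
    for a, b in edges:
        if missing[a] != missing[b]:          # exactly one endpoint is missing
            result.add(b if missing[a] else a)  # keep the detected endpoint
    return sorted(result)

# Hypothetical frame in which both lower legs and feet are undetected.
J_t = np.ones((25, 2))
J_t[[10, 11, 13, 14, 19, 20, 21, 22, 23, 24]] = np.nan

N_m_t = count_blank(J_t)
I_e_t = edge_indices(J_t)
print(N_m_t)  # 10
print(I_e_t)  # [9, 12]
```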
S3, carrying out frame screening and smoothing treatment on the 2D skeleton key point data of the missing limb video frame image to obtain smoothed 2D skeleton key point data;
Specifically, frame screening and smoothing are expressed as:
$J'_{est,t} = Y(J_{est,t}, L, H, N_{max})$
where $J'_{est,t}$ denotes the smoothed 2D bone key point data, $J_{est,t}$ denotes the 2D bone key point data of the missing limb video frame image detected by the OpenPose algorithm, L denotes the minimum threshold, H denotes the maximum threshold, $N_{max}$ denotes the maximum number of abnormal bone points that may be discarded per frame, and $Y(J_{est,t}, L, H, N_{max})$ denotes the frame screening and smoothing function based on polynomial regression.
The polynomial-regression-based frame screening and smoothing function operates as follows:
S3.1, performing polynomial regression on the same 2D bone key point across all frames of the missing-limb video to obtain regression values, and then calculating, for each frame, the deviation between the regression value and the initial value of that 2D bone key point;
S3.2, determining abnormal bone points by comparing the deviation with the minimum threshold and the maximum threshold, and updating the values of the 2D bone key points to obtain first 2D bone key point data;
Specifically, if the deviation is greater than the maximum threshold H, the bone point in that frame is regarded as an abnormal bone point that may be discarded, and it is discarded. If the deviation lies between the minimum threshold L and the maximum threshold H, the bone point is an abnormal bone point that may not be discarded, and its initial value is replaced by its regression value to reduce the deviation. If the deviation is smaller than the minimum threshold L, the bone point is a normal bone point and its initial value is retained. The result is the first 2D bone key point data.
S3.3, screening the first 2D bone key point data by comparing the number of abnormal bone points in each frame with the maximum number of abnormal bone points, to obtain the smoothed 2D bone key point data.
Specifically, if the number of discardable abnormal bone points counted in a frame exceeds the maximum number $N_{max}$ of abnormal bone points that may be discarded per frame, the bone point data of that frame is regarded as abnormal data and the frame is deleted, yielding the smoothed 2D bone key point data $J'_{est,t}$.
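Steps S3.1 to S3.3 can be sketched with NumPy as below. Several details are assumptions, since the patent does not fix them: the polynomial degree, fitting x(t) and y(t) separately, using the Euclidean distance as the deviation, and storing discarded or undetected key points as NaN. The name `screen_and_smooth` is likewise only illustrative.

```python
import numpy as np

def screen_and_smooth(J, L, H, n_max, degree=2):
    """J: (T, N, 2) array of 2D key points, NaN where a key point is undetected.

    Returns the smoothed array restricted to the kept frames, and their indices.
    """
    T, N, _ = J.shape
    t = np.arange(T, dtype=float)
    out = J.copy()
    discarded = np.zeros((T, N), dtype=bool)

    for k in range(N):
        valid = np.isfinite(J[:, k]).all(axis=1)
        if valid.sum() <= degree:
            continue  # too few samples for a fit; leave this key point as is
        # S3.1: regress x(t) and y(t) over the frames where key point k exists.
        reg = np.stack(
            [np.polyval(np.polyfit(t[valid], J[valid, k, c], degree), t)
             for c in range(2)],
            axis=1,
        )
        dev = np.where(valid, np.linalg.norm(reg - J[:, k], axis=1), 0.0)
        # S3.2: discard above H, replace between L and H, keep below L.
        drop = valid & (dev > H)
        replace = valid & (dev >= L) & (dev <= H)
        out[drop, k] = np.nan
        out[replace, k] = reg[replace]
        discarded[:, k] = drop

    # S3.3: delete frames with more than n_max discarded key points.
    keep = discarded.sum(axis=1) <= n_max
    return out[keep], np.flatnonzero(keep)

# Synthetic sequence: key point 0 moves linearly, key point 1 is static,
# and key point 0 has one gross outlier in frame 4.
T = 10
t = np.arange(T, dtype=float)
J = np.zeros((T, 2, 2))
J[:, 0, 0], J[:, 0, 1] = t, 2 * t
J[:, 1] = 5.0
J[4, 0, 0] += 100.0

J_smooth, kept = screen_and_smooth(J, L=20.0, H=50.0, n_max=0, degree=1)
print(kept)
```

A single least-squares fit is itself pulled toward large outliers, so the thresholds L and H need to be chosen with that bias in mind; a robust or iterated fit would be a natural variation.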
S4, constructing a posture prior constraint and data enhancement algorithm of the missing limb based on the number of the missing bone key points, the 2D bone key point indexes of the missing limb edge connection and the video behavior category;
S4.1, acquiring complete human body posture parameters, and mapping them according to the indexes of the 2D bone key points of the missing limb edge connection to obtain the human body posture parameters corresponding to the missing bone key points;
Specifically, the human body posture parameters corresponding to the missing bone key points are expressed as:
$\theta_{m,t} = F(\theta_{b,t}, I_{e,t})$
where $\theta_{m,t}$ denotes the human body posture parameters corresponding to the missing bone key points, $I_{e,t}$ denotes the set of index values of the bone points edge-connected to the missing bones, $\theta_{b,t}$ denotes the complete human body posture parameters, and F denotes a mapping function that uses the complete human body posture parameters and the indexes of the edge-connected bone points to obtain the human body posture parameters corresponding to the missing bone key points.
S4.2, calculating the human body posture parameters corresponding to the 2D bone key point data of the missing limb video frame image based on the complete human body posture parameters and the human body posture parameters corresponding to the missing bone key points;
Specifically, the human body posture parameters corresponding to the 2D bone key point data of the missing limb video frame image are calculated as:
$\theta_{d,t} = \theta_{b,t} - \theta_{m,t}$
where $\theta_{d,t}$ denotes the human body posture parameters corresponding to the 2D bone key point data of the missing limb video frame image detected by OpenPose.
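One way to read steps S4.1 and S4.2 is as a masking operation on the body pose vector: $\theta_{m,t}$ keeps only the pose rows that drive the missing limb and is zero elsewhere, so that $\theta_{d,t} = \theta_{b,t} - \theta_{m,t}$ keeps only the rows of the detected body parts. The sketch below follows that reading. The 21×3 axis-angle layout matches the SMPL-X body pose, but the lookup table `EDGE_TO_POSE_ROWS` and its row numbers are hypothetical placeholders; the patent does not specify the mapping function F.

```python
import numpy as np

NUM_BODY_JOINTS = 21  # SMPL-X body pose: 21 joints, 3 axis-angle values each

# Hypothetical table: edge-connected OpenPose index -> body-pose rows of the
# limb hanging off that key point. The row numbers are illustrative only.
EDGE_TO_POSE_ROWS = {
    9: [4, 7, 10],   # assumed rows for the right lower leg and foot
    12: [3, 6, 9],   # assumed rows for the left lower leg and foot
}

def map_missing_pose(theta_b, edge_index_set, table=EDGE_TO_POSE_ROWS):
    """Sketch of F: copy the missing-limb rows of theta_b, zero everything else."""
    theta_m = np.zeros_like(theta_b)
    for idx in edge_index_set:
        rows = table.get(idx, [])
        theta_m[rows] = theta_b[rows]
    return theta_m

theta_b = np.arange(1, NUM_BODY_JOINTS * 3 + 1, dtype=float).reshape(NUM_BODY_JOINTS, 3)
theta_m = map_missing_pose(theta_b, [9, 12])
theta_d = theta_b - theta_m  # pose parameters of the detected body parts

print(np.flatnonzero(theta_m.any(axis=1)))  # rows assigned to the missing limb
```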
S4.3, constraining the human body posture parameters corresponding to the missing bone key points according to the video behavior category, the number of the missing bone key points and the 2D bone key point indexes of the missing limb edge connection, to obtain the posture prior constraint parameters of the missing bone key points;
Specifically, the posture prior constraint parameters of the missing bone key points are expressed as:
$\theta'_{m,t} = Z(\theta_{m,t}, N_{m,t}, I_{e,t}, C)$
where $\theta'_{m,t}$ denotes the posture prior constraint parameters of the missing bone key points, $\theta_{m,t}$ denotes the human body posture parameters corresponding to the missing bone key points, $N_{m,t}$ denotes the total number of bone key points of the missing limb at frame t, $I_{e,t}$ denotes the set of index values of the bone points edge-connected to all missing limb bones at frame t, C denotes the behavior category of the video sequence, and Z denotes the posture prior constraint function of the missing limb.
S4.4, constructing the constraint term for missing-limb prior knowledge based on the posture prior constraint parameters of the missing bone key points;
S4.5, constructing the constraint term for the whole-body posture based on the complete human body posture parameters;
S4.6, constructing the body part posture constraint term based on the human body posture parameters corresponding to the 2D bone key point data of the missing limb video frame image, where in this embodiment the body part refers to the body part present in the missing limb video frame image;
S4.7, combining the constraint term for missing-limb prior knowledge, the constraint term for the whole-body posture and the body part posture constraint term to obtain the posture prior constraint and data enhancement algorithm of the missing limb.
Specifically, the posture prior constraint and data enhancement algorithm of the missing limb is expressed by an objective optimization function $E(\beta_t, \theta_{b,t}, \theta_{d,t}, \theta'_{m,t}, K_t, J'_{est,t})$, which combines the constraint term for the whole-body posture, the body part posture constraint term and the constraint term for missing-limb prior knowledge, each with its own weight value. Here $\beta_t$ denotes the body shape parameters corresponding to the t-th video frame image, $\theta_{b,t}$ denotes the complete human body posture parameters corresponding to the t-th video frame image, $K_t$ denotes the camera parameters corresponding to the t-th video frame image, $J'_{est,t}$ denotes the smoothed 2D bone key point data, $\theta_{d,t}$ denotes the human body posture parameters corresponding to the 2D bone key point data of the missing limb video frame image, and $\theta'_{m,t}$ denotes the posture prior constraint parameters of the missing bone key points corresponding to the t-th video frame image.
Through the constraint item of the whole body posture, mutual interpenetration of body parts in the model can be avoided during optimization; the body part posture constraint item preserves and further optimizes the skeleton points corresponding to the existing body parts during optimization; and the constraint item of the missing limb prior knowledge completes and enhances the 3D bone points of the missing limb based on that prior knowledge.
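As a minimal sketch of how the three constraint items and their weight values could interact, the following Python example assumes that the objective is a plain weighted sum of three already-evaluated terms. The weighted-sum form, the class `ConstraintWeights`, the function `total_objective` and all numeric values are hypothetical and serve only as illustration; the concrete form of each constraint item is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class ConstraintWeights:
    """Hypothetical weight values for the three constraint items."""
    whole_body: float
    body_part: float
    missing_limb_prior: float


def total_objective(e_whole_body: float,
                    e_body_part: float,
                    e_missing_limb_prior: float,
                    w: ConstraintWeights) -> float:
    """Combine three already-evaluated constraint items.

    Assumption: the objective is a weighted sum of the three items.
    """
    return (w.whole_body * e_whole_body
            + w.body_part * e_body_part
            + w.missing_limb_prior * e_missing_limb_prior)


# Illustrative values only; they are not taken from the patent.
weights = ConstraintWeights(whole_body=1.0, body_part=2.0, missing_limb_prior=0.5)
energy = total_objective(0.4, 0.1, 0.8, weights)

# Raising the weight of the missing limb prior makes that item dominate.
stronger_prior = ConstraintWeights(whole_body=1.0, body_part=2.0, missing_limb_prior=5.0)
energy_stronger_prior = total_objective(0.4, 0.1, 0.8, stronger_prior)
```

Under this assumed form, adjusting a weight value shifts the balance between keeping the detected body parts and following the prior for the missing limb, which is the quantity the later iteration step tunes.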
S5, taking the 2D skeleton key point data of the missing limb video frame image as input, and iterating the SMPLify-X model through the posture prior constraint and data enhancement algorithm of the missing limb to obtain an optimal 3D skeleton SMPL-X model.
Specifically, after frame screening and smoothing are performed on the 2D skeleton key point data of the missing limb video frame image, the input data becomes the smoothed 2D skeleton key point data. 3D human body posture estimation is performed on the smoothed 2D bone key point data based on the SMPLify-X model and the posture prior constraint and data enhancement algorithm of the missing limb, yielding an SMPL-X model of the 2D bone key point data. 3D bone key point data is acquired from this SMPL-X model and mapped through the camera parameters into 2D bone key point data on the image plane. The 2D bone key point data on the image plane is matched with the smoothed 2D bone key point data to obtain a matching result, and the weights of the posture prior constraint and data enhancement algorithm of the missing limb are iteratively optimized based on the matching result to obtain the optimal 3D skeleton SMPL-X model.
S6, acquiring 3D skeleton key point data of the optimal 3D skeleton SMPL-X model, and generating a corresponding data annotation file.
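The mapping and matching part of S5 can be sketched as follows. This is only an illustration under stated assumptions: the camera parameters are reduced to a pinhole model with a single focal length and a principal point, the matching result is taken to be a mean Euclidean distance, and missing limb key points are represented by `None` so that they do not contribute to the match. The function names `project_points` and `reprojection_error` and all numeric values are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project_points(points_3d: List[Point3D],
                   focal: float,
                   center: Point2D) -> List[Point2D]:
    """Map camera-space 3D key points onto the image plane.

    Assumption: a simple pinhole camera stands in for the camera parameters.
    """
    cx, cy = center
    return [(focal * x / z + cx, focal * y / z + cy) for x, y, z in points_3d]


def reprojection_error(projected: List[Point2D],
                       detected: List[Optional[Point2D]]) -> float:
    """Mean distance between projected and detected 2D key points.

    Entries of `detected` that are None stand for missing limb key points
    and are skipped, since there is no 2D observation to match against.
    """
    dists = [math.dist(p, d) for p, d in zip(projected, detected) if d is not None]
    return sum(dists) / len(dists) if dists else 0.0


# Illustrative data: three model key points, the last one belongs to a missing limb.
joints_3d = [(0.0, 0.0, 2.0), (0.5, -0.5, 2.0), (0.2, 1.0, 4.0)]
smoothed_2d = [(320.0, 243.0), (574.0, -10.0), None]

projected_2d = project_points(joints_3d, focal=1000.0, center=(320.0, 240.0))
match_error = reprojection_error(projected_2d, smoothed_2d)
```

In an iteration loop, a value such as `match_error` would be evaluated after each fit and used to decide how the weight values are adjusted; the adjustment rule itself is not shown here.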
Referring to fig. 6, the effect of the method of the present invention is compared with that of the prior art without the method. Fig. 6 (a) shows an original image with a missing limb (a missing leg); fig. 6 (b) shows the 2D bone key points and connecting bones detected by the OpenPose detector. The SMPL-X human body 3D model obtained directly from the SMPLify-X model, without the posture prior constraint and data enhancement algorithm of the missing limb provided by the invention, is shown in fig. 6 (c); the SMPL-X human body 3D model obtained by combining the posture prior constraint and data enhancement algorithm of the missing limb with the SMPLify-X model is shown in fig. 6 (d). Comparing fig. 6 (c) with fig. 6 (d), it can be seen that the SMPL-X human body 3D model obtained by the method of the invention predicts the human body posture more accurately and better restores the posture of the limb that is missing from the image. The 3D bone key points extracted from the model shown in fig. 6 (c) are shown in fig. 6 (e), and the 3D bone key points extracted from the model shown in fig. 6 (d) are shown in fig. 6 (f). Comparing fig. 6 (e) with fig. 6 (f), it can be seen that the proposed posture prior constraint and data enhancement algorithm of the missing limb performs well in completing and enhancing the 3D bone key point data of the missing limb.
As shown in fig. 2, the present invention provides a 3D human skeletal key point data enhancement system, comprising:
the data acquisition module is used for acquiring 2D skeleton key point data of the person in the missing limb video frame image;
the data comparison analysis module is used for determining 2D bone key point indexes of the missing limb edge connection and the number of the missing bone key points based on the 2D bone key point data of the missing limb video frame image and the complete 2D bone key point data;
the data smoothing module is used for carrying out frame screening and smoothing processing on the 2D skeleton key point data of the missing limb video frame image to obtain smoothed 2D skeleton key point data;
the algorithm construction module is used for constructing a posture prior constraint and data enhancement algorithm of the missing limb based on the number of the missing bone key points, the 2D bone key point index connected with the edge of the missing limb and the video behavior category;
the optimal model generation module takes the smoothed 2D skeleton key point data as input, and iterates the SMPLify-X model through the posture prior constraint and data enhancement algorithm of the missing limb to obtain an optimal 3D skeleton SMPL-X model;
the annotation data acquisition module is used for acquiring the 3D skeleton key point data of the optimal 3D skeleton SMPL-X model and generating a corresponding data annotation file.
The content in the method embodiment is applicable to the system embodiment, the functions specifically realized by the system embodiment are the same as those of the method embodiment, and the achieved beneficial effects are the same as those of the method embodiment.
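The frame screening and smoothing performed by the data smoothing module can be sketched as below. The sketch fits a polynomial over the frame index for one coordinate of one key point, compares the deviation with a minimum and a maximum threshold, and then screens frames by their abnormal point count. The exact update rule is an assumption made for illustration: deviations up to the minimum threshold keep the detected value, larger deviations are replaced by the regression value, only deviations above the maximum threshold count as abnormal, and frames with too many abnormal points are discarded. All function names and numbers are hypothetical.

```python
from typing import List, Tuple


def polyfit(xs: List[float], ys: List[float], degree: int) -> List[float]:
    """Least-squares polynomial coefficients, lowest order first.

    Solves the normal equations by Gauss-Jordan elimination with partial
    pivoting; it needs more distinct frames than the polynomial degree.
    """
    n = degree + 1
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [a - factor * b for a, b in zip(m[row], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def smooth_track(values: List[float], degree: int,
                 min_thr: float, max_thr: float) -> Tuple[List[float], List[bool]]:
    """Smooth one coordinate of one key point across all frames."""
    frames = list(range(len(values)))
    coeffs = polyfit(frames, values, degree)
    updated, abnormal = [], []
    for t, v in zip(frames, values):
        fitted = sum(c * t ** k for k, c in enumerate(coeffs))
        deviation = abs(fitted - v)
        # Assumed rule: keep small deviations, otherwise use the regression value.
        updated.append(v if deviation <= min_thr else fitted)
        # Assumed rule: only deviations above max_thr mark an abnormal point.
        abnormal.append(deviation > max_thr)
    return updated, abnormal


def screen_frames(abnormal_per_track: List[List[bool]], max_abnormal: int) -> List[int]:
    """Indices of frames whose abnormal point count does not exceed max_abnormal."""
    n_frames = len(abnormal_per_track[0])
    return [t for t in range(n_frames)
            if sum(flags[t] for flags in abnormal_per_track) <= max_abnormal]


# Illustrative tracks: frame 3 of track_x contains a detection outlier.
track_x = [0.0, 1.0, 2.0, 30.0, 4.0, 5.0, 6.0]
track_y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
smoothed_x, abnormal_x = smooth_track(track_x, degree=1, min_thr=5.0, max_thr=10.0)
smoothed_y, abnormal_y = smooth_track(track_y, degree=1, min_thr=5.0, max_thr=10.0)
kept_frames = screen_frames([abnormal_x, abnormal_y], max_abnormal=0)
```

With these illustrative thresholds only the outlier in frame 3 is flagged and replaced by its regression value, and that frame is then dropped because `max_abnormal` is set to zero.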
While the preferred embodiment of the present invention has been described in detail, the invention is not limited to the embodiment, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the invention, and these modifications and substitutions are intended to be included in the scope of the present invention as defined in the appended claims.
Claims (8)
1. The 3D human skeleton key point data enhancement method is characterized by comprising the following steps of:
acquiring 2D skeleton key point data of a person in a missing limb video frame image;
determining a 2D bone key point index of the missing limb edge connection and the number of missing bone key points based on the 2D bone key point data of the missing limb video frame image and the complete 2D bone key point data;
constructing a posture prior constraint and data enhancement algorithm of the missing limb based on the number of the missing bone key points, the 2D bone key point index of the missing limb edge connection and the video behavior category;
and taking 2D skeleton key point data of the missing limb video frame image as input, and iterating the SMPLify-X model through gesture priori constraint and data enhancement algorithm of the missing limb to obtain an optimal 3D skeleton SMPL-X model.
2. The method for enhancing 3D human skeletal key point data of claim 1, further comprising: and acquiring 3D skeleton key point data of the optimal 3D skeleton SMPL-X model, and generating a corresponding data annotation file.
3. The method for enhancing 3D human bone keypoint data according to claim 1, further comprising, after the step of determining the index of 2D bone keypoints and the number of missing bone keypoints of the missing limb edge connection based on the 2D bone keypoint data of the missing limb video frame image and the complete 2D bone keypoint data: and carrying out frame screening and smoothing treatment on the 2D skeleton key point data of the missing limb video frame image to obtain smoothed 2D skeleton key point data.
4. The method for enhancing 3D human skeleton key point data according to claim 3, wherein the step of performing frame screening and smoothing on the 2D skeleton key point data of the missing limb video frame image to obtain smoothed 2D skeleton key point data specifically comprises:
polynomial regression is carried out on the same 2D bone key points in all frames of the missing limb video, and deviation between the regression value of the 2D bone key points and the initial value of the 2D bone key points is calculated;
determining abnormal bone points based on comparison results of the deviation, the minimum threshold value and the maximum threshold value, and updating values of the 2D bone key points to obtain first 2D bone key point data;
and screening the first 2D bone key point data according to the comparison result of the abnormal bone point number and the maximum abnormal bone point number of the same frame of image to obtain smoothed 2D bone key point data.
5. The method for enhancing 3D human skeletal keypoint data according to claim 1, wherein said step of constructing a gesture priori constraint and data enhancement algorithm for the missing limb based on the number of the missing skeletal keypoints, the 2D skeletal keypoint index of the missing limb edge connection, and the video behavior class specifically comprises:
acquiring complete human body posture parameters, and mapping the human body posture parameters according to the indexes of the 2D skeleton key points connected with the edges of the missing limbs to obtain the human body posture parameters corresponding to the key points of the missing bones;
calculating human body posture parameters corresponding to 2D bone key point data of the missing limb video frame image based on the complete human body posture parameters and the human body posture parameters corresponding to the missing bone key points;
constraining the human body posture parameters corresponding to the missing skeleton key points according to the video behavior category, the number of the missing skeleton key points and the 2D skeleton key point indexes connected with the edges of the missing limbs to obtain posture priori constraint parameters of the missing skeleton key points;
constructing the constraint item of the missing limb prior knowledge based on the posture prior constraint parameters of the missing bone key points;
constructing constraint items of the whole body posture based on the complete human posture parameters;
constructing a body part posture constraint item based on the body posture parameters corresponding to the 2D skeleton key point data of the missing limb video frame image;
and combining the constraint item of the missing limb prior knowledge, the constraint item of the whole body posture and the body part posture constraint item to obtain the posture prior constraint and data enhancement algorithm of the missing limb.
6. The method for enhancing 3D human skeletal key point data according to claim 1, wherein the posture prior constraint and data enhancement algorithm of the missing limb has the following expression:
$$\theta'_{m,t} = Z(\theta_{m,t}, N_{m,t}, I_{e,t}, C)$$
$$\theta_{d,t} = \theta_{b,t} - \theta_{m,t}$$
wherein $E(\beta_t, \theta_{b,t}, \theta_{d,t}, \theta'_{m,t}, K_t, J'_{est,t})$ represents the objective optimization function based on the posture prior constraint and data enhancement algorithm of the missing limb, which combines the constraint item of the whole body posture, the body part posture constraint item and the constraint item of the missing limb prior knowledge, each with its own weight value; $\beta_t$ represents the body shape parameter corresponding to the t-th video frame image; $\theta_{b,t}$ represents the complete human body posture parameter corresponding to the t-th video frame image; $K_t$ represents the camera parameters corresponding to the t-th video frame image; $J'_{est,t}$ represents the smoothed 2D bone key point data; $\theta_{d,t}$ represents the human body posture parameters corresponding to the 2D skeleton key point data of the missing limb video frame image; $\theta'_{m,t}$ represents the posture prior constraint parameters of the missing skeleton key points corresponding to the t-th video frame image; $\theta_{m,t}$ represents the human body posture parameters corresponding to the missing bone key points; $N_{m,t}$ represents the total number of bone key points of the missing limb at the t-th frame; $I_{e,t}$ represents the set of index values of the edge-connected bone points of all missing limbs at the t-th frame; and $C$ represents the behavior category of the video sequence.
7. The method for enhancing 3D human skeleton key point data according to claim 1, wherein the step of iterating the SMPLify-X model by taking 2D skeleton key point data of the video frame image of the missing limb as input and using the posture prior constraint of the missing limb and the data enhancing algorithm to obtain the optimal 3D skeleton SMPL-X model specifically comprises:
3D human body posture estimation is carried out on the 2D bone key point data of the missing limb video frame image based on the SMPLify-X model and the gesture priori constraint and data enhancement algorithm of the missing limb, so that an SMPL-X model of the 2D bone key point data is obtained;
acquiring 3D bone key point data based on an SMPL-X model of the 2D bone key point data, and mapping the 3D bone key point data into the 2D bone key point data on an image plane through camera parameters;
matching the 2D bone key point data on the image plane with the 2D bone key point data of the missing limb video frame image to obtain a matching result;
and carrying out optimization iteration on the gesture priori constraint of the missing limb and the weight of the data enhancement algorithm based on the matching result to obtain an optimal 3D skeleton SMPL-X model.
8. A 3D human skeletal key point data enhancement system, comprising:
the data acquisition module is used for acquiring 2D skeleton key point data of the person in the missing limb video frame image;
the data comparison analysis module is used for determining 2D bone key point indexes of the missing limb edge connection and the number of the missing bone key points based on the 2D bone key point data of the missing limb video frame image and the complete 2D bone key point data;
the data smoothing module is used for carrying out frame screening and smoothing processing on the 2D skeleton key point data of the missing limb video frame image to obtain smoothed 2D skeleton key point data;
the algorithm construction module is used for constructing a posture prior constraint and data enhancement algorithm of the missing limb based on the number of the missing bone key points, the 2D bone key point index connected with the edge of the missing limb and the video behavior category;
the optimal model generation module takes the smoothed 2D skeleton key point data as input, and iterates the SMPLify-X model through the posture prior constraint and data enhancement algorithm of the missing limb to obtain an optimal 3D skeleton SMPL-X model;
the annotation data acquisition module is used for acquiring the 3D skeleton key point data of the optimal 3D skeleton SMPL-X model and generating a corresponding data annotation file.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311739082.7A CN117690163A (en) | 2023-12-18 | 2023-12-18 | 3D human skeleton key point data enhancement method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117690163A (en) | 2024-03-12 |
Family
ID=90135048
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311739082.7A Pending CN117690163A (en) | 2023-12-18 | 2023-12-18 | 3D human skeleton key point data enhancement method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117690163A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118411764A (en) * | 2024-07-02 | 2024-07-30 | 江西格如灵科技股份有限公司 | Dynamic bone recognition method, system, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||