CN116894894A - Method, apparatus, device and storage medium for determining motion of avatar
- Publication number: CN116894894A
- Application number: CN202310728732.1A
- Authority: CN (China)
- Prior art keywords: target, key point, target image, image, point information
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T13/40: 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
- G06V10/462: Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
- G06V40/20: Movements or behaviour, e.g. gesture recognition
Abstract
The disclosure provides a method, apparatus, device and storage medium for determining the action of an avatar. It relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, augmented reality, virtual reality, deep learning and the like, and can be applied to scenarios such as the metaverse and digital humans. The action determining method includes: for a target image in a video of a real figure, processing the target image to obtain initial key point information of the real figure corresponding to the target image; constructing a total objective function based on the initial key point information and the to-be-determined target key point information of the avatar; determining the target key point information based on the total objective function and a target constraint condition corresponding to the real figure; and determining the action of the avatar based on the target key point information. The present disclosure can improve the action accuracy of the avatar.
Description
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, augmented reality, virtual reality, deep learning and the like; it can be applied to scenarios such as the metaverse and digital humans, and in particular provides a method, apparatus, device and storage medium for determining the action of an avatar.
Background
Motion capture (also called dynamic capture) typically refers to transcribing the movements of a real actor (also referred to as the person behind the avatar) onto an avatar (e.g., a virtual person) in a three-dimensional (3D) game or animation.
Disclosure of Invention
The present disclosure provides an action determining method, apparatus, device and storage medium for an avatar.
According to an aspect of the present disclosure, there is provided a method for determining an action of an avatar, including: for a target image in a video of a real figure, processing the target image to obtain initial key point information of the real figure corresponding to the target image; constructing a total objective function based on the initial key point information and the to-be-determined target key point information of the avatar, wherein the total objective function characterizes error information between the initial key point information and the target key point information; determining the target key point information based on the total objective function and a target constraint condition corresponding to the real figure; and determining the action of the avatar based on the target key point information.
According to another aspect of the present disclosure, there is provided an apparatus for determining an action of an avatar, including: an acquisition module configured to, for a target image in a video of a real figure, process the target image to obtain initial key point information of the real figure corresponding to the target image; a construction module configured to construct a total objective function according to the initial key point information and the to-be-determined target key point information of the avatar, wherein the total objective function characterizes error information between the initial key point information and the target key point information; a solving module configured to determine the target key point information according to the total objective function and a target constraint condition corresponding to the real figure; and a determining module configured to determine the action of the avatar according to the target key point information.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the above aspects.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method according to any one of the above aspects.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method according to any of the above aspects.
According to the technical solution of the present disclosure, the action accuracy of the avatar can be improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
fig. 2 is a schematic diagram of an application scenario provided according to an embodiment of the present disclosure;
fig. 3 is a schematic overall architecture of an action determining method of an avatar provided according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 5 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 6 is a schematic diagram according to a fourth embodiment of the present disclosure;
fig. 7 is a schematic view of an electronic device for implementing an action determining method of an avatar according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Taking a virtual person as an example: in the related art, a video of the real actor is typically processed to obtain 3D position coordinates of the actor's key points, the 3D position coordinates are input into a driving model, and the output is the action of the virtual person.
To improve the action accuracy of the virtual person, effort is usually spent on improving the accuracy of the driving model, that is, a large number of training samples are used to train the driving model.
However, increasing the data volume in this way is costly, and model performance plateaus once the data volume grows beyond a certain point.
In order to improve the motion accuracy of the avatar, the present disclosure provides the following embodiments.
Fig. 1 is a schematic diagram according to a first embodiment of the present disclosure. The present embodiment provides a method for determining an action of an avatar, the method including:
101. For a target image in a video of a real figure, process the target image to obtain initial key point information of the real figure corresponding to the target image.
102. Construct a total objective function based on the initial key point information and the to-be-determined target key point information of the avatar, wherein the total objective function characterizes error information between the initial key point information and the target key point information.
103. Determine the target key point information based on the total objective function and a target constraint condition corresponding to the real figure.
104. Determine the action of the avatar based on the target key point information.
The real figure is the real person whose motion drives the action of the avatar; it is typically a real actor, sometimes called the person behind the avatar.
The target image is an image in the video to be processed; each image (frame) in the video can serve as a target image.
For distinction, the key point information of the real figure is referred to as initial key point information, and the key point information of the avatar is referred to as target key point information.
A key point may also be referred to as a joint point. Taking a human body as an example, one or more body part points can be preset as key points.
The key point information may include position information of the key points and/or angle information of the key points. Taking a three-dimensional avatar as an example, the position information of a key point is its 3D position coordinates. The angle information of a key point may be the angle between the line connecting the key point to a set reference point and a set reference line; this angle may specifically be expressed as Euler angles, i.e., three-dimensional angle information.
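Purely as an illustration, the key point information described above could be held in a structure like the following; the field names and array shapes are assumptions for this sketch, not part of the disclosure:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class KeypointInfo:
    """Key point information for one frame (hypothetical layout).

    position: 3D position coordinates of each key point, shape (num_joints, 3).
    angles:   Euler angles of each key point, shape (num_joints, 3),
              i.e. the three-dimensional angle information described above.
    """
    position: np.ndarray  # (num_joints, 3) 3D position coordinates
    angles: np.ndarray    # (num_joints, 3) Euler angles in radians
```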
Unlike the related art, where the action of the avatar is determined by a driving model, in this embodiment the target key point information of the avatar is solved directly, and the action of the avatar is then determined based on the target key point information.
The total objective function is constructed from the initial key point information, a known quantity, and the target key point information, an unknown quantity (the variable); the target key point information is solved by minimizing the total objective function under certain constraint conditions.
The total objective function characterizes error information between the initial key point information and the target key point information; it may, for example, be constructed based on the absolute value of the difference between the two.
Taking target key point angle information as an example: after the target key point angle information is determined, the action of the avatar can be determined using related techniques. For example, a correspondence between key point angle information and actions may be preconfigured, and the action of the avatar determined based on this correspondence and the target key point angle information.
In this embodiment, the target key point information is determined based on the total objective function and the constraint condition corresponding to the real figure, and the action of the avatar is determined from the target key point information. Since no driving model is required, this reduces cost and improves action accuracy compared with determining the action via a driving model.
In order to better understand the embodiments of the present disclosure, application scenarios to which the embodiments of the present disclosure may be applied are described.
Fig. 2 is a schematic diagram of an application scenario provided in an embodiment of the present disclosure. The scene comprises: user terminal 201 and server 202, user terminal 201 may include: personal computers (Personal Computer, PCs), cell phones, tablet computers, notebook computers, smart wearable devices, and the like. The server 202 may be a cloud server or a local server, and the user terminal 201 and the server 202 may communicate using a communication network, for example, a wired network and/or a wireless network.
The user terminal 201 may transmit a video of the real figure to the server 202, and the server 202 determines the action of the avatar based on the video. This process may be referred to as offline motion capture. Information the process needs to display, such as the finalized actions, may be displayed by the user terminal 201. It will be appreciated that, if the user terminal has the corresponding capability, the action of the avatar may also be determined locally at the user terminal based on the video.
Specifically, the target key point information of the virtual person can be determined based on the video of the real figure, a process that may be called global optimization; the action of the avatar is then determined based on the target key point information.
As shown in Fig. 3, the overall architecture for determining the action of an avatar based on a video may include: the video of the real figure is processed by the touchdown detection network 301 to determine the foot touchdown state of the real figure, and by the body perception network 302 to determine the initial key point information of the real figure. Global optimization 303 is performed according to the foot touchdown state and the initial key point information to obtain the target key point information; the action of the avatar may then be determined from the solved target key point information based on a predetermined correspondence between the avatar's key point information and actions. A minimal sketch of this pipeline follows.
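The sketch below assumes the two networks, the solver, and the action lookup are available as callables; all names are placeholders for this illustration rather than components defined by the disclosure:

```python
def determine_avatar_actions(frames, touchdown_net, body_net, solve_global, action_lookup):
    """Sketch of the Fig. 3 architecture: per-frame perception networks
    followed by global optimization over the whole video."""
    contacts = [touchdown_net(frame) for frame in frames]   # foot touchdown state (1/0) per frame
    initial_info = [body_net(frame) for frame in frames]    # initial key point information per frame
    target_info = solve_global(initial_info, contacts)      # global optimization (303)
    return [action_lookup(info) for info in target_info]    # key point info -> avatar action
```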
In combination with the above application scenario, the present disclosure further provides the following embodiments.
Fig. 4 is a schematic diagram of a second embodiment of the present disclosure, which provides a method for determining an action of an avatar, the method including:
401. Acquire a video of the real figure.
402. For a target image in the video, process the target image using a pre-trained body perception network to obtain the initial key point information corresponding to the target image.
The body perception network is a pre-trained deep neural network whose input is an image and whose output is the key point information of the real figure contained in the image.
The key point information of the real figure is referred to as the initial key point information.
There are usually multiple target images, and the initial key point information corresponding to each target image is obtained after processing each target image with the body perception network.
Taking a three-dimensional avatar as an example, the key point information may include: three-dimensional position coordinates and/or three-dimensional angle information of the key points.
In this embodiment, the initial key point information is obtained through the pre-trained body perception network; the strong performance of the body perception network thus improves the accuracy of the initial key point information and, in turn, the accuracy of the determined avatar action.
403. Process the target image using a pre-trained touchdown detection network to determine the foot touchdown state of the real figure in the target image.
The foot touchdown state indicates whether the foot of the real figure touches the ground; it can be represented by 1 or 0, where 1 denotes touchdown and 0 denotes no touchdown.
The touchdown detection network is likewise a pre-trained deep neural network whose input is an image and whose output is the foot touchdown state of the real figure contained in the image.
Both the body perception network and the touchdown detection network can be trained using related techniques.
In this embodiment, the foot touchdown state is obtained through the pre-trained touchdown detection network; the strong performance of the touchdown detection network thus improves the accuracy of the foot touchdown state and, in turn, the accuracy of the determined avatar action.
404. Construct a total objective function based on the initial key point information and the to-be-determined target key point information of the avatar.
The target key point information is the key point information of the avatar.
The target key point information is an unknown quantity, i.e., the quantity to be solved.
The initial key point information, a known quantity, may be obtained through the body perception network.
There are usually multiple target images. The initial key point information corresponding to each target image is obtained, a sub-objective function corresponding to each target image is constructed from its initial key point information, and the total objective function is then constructed from the sub-objective functions.
That is, the target image comprises a plurality of target images; for each target image, a sub-objective function corresponding to that target image is constructed based on the initial key point information corresponding to that target image and the target key point information; and the total objective function is constructed based on the sub-objective functions corresponding to the target images.
When constructing the total objective function from the sub-objective functions, the sub-objective functions may simply be added together.
In this embodiment, by first constructing a sub-objective function for each target image and then building the total objective function from them, the total objective function draws on the information of every target image, i.e., on the global information of the video. Global optimization can then be performed to obtain the target key point information, improving its accuracy and, in turn, the accuracy of the determined avatar action.
Each sub-objective function may be constructed from two parts: one based on the unknown target key point information together with the known initial key point information, and the other based on the known initial key point information alone.
That is, for each target image, perform: construct a first function based on the initial key point information corresponding to that target image and the target key point information; construct a second function based on the initial key point information corresponding to that target image; and construct the sub-objective function based on the first function and the second function.
Denoting the first function by F1, the second function by F2, and the sub-objective function by F3, the calculation formula is F3 = F1 + F2.
In this embodiment, the first function is constructed from the initial and target key point information, the second function from the initial key point information, and the sub-objective function from the first and second functions, so the optimization objective is built from different angles, improving the accuracy of the sub-objective function and hence the accuracy of the action.
Further, the second function F2 may be constructed from several error functions; for example, with a first error function G1, a second error function G2, and a third error function G3, F2 = G1 + G2 + G3.
Regarding the three error functions above: the initial key point information includes two-dimensional position coordinates of the key points of the real figure, three-dimensional position coordinates of the key points, and three-dimensional angle information. Constructing the second function based on the initial key point information corresponding to each target image then includes: projecting the two-dimensional position coordinates to obtain three-dimensional projection coordinates, and constructing the first error function from the three-dimensional projection coordinates and the three-dimensional position coordinates; determining the current velocity corresponding to the target image based on the three-dimensional angle information, and constructing the second error function from this current velocity and the previous velocity corresponding to the previous image of the target image; determining the current acceleration corresponding to the target image based on the three-dimensional angle information, and constructing the third error function from this current acceleration and the previous acceleration corresponding to the previous image of the target image; and constructing the second function based on the first, second, and third error functions.
In this embodiment, error functions of different dimensions are constructed from the initial key point information, and a second function containing multi-dimensional information is built from them. The total objective function therefore also contains multi-dimensional information, which improves the global optimization, the accuracy of the target key point information, and ultimately the action accuracy of the avatar.
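As a concrete illustration of the velocity and acceleration terms used by the second and third error functions, a finite-difference sketch follows; the frame rate, array layout, and function name are assumptions for illustration, not part of the disclosure:

```python
import numpy as np

def angle_derivatives(angles, fps=30.0):
    """Approximate key point velocity and acceleration from per-frame
    Euler angles of shape (num_frames, num_joints, 3) by finite differences."""
    dt = 1.0 / fps
    velocity = np.diff(angles, axis=0) / dt        # (num_frames - 1, num_joints, 3)
    acceleration = np.diff(velocity, axis=0) / dt  # (num_frames - 2, num_joints, 3)
    return velocity, acceleration
```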
Specifically, the total objective function may be calculated as:

F = Σ_{k=1..N} F_k

where F is the total objective function, F_k is the sub-objective function corresponding to the k-th target image, and N is the total number of target images.

F_k may be calculated as:

F_k = F1 + G1 + G2 + G3

where

F1 = |q_k - q̂_k|
G1 = |p_k^proj - p_k^3D|
G2 = |v_k - v_{k-1}|
G3 = |a_k - a_{k-1}|

Here, q_k is the angle information of the key points of the real figure in the k-th target image (the initial key point angle information), which may be obtained through the body perception network; q̂_k is the target key point angle information corresponding to the k-th target image, i.e., the unknown quantity to be solved; p_k^proj is the three-dimensional projection coordinates of the key points of the real figure in the k-th target image, which may be obtained from the two-dimensional position coordinates of those key points on the target image and the camera parameters; p_k^3D is the three-dimensional position coordinates of the key points of the real figure in the k-th target image (the initial key point coordinates), which may be obtained through the body perception network; v_k is the velocity of the key points of the real figure in the k-th target image (the initial key point velocity) and v_{k-1} is that of the (k-1)-th target image; a_k is the acceleration of the key points of the real figure in the k-th target image (the initial key point acceleration) and a_{k-1} is that of the (k-1)-th target image. The initial key point velocity is obtained by differentiating the initial key point angle information, and the initial key point acceleration is obtained by differentiating the initial key point velocity (i.e., by differentiating the angle information twice); |·| denotes the absolute value operation.
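For illustration only, the formulas above could be assembled as follows; the use of summed absolute values and the function names are assumptions of this sketch, and the disclosure does not prescribe a particular implementation:

```python
import numpy as np

def sub_objective(q_init, q_target, p_proj, p3d, v, v_prev, a, a_prev):
    """F_k = F1 + G1 + G2 + G3 for the k-th target image."""
    f1 = np.abs(q_init - q_target).sum()  # angle error vs. target key point angles
    g1 = np.abs(p_proj - p3d).sum()       # projected vs. perceived 3D coordinates
    g2 = np.abs(v - v_prev).sum()         # velocity change between consecutive frames
    g3 = np.abs(a - a_prev).sum()         # acceleration change between consecutive frames
    return f1 + g1 + g2 + g3

def total_objective(per_frame_terms):
    """F = sum of F_k over all N target images; per_frame_terms is a list of
    dicts holding the arguments of sub_objective for each frame."""
    return sum(sub_objective(**terms) for terms in per_frame_terms)
```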
405. Construct target constraint conditions corresponding to the real figure based on the foot touchdown state and the initial key point information.
A touchdown constraint condition may be constructed based on the foot touchdown state and the foot key point velocity of the real figure; a dynamics constraint condition may be constructed based on the initial key point information of the real figure; the touchdown constraint condition and the dynamics constraint condition together serve as the target constraint conditions.
That is, the target constraint conditions include: the touchdown constraint C1 and the dynamics constraint C2.
The expressions of C1 and C2 may be:

Touchdown constraint C1:

v_foot = 0, when contact = 1

where v_foot is the foot key point velocity of the real figure, which may be obtained by differentiating the foot key point position coordinates p_foot; the foot key point position coordinates may be obtained through the body perception network. contact = 1 indicates that the foot touchdown state is touchdown, which may be obtained through the touchdown detection network.

Dynamics constraint C2:

M·q̈ + B·q̇ + G = τ + Jᵀ·Fc

where M is the mass matrix; B is the centrifugal and Coriolis force matrix; G is the gravity matrix; τ is the joint torque; J is the Jacobian matrix; Fc is the applied external force; τ and Fc may be obtained from a dynamics network; and q is the initial key point angle information, with q̇ and q̈ its first and second time derivatives.
In this embodiment, the target constraint conditions include both a touchdown constraint and a dynamics constraint; constructing multiple kinds of constraints improves the accuracy of the target key point information and hence the accuracy of the determined avatar action.
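For illustration, the two constraints can be written as residual functions for a constrained solver; every argument name below is a placeholder for a quantity produced by the body perception, touchdown detection, or dynamics networks, not an identifier from the disclosure:

```python
import numpy as np

def touchdown_residual(v_foot, contact):
    """C1: foot key point velocity must be zero whenever contact == 1."""
    return contact * v_foot  # zero when airborne, or when a grounded foot is still

def dynamics_residual(M, q_ddot, B, q_dot, G, tau, J, Fc):
    """C2: M*q'' + B*q' + G = tau + J^T * Fc, written as a residual."""
    return M @ q_ddot + B @ q_dot + G - (tau + J.T @ Fc)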
406. Determine the target key point information based on the total objective function and the target constraint conditions.
After the total objective function and the target constraint conditions are constructed, the target key point information can be solved by minimizing the total objective function subject to the target constraint conditions.
For example, a nonlinear solver may be used to solve for the target key point information.
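The disclosure does not name a particular solver; as one hypothetical choice, SciPy's SLSQP method accepts equality constraints of exactly this form:

```python
from scipy.optimize import minimize

def solve_target_keypoints(f_total, q0, constraint_fns):
    """Minimize the total objective subject to equality constraints.

    f_total:        callable mapping a flat vector of target key point
                    angles (all frames) to the scalar total objective F.
    q0:             initial guess, e.g. the initial key point angles.
    constraint_fns: callables whose residuals must equal zero (C1, C2).
    """
    constraints = [{"type": "eq", "fun": fn} for fn in constraint_fns]
    result = minimize(f_total, q0, method="SLSQP", constraints=constraints)
    return result.x  # solved target key point information
```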
407. Determine the action of the avatar based on the target key point information.
For example, if the target key point information is the key point angle information of the avatar, a correspondence between key point angle information and actions may be preconfigured, and the action of the avatar determined based on this correspondence and the solved key point angle information of the avatar.
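A toy sketch of such a correspondence follows; the table contents and the nearest-neighbour lookup are invented for illustration, and a real system would use the avatar's full joint configuration:

```python
import numpy as np

# Hypothetical correspondence table: representative angle vectors -> action labels.
ACTION_TABLE = {
    "wave":  np.array([0.0, 1.2, 0.3]),
    "squat": np.array([1.1, 0.0, 0.9]),
}

def lookup_action(target_angles):
    """Pick the action whose reference angles are nearest to the solved
    target key point angles (nearest-neighbour lookup)."""
    return min(ACTION_TABLE,
               key=lambda name: np.linalg.norm(ACTION_TABLE[name] - target_angles))
```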
Fig. 5 is a schematic diagram according to a third embodiment of the present disclosure. This embodiment provides an apparatus for determining an action of an avatar; as shown in Fig. 5, the apparatus 500 includes: an acquisition module 501, a construction module 502, a solving module 503 and a determining module 504.
The acquisition module 501 is configured to, for a target image in a video of a real figure, process the target image to obtain initial key point information of the real figure corresponding to the target image. The construction module 502 is configured to construct a total objective function according to the initial key point information and the to-be-determined target key point information of the avatar, where the total objective function characterizes error information between the initial key point information and the target key point information. The solving module 503 is configured to determine the target key point information according to the total objective function and a target constraint condition corresponding to the real figure. The determining module 504 is configured to determine the action of the avatar according to the target key point information.
In this embodiment, the target key point information is determined based on the total objective function and the constraint condition corresponding to the real figure, and the action of the avatar is determined from it. Since no driving model is required, this reduces cost and improves action accuracy compared with determining the action via a driving model.
In some embodiments, the target image comprises a plurality of target images, and the construction module 502 is further configured to: for each target image of the plurality of target images, construct a sub-objective function corresponding to that target image based on the initial key point information corresponding to that target image and the target key point information; and construct the total objective function based on the sub-objective functions corresponding to the target images.
In this embodiment, the total objective function, built from per-image sub-objective functions, draws on the information of every target image, i.e., on the global information of the video; global optimization then yields more accurate target key point information and hence a more accurate avatar action.
In some embodiments, the construction module 502 is further configured to, for each target image: construct a first function based on the initial key point information corresponding to that target image and the target key point information; construct a second function based on the initial key point information corresponding to that target image; and construct the sub-objective function based on the first function and the second function.
In this embodiment, building the first function from the initial and target key point information, the second function from the initial key point information, and the sub-objective function from both means the optimization objective is constructed from different angles, improving the accuracy of the sub-objective function and hence the accuracy of the action.
In some embodiments, the initial key point information includes: two-dimensional position coordinates of the key points of the real figure, three-dimensional position coordinates of the key points of the real figure, and three-dimensional angle information.
The construction module 502 is further configured to: project the two-dimensional position coordinates to obtain three-dimensional projection coordinates, and construct a first error function from the three-dimensional projection coordinates and the three-dimensional position coordinates; determine the current velocity corresponding to the target image based on the three-dimensional angle information, and construct a second error function from this current velocity and the previous velocity corresponding to the previous image of the target image; determine the current acceleration corresponding to the target image based on the three-dimensional angle information, and construct a third error function from this current acceleration and the previous acceleration corresponding to the previous image of the target image; and construct the second function based on the first, second, and third error functions.
In this embodiment, error functions of different dimensions are constructed from the initial key point information and combined into a second function containing multi-dimensional information, so the total objective function also contains multi-dimensional information, which improves the global optimization, the accuracy of the target key point information, and ultimately the action accuracy of the avatar.
In some embodiments, the acquisition module 501 is further configured to: for the target image, process the target image using a pre-trained body perception network to obtain the initial key point information corresponding to the target image.
In this embodiment, the initial key point information is obtained through the pre-trained body perception network; its strong performance improves the accuracy of the initial key point information and, in turn, the accuracy of the determined avatar action.
Fig. 6 is a schematic diagram according to a fourth embodiment of the present disclosure. This embodiment provides an apparatus for determining an action of an avatar; as shown in Fig. 6, the apparatus 600 includes: an acquisition module 601, a construction module 602, a solving module 603 and a determining module 604.
For details of the acquisition module 601, the construction module 602, the solving module 603 and the determining module 604, reference may be made to the previous embodiment.
In some embodiments, the apparatus 600 may further include: a constraint module 605 configured to construct a touchdown constraint condition based on the foot touchdown state and the foot key point velocity of the real figure; construct a dynamics constraint condition based on the initial key point information of the real figure; and use the touchdown constraint condition and the dynamics constraint condition as the target constraint conditions.
In this embodiment, the target constraint conditions include both a touchdown constraint and a dynamics constraint; constructing multiple kinds of constraints improves the accuracy of the target key point information and hence the accuracy of the determined avatar action.
In some embodiments, the apparatus 600 may further include: a detection module 606 configured to process the target image using a pre-trained touchdown detection network to determine the foot touchdown state.
In this embodiment, the foot touchdown state is obtained through the pre-trained touchdown detection network; its strong performance improves the accuracy of the foot touchdown state and, in turn, the accuracy of the determined avatar action.
It is to be understood that in the embodiments of the disclosure, the same or similar content in different embodiments may be referred to each other.
It can be understood that "first", "second", etc. in the embodiments of the present disclosure are used only for distinction and do not indicate importance or temporal order.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other handling of users' personal information comply with relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure. The electronic device 700 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile apparatuses, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the electronic device 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 702 or a computer program loaded from a storage unit 708 into a random access memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the electronic device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Various components in the electronic device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the electronic device 700 to exchange information/data with other devices through a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various specialized artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the methods and processes described above, for example the method for determining an action of an avatar. For example, in some embodiments, this method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are generally remote from each other and typically interact through a communication network; their relationship arises from computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server (also called a cloud computing server or cloud host), a host product in a cloud computing service system that overcomes the drawbacks of high management difficulty and weak service scalability in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system or a server combined with a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.
Claims (17)
1. A method for determining an action of an avatar, comprising:
for a target image in a video of a real figure, processing the target image to obtain initial key point information of the real figure corresponding to the target image;
constructing a total objective function based on the initial key point information and to-be-determined target key point information of the avatar, wherein the total objective function characterizes error information between the initial key point information and the target key point information;
determining the target key point information based on the total objective function and a target constraint condition corresponding to the real figure;
and determining the action of the avatar based on the target key point information.
2. The method of claim 1, wherein
the target image comprises a plurality of target images;
the constructing of the total objective function based on the initial key point information and the to-be-determined target key point information of the avatar comprises:
for each target image of the plurality of target images, constructing a sub-objective function corresponding to that target image based on the initial key point information corresponding to that target image and the target key point information;
and constructing the total objective function based on the sub-objective functions corresponding to the target images.
3. The method of claim 2, wherein the constructing, for each target image of the plurality of target images, of the sub-objective function corresponding to that target image based on the initial key point information corresponding to that target image and the target key point information comprises:
for each target image, constructing a first function based on the initial key point information corresponding to that target image and the target key point information;
for each target image, constructing a second function based on the initial key point information corresponding to that target image;
for each target image, constructing the sub-objective function based on the first function and the second function.
4. The method of claim 3, wherein
the initial key point information comprises: two-dimensional position coordinates of key points of the real figure, three-dimensional position coordinates of the key points of the real figure, and three-dimensional angle information;
the constructing of the second function based on the initial key point information corresponding to each target image comprises:
projecting the two-dimensional position coordinates to obtain three-dimensional projection coordinates, and constructing a first error function according to the three-dimensional projection coordinates and the three-dimensional position coordinates;
determining a current velocity corresponding to the target image based on the three-dimensional angle information, and constructing a second error function according to the current velocity and a previous velocity corresponding to a previous image of the target image;
determining a current acceleration corresponding to the target image based on the three-dimensional angle information, and constructing a third error function according to the current acceleration and a previous acceleration corresponding to the previous image of the target image;
and constructing the second function based on the first error function, the second error function, and the third error function.
5. The method of claim 1, further comprising:
constructing a touchdown constraint condition based on a foot touchdown state and a foot key point velocity of the real figure;
constructing a dynamics constraint condition based on the initial key point information of the real figure;
and using the touchdown constraint condition and the dynamics constraint condition as the target constraint condition.
6. The method of claim 5, further comprising:
processing the target image with a pre-trained touchdown detection network to determine the foot touchdown state.
7. The method according to any one of claims 1-6, wherein the processing, for the target image in the video of the real figure, of the target image to obtain the initial key point information of the real figure corresponding to the target image comprises:
for the target image, processing the target image with a pre-trained body perception network to obtain the initial key point information corresponding to the target image.
8. An apparatus for determining an action of an avatar, comprising:
an acquisition module, configured to, for a target image in a video of a real figure, process the target image to obtain initial key point information of the real figure corresponding to the target image;
a construction module, configured to construct a total objective function according to the initial key point information and to-be-determined target key point information of the avatar, wherein the total objective function characterizes error information between the initial key point information and the target key point information;
a solving module, configured to determine the target key point information according to the total objective function and a target constraint condition corresponding to the real figure;
and a determining module, configured to determine the action of the avatar according to the target key point information.
9. The apparatus of claim 8, wherein
the target image comprises a plurality of target images;
the construction module is further configured to:
for each target image of the plurality of target images, construct a sub-objective function corresponding to that target image based on the initial key point information corresponding to that target image and the target key point information;
and construct the total objective function based on the sub-objective functions corresponding to the target images.
10. The apparatus of claim 9, wherein the construction module is further configured to:
for each target image, construct a first function based on the initial key point information corresponding to that target image and the target key point information;
for each target image, construct a second function based on the initial key point information corresponding to that target image;
for each target image, construct the sub-objective function based on the first function and the second function.
11. The apparatus of claim 9, wherein
the initial key point information comprises: two-dimensional position coordinates of key points of the real figure, three-dimensional position coordinates of the key points of the real figure, and three-dimensional angle information;
the construction module is further configured to:
project the two-dimensional position coordinates to obtain three-dimensional projection coordinates, and construct a first error function according to the three-dimensional projection coordinates and the three-dimensional position coordinates;
determine a current velocity corresponding to the target image based on the three-dimensional angle information, and construct a second error function according to the current velocity and a previous velocity corresponding to a previous image of the target image;
determine a current acceleration corresponding to the target image based on the three-dimensional angle information, and construct a third error function according to the current acceleration and a previous acceleration corresponding to the previous image of the target image;
and construct the second function based on the first error function, the second error function, and the third error function.
12. The apparatus of claim 8, further comprising:
a constraint module, configured to construct a touchdown constraint condition based on a foot touchdown state and a foot key point velocity of the real figure; construct a dynamics constraint condition based on the initial key point information of the real figure; and use the touchdown constraint condition and the dynamics constraint condition as the target constraint condition.
13. The apparatus of claim 12, further comprising:
a detection module configured to process the target image with a pre-trained touchdown detection network to determine the foot touchdown state.
14. The apparatus of any one of claims 8-13, wherein the acquisition module is further configured to:
process the target image with a pre-trained body perception network to acquire the initial key point information corresponding to the target image.
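Taking claims 13 and 14 together, the acquisition stage could look like the sketch below; `body_net` and `touchdown_net` are stand-ins for the pre-trained body perception and touchdown detection networks, which the patent does not name or specify:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class InitialKeypoints:
    """Initial key point information as enumerated in claim 11."""
    x2d: np.ndarray    # (J, 2) two-dimensional position coordinates
    X3d: np.ndarray    # (J, 3) three-dimensional position coordinates
    theta: np.ndarray  # (J, 3) three-dimensional angle information

def acquire(frames, body_net, touchdown_net):
    """Run both pre-trained networks over every target image in the video;
    returns per-image initial key point information (claim 14) and per-image
    foot touchdown states (claim 13). Both networks are assumed callables,
    not APIs defined by the patent."""
    keypoints = [body_net(frame) for frame in frames]       # List[InitialKeypoints]
    touchdown = [touchdown_net(frame) for frame in frames]  # List[bool]
    return keypoints, touchdown
```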
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310728732.1A CN116894894B (en) | 2023-06-19 | 2023-06-19 | Method, apparatus, device and storage medium for determining motion of avatar |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116894894A (en) | 2023-10-17 |
CN116894894B CN116894894B (en) | 2024-08-27 |
Family
ID=88310122
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310728732.1A Active CN116894894B (en) | 2023-06-19 | 2023-06-19 | Method, apparatus, device and storage medium for determining motion of avatar |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116894894B (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107038712A (en) * | 2016-12-14 | 2017-08-11 | 中国科学院沈阳自动化研究所 | A kind of woodpecker intra-articular irrigation method based on motion image sequence |
US20220358705A1 (en) * | 2020-02-18 | 2022-11-10 | Boe Technology Group Co., Ltd. | Method for generating animation figure, electronic device and storage medium |
CN113822097A (en) * | 2020-06-18 | 2021-12-21 | 北京达佳互联信息技术有限公司 | Single-view human body posture recognition method and device, electronic equipment and storage medium |
CN113420719A (en) * | 2021-07-20 | 2021-09-21 | 北京百度网讯科技有限公司 | Method and device for generating motion capture data, electronic equipment and storage medium |
CN115841534A (en) * | 2022-10-27 | 2023-03-24 | 阿里巴巴(中国)有限公司 | Method and device for controlling motion of virtual object |
CN115857676A (en) * | 2022-11-23 | 2023-03-28 | 上海哔哩哔哩科技有限公司 | Display method and system based on virtual image |
CN116092120A (en) * | 2022-12-30 | 2023-05-09 | 北京百度网讯科技有限公司 | Image-based action determining method and device, electronic equipment and storage medium |
CN116206370A (en) * | 2023-05-06 | 2023-06-02 | 北京百度网讯科技有限公司 | Driving information generation method, driving device, electronic equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
Fan Qing et al., "Dynamic Hand Reconstruction Based on Correspondences of Multiple Key Points", 《图学学报》 (Journal of Graphics), vol. 41, no. 5, 30 November 2020 (2020-11-30) *
Also Published As
Publication number | Publication date |
---|---|
CN116894894B (en) | 2024-08-27 |
Similar Documents
Publication | Title |
---|---|
EP4033453A1 (en) | Training method and apparatus for target detection model, device and storage medium |
CN114186632B (en) | Method, device, equipment and storage medium for training key point detection model |
CN112785625B (en) | Target tracking method, device, electronic equipment and storage medium |
CN113378770B (en) | Gesture recognition method, device, equipment and storage medium |
CN113378712B (en) | Training method of object detection model, image detection method and device thereof |
CN113642431A (en) | Training method and device of target detection model, electronic equipment and storage medium |
US20140232748A1 (en) | Device, method and computer readable recording medium for operating the same |
CN115393488B (en) | Method and device for driving virtual character expression, electronic equipment and storage medium |
CN113362314A (en) | Medical image recognition method, recognition model training method and device |
CN111833391A (en) | Method and device for estimating image depth information |
CN111462179A (en) | Three-dimensional object tracking method and device and electronic equipment |
CN114627268A (en) | Visual map updating method and device, electronic equipment and medium |
CN115147831A (en) | Training method and device of three-dimensional target detection model |
CN114360047A (en) | Hand-lifting gesture recognition method and device, electronic equipment and storage medium |
CN116894894B (en) | Method, apparatus, device and storage medium for determining motion of avatar |
CN113705390A (en) | Positioning method, positioning device, electronic equipment and storage medium |
CN114674328B (en) | Map generation method, map generation device, electronic device, storage medium, and vehicle |
CN111538410A (en) | Method and device for determining target algorithm in VR scene and computing equipment |
CN112733879A (en) | Model distillation method and device for different scenes |
CN116228939B (en) | Digital person driving method, digital person driving device, electronic equipment and storage medium |
CN116448105B (en) | Pose updating method and device, electronic equipment and storage medium |
CN114332416B (en) | Image processing method, device, equipment and storage medium |
CN116229583B (en) | Driving information generation method, driving device, electronic equipment and storage medium |
CN116301361B (en) | Target selection method and device based on intelligent glasses and electronic equipment |
CN113033415B (en) | Data queue dynamic updating method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||