
CN114581571A - Monocular human body reconstruction method and device based on IMU and forward deformation field - Google Patents

Monocular human body reconstruction method and device based on IMU and forward deformation field

Info

Publication number
CN114581571A
Authority
CN
China
Prior art keywords
human body
frame
model
field
human
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210207960.XA
Other languages
Chinese (zh)
Other versions
CN114581571B (en)
Inventor
要宇馨
江博艺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Xiangyan Technology Co ltd
Original Assignee
Hangzhou Xiangyan Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Xiangyan Technology Co ltd filed Critical Hangzhou Xiangyan Technology Co ltd
Priority to CN202210207960.XA priority Critical patent/CN114581571B/en
Publication of CN114581571A publication Critical patent/CN114581571A/en
Application granted granted Critical
Publication of CN114581571B publication Critical patent/CN114581571B/en
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 Animation
    • G06T 13/20 3D [Three Dimensional] animation
    • G06T 13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 General purpose image data processing
    • G06T 1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a monocular human body reconstruction method and device based on an IMU (Inertial Measurement Unit) and a forward deformation field. Trained on a monocular RGB video of a moving human body wearing inertial sensors, the method obtains accurate, high-quality geometry of the reconstructed dynamic human body and natural, realistic renderings from new viewpoints, and a newly input human pose can drive the reconstructed human body to a new action. The method first establishes an implicit signed distance field model expressing the shape in a reference space and a neural radiance field model expressing color; it then establishes a forward skinning deformation field giving the deformation between points in the reference space and the corresponding points in the current space of each frame, obtains the color value and transparency of each pixel in each frame, and takes the difference between the rendered and observed pixel values as one main loss function. To avoid the self-occlusion problem of monocular video, the invention binds inertial sensors to the moving human body and uses the relative position information between adjacent frames as another main loss function for training.

Description

Monocular human body reconstruction method and device based on IMU and forward deformation field
Technical Field
The invention relates to the technical field of human body image processing, and in particular to a high-quality monocular RGB video human body reconstruction method and device that employ inertial sensors, deform the human body with a forward deformation field based on a parameterized human skeleton model, and perform volume rendering by combining geometric prediction and color prediction.
Background
In recent years, human image capture has become increasingly popular in applications such as film and game production, VR/AR, and virtual digital humans. RGB video shot by a monocular camera is the most common and most easily acquired form of dynamic human body data, but because the information it carries is limited, reconstructing a high-precision dynamic human body sequence from it remains a difficult problem.
In the past, dynamic human body reconstruction has often relied on depth data acquired by structured-light cameras. These techniques generally obtain a reconstructed dynamic human body sequence through real-time non-rigid tracking and fusion of the depth data; to better exploit human body priors, a template model of the human body is often built in advance to assist tracking. Alternatively, dense multi-view video sequences are used to acquire the three-dimensional shape information. Recently, models based on sparse-view and single-view cameras have been proposed. These typically work with an implicit representation of the human body: an implicit signed distance field or occupancy field can express more complex detail with less storage than an explicit representation such as a mesh or point cloud. These techniques generally select a reference space, use a neural network to represent the deformation field from the frame-by-frame dynamic data to the reference space, render the model with neural rendering, and optimize the reconstruction of the scene by constraining the rendered picture to be as close as possible to the input picture. To exploit human body priors, some methods use the widely adopted parameterized human skeleton model (SMPL) as a basic low-dimensional representation of the human body and use it to assist the deformation field, enabling the model to handle more complex actions and increasing robustness.
However, in a multi-view camera setup, even a sparse one, calibration between the cameras is required, which makes such systems inconvenient to use; a single camera is more convenient, but the information it captures is limited and ambiguous. An inertial measurement unit (IMU) can provide three-dimensional information such as the velocity, acceleration, and orientation between adjacent frames; IMUs are widely used in human pose estimation, and such sensors are easy to add to AR/VR devices. We therefore study the human body reconstruction problem for monocular RGB video combined with inertial sensors. This removes the need for multi-camera preprocessing, and the relative three-dimensional position information of adjacent frames provided by the inertial sensors allows the reconstruction system to better handle human motion sequences with arbitrary motion and self-occlusion, rather than being limited to sequences with small motion amplitude.
To model the deformation field of a dynamic human body, we use the parameterized human skeleton model (SMPL) as a low-dimensional deformation representation. Because the expressive power of the linear blend skinning weights in the parameterized model is limited and can only express a naked body, we use a neural network to learn the skinning weights from any point on the surface of the clothed body to the joints. Moreover, we express the deformation model as a forward deformation, from the reference space to be learned to the real-time space of the current frame. Compared with backward deformation from the current image space to the reference space, forward deformation is easier to learn, yields a more uniform deformation model across many actions, and allows a new skeleton deformation to be given to obtain new actions of the clothed human body.
To use the input images as supervision, we adopt a volume rendering technique that combines geometric prediction and color prediction. A learnable color prediction network and a signed distance field expressing geometric information are established for the reference space; combined with the deformation field, the model corresponding to each input picture is deformed into the reference space and rendered, and the rendered image and the input picture are then compared with a pixel-wise similarity measure for joint learning. In this way an implicit representation can be learned from the frame-by-frame input pictures, and the three-dimensional surface can be extracted by predicting the signed distance values at randomly sampled points in space. The learned forward deformation field and reference space can also be used to generate new poses of the reconstructed body, and new motion sequences can be used to drive the virtual body.
Disclosure of Invention
The invention aims to provide, in view of the defects of the prior art, a monocular human body reconstruction method and device based on an IMU and a forward deformation field that can reconstruct a moving human body quickly, accurately, and with high quality.
The purpose of the invention is realized by the following technical solution:
According to a first aspect of the present specification, there is provided a monocular human body reconstruction method based on an IMU and a forward deformation field, the method comprising:
s1: collecting a monocular RGB video of human motion with worn inertial sensors, segmenting the human body and background of the video frame by frame, recording the body positions bound by the inertial sensors and the exported acceleration signals, and recording the inertial sensor frame rate and the monocular RGB video shooting frame rate;
s2: fitting the monocular RGB video of human motion frame by frame using a pre-trained human body parameterized fitting model to obtain frame-by-frame initial estimates of the shape and pose of the model, and marking sensor labels on the standard grid points corresponding to the model, each label indicating whether an inertial sensor is bound at that point and, if so, which sensor;
s3: establishing a forward skinning deformation field model from points in a reference space to points in the current space corresponding to the frame-by-frame human motion picture, using learnable skinning weights combined with the initial estimation result obtained in S2;
s4: establishing, using a neural network, an implicit signed distance field model expressing the reference shape in the reference space;
s5: establishing, using a neural network, a neural radiance field model expressing color in the reference space;
s6: sampling rays in the current space corresponding to the frame-by-frame human motion picture in a volume rendering manner, and then sampling points along the rays;
s7: deforming the sampling points into the reference space according to the forward skinning deformation field, obtaining the signed distance values of the sampling points from the implicit signed distance field, and obtaining the color values and transparencies of the sampling points from the neural radiance field;
s8: obtaining the rendered color values corresponding to the frame-by-frame human motion picture from the color values and transparencies of all sampling points along each ray direction;
s9: for each sampling point in S6, finding the closest point among the standard grid points corresponding to the human body parameterized fitting model and transferring that grid point's sensor label to the sampling point; recording the sampling points bound to inertial sensors as key sampling points; deforming the key sampling points first into the reference space and then, via S3, into the current space corresponding to an adjacent frame to obtain new coordinates; and recording the Euclidean distances between the original and new coordinates of the key sampling points;
s10: training a dynamic human body reconstruction model formed by the skinning deformation field, the implicit signed distance field, and the neural radiance field, and obtaining a reconstructed human body from the trained dynamic human body reconstruction model;
s11: inputting a new pose of the human body parameterized fitting model into the dynamic human body reconstruction model trained in S10 to generate a new pose of the reconstructed human body.
Further, in S10, the difference between the rendered color values obtained in S8 and the color values of the corresponding points in the human body pictures segmented in S1 is taken as loss function 1; a human body silhouette is obtained from the rendered color values of S8, and its difference from the silhouette obtained from the segmented pictures of S1 is taken as loss function 2; the acceleration of each key sampling point is obtained from the Euclidean distance recorded in S9 together with the inertial sensor frame rate and the monocular RGB video shooting frame rate, and its difference from the acceleration exported by the inertial sensor is taken as loss function 3; the weighted sum of loss functions 1, 2, and 3 is taken as the training loss.
Further, the forward skinning deformation field model $D_w$ in S3 is the function:

$$D_w : x_c(r_i, t_j) \mapsto x_d(r_i, t_j)$$

where $x_c(r_i, t_j)$ represents a point in the reference space, $x_d(r_i, t_j)$ represents the sampling point at step $t_j$ along sampling ray $r_i$ in the current space corresponding to the frame-by-frame human motion picture, and $\{B_k\}_{k=1}^{n_b}$ are the transformation matrices of the bones in the human body parameterized fitting model, with $n_b$ the number of bones. The specific skinning deformation formula is:

$$x_d(r_i, t_j) = \sum_{k=1}^{n_b} w_k\big(x_c(r_i, t_j)\big)\, B_k\, x_c(r_i, t_j)$$

where $w_k$ is a learnable skinning weight. To deform a point $x_d(r_i, t_j)$ in the current space corresponding to the frame-by-frame human motion picture back to the point $x_c(r_i, t_j)$ in the reference space, the root of the skinning deformation formula can be solved with a numerical optimization method such as Newton's method or a quasi-Newton method.
Further, the implicit signed distance field $f_s$ expressing the reference shape in the reference space in S4 is the function:

$$f_s : x_c(r_i, t_j) \mapsto (s_{ij}, F_{ij})$$

where $s_{ij}$ is the signed distance value of the reference shape in the reference space and $F_{ij}$ is a feature associated with the implicit signed distance field, which establishes the connection in the reference space between the implicit signed distance field expressing the reference shape and the neural radiance field expressing color.

Further, the implicit signed distance field model expressing the reference shape in the reference space is a neural network model comprising, in order: an input layer, a nonlinear layer, a fully connected layer, and a loss layer.

Further, in S4, one frame is selected from the frame-by-frame initial estimates of the shape and pose of the human body parameterized fitting model, and the implicit signed distance field expressing the reference shape in the reference space is initialized with the standard grid corresponding to that frame.
Further, the neural radiance field $f_c$ expressing color in the reference space in S5 is the function:

$$f_c : (x_c(r_i, t_j), r_i, F_{ij}) \mapsto c_{ij}$$

where $c_{ij}$ is the color value of $x_c(r_i, t_j)$. From the discretized volume rendering formula, the color value $C(r_i)$ corresponding to sampling ray $r_i$ is:

$$C(r_i) = \sum_{j=1}^{n} \Big( \prod_{l=1}^{j-1} (1 - \alpha_{il}) \Big)\, \alpha_{ij}\, c_{ij}$$

where $n$ is the number of sampling points on the sampling ray and $\alpha_{ij}$ is the transparency corresponding to the sampling point:

$$\alpha_{ij} = \max\!\left( \frac{\Phi_m(s_{ij}) - \Phi_m(s_{i(j+1)})}{\Phi_m(s_{ij})},\ 0 \right)$$

where $\Phi_m(x) = (1 + e^{-mx})^{-1}$ is a Sigmoid function, $m$ is a predefined parameter, and $s_{i(j+1)}$ is the signed distance value obtained by deforming the sampling point $x_d(r_i, t_{j+1})$ at step $t_{j+1}$ along sampling ray $r_i$ to the reference space point $x_c(r_i, t_{j+1})$ and feeding it to the implicit signed distance field $f_s$.

Further, the neural radiance field model expressing color in the reference space is a neural network model comprising, in order: an input layer, a nonlinear layer, a fully connected layer, and a loss layer.
Further, the loss function in S10 may also contain a regularization term, which may employ an Eikonal loss that constrains the implicit signed distance field expressing the reference shape in the reference space.
Further, the frame-by-frame initial estimates of the shape and pose of the human body parameterized fitting model obtained in S2 can be treated as learnable variables in the training process of S10 and optimized jointly with the dynamic human body reconstruction model.
According to a second aspect of the present specification, there is provided a monocular human body reconstruction apparatus based on an IMU and a forward deformation field, comprising a memory in which executable code is stored and one or more processors which, when executing the executable code, implement the monocular human body reconstruction method based on an IMU and a forward deformation field according to the first aspect.
The invention has the following beneficial effects: 1) by establishing an implicit signed distance field model expressing the reference shape in the reference space and a neural radiance field model expressing color, and by using volume rendering, the method can render natural and realistic human videos; 2) by introducing acceleration information between adjacent frames from the IMU devices as a direct constraint on the pose estimation of the parameterized model and on the deformation field, the method models the deformation field more accurately, so the reconstructed human geometry is more accurate and the rendering more natural and realistic; 3) because the forward deformation field can deform the reference space points of the human model into the current space according to the pose parameters of the parameterized model, the method can take new pose parameters as input to drive the modeled human body to deform to a new pose.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described here show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a monocular human body reconstruction method based on an IMU and a forward deformation field according to an exemplary embodiment;
Fig. 2 is a schematic diagram illustrating the implementation principle of a monocular human body reconstruction method based on an IMU and a forward deformation field according to an exemplary embodiment;
Fig. 3 is a block diagram of a monocular human body reconstruction apparatus based on an IMU and a forward deformation field according to an exemplary embodiment.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Human body modeling from monocular video struggles to achieve high precision because a single viewpoint carries limited information and the human body deforms dynamically, especially when the motion is large. Moreover, when the human body occludes itself, monocular video loses information; an inertial sensor, unlike an optical camera, is unaffected by occlusion, and the information it provides is three-dimensional rather than two-dimensional like a picture. An embodiment of the present invention therefore provides a monocular human body reconstruction method based on an IMU and a forward deformation field, as shown in fig. 1 and fig. 2, which mainly includes the following steps:

S1: collecting a monocular RGB video of human motion with worn inertial sensors, segmenting the human body and background of the video frame by frame, recording the body positions bound by the inertial sensors and the exported acceleration signals, and recording the inertial sensor frame rate and the monocular RGB video shooting frame rate. The invention uses a genuinely captured dataset: 6 inertial sensors are bound to the wrists, the ankles, the head, and the waist, and a monocular RGB camera shoots the video of human motion.

S2: fitting the monocular RGB video of human motion frame by frame using a pre-trained human body parameterized fitting model to obtain frame-by-frame initial estimates of the shape and pose of the model, and marking sensor labels on the standard grid points corresponding to the model, each label indicating whether an inertial sensor is bound at that point and, if so, which sensor.
S3: establishing a forward skin deformation field model from a point in a reference space to a point in a current space corresponding to the frame-by-frame human motion picture by adopting the learnable skin weight and combining an initial estimation result obtained by S2; wherein D of the forward skin deformation field modelwThe function is:
Figure BDA0003531842310000071
wherein x isc(ri,tj) Representing a point in the reference space, xd(ri,tj) Representing a sampling ray r in the current space corresponding to the frame-by-frame human motion pictureiStep length of up, tjOf the sampling points of (a) are,
Figure BDA0003531842310000072
transformation matrix, n, representing bones in a parameterized fitted model of the human bodybThe number of bones; the specific skin deformation formula is as follows:
Figure BDA0003531842310000073
wherein, wkIs a learnable skinning weight; from the point x in the current space corresponding to the frame-by-frame human motion pictured(ri,tj) Deformed to point x in reference spacec(ri,tj) The root of the skin deformation formula can be solved by using a numerical optimization method such as a Newton method or a quasi-Newton method.
S4: establishing an implicit symbolic distance field model expressing a reference shape in a reference space by using a neural network; implicit symbolic distance field fsThe function of (d) is:
fs:xc(ri,tj)→(sij,Fij)
wherein s isijRepresenting the resulting expression in reference spaceSymbolic distance value of reference shape, FijIs a feature associated with the implicit symbolic distance field for establishing a connection in a reference space between the implicit symbolic distance field that expresses a reference shape and the neural radiation field that expresses a color;
specifically, an implicit symbolic distance field model that expresses a reference shape in a reference space employs a neural network model, which in turn comprises: input layer, nonlinear layer, full connection layer and loss layer. One frame can be selected from the initial estimation of the shape and the posture of the human body parametric fitting model frame by frame, and an implicit symbolic distance field expressing a reference shape in a reference space is initialized by using a standard grid corresponding to the frame.
S5: establishing a nerve radiation field model expressing colors in a reference space by utilizing a neural network; nerve radiation field fcThe function of (d) is:
fc:(xc(ri,tj),ri,Fij)→cij
wherein, cijIs xc(ri,tj) According to the discretized volume rendering formula, obtaining a sampling ray riCorresponding color value C (r)i) Comprises the following steps:
Figure BDA0003531842310000081
wherein n is the number of sampling points on the sampling ray. Alpha is alphaijTransparency for the sample point correspondences:
Figure BDA0003531842310000082
wherein phim(x)=(1+e-mx)-1Is a Sigmoid function, m is a predefined parameter, si(j+1)Is to sample the ray riStep length of up, tj+1Sample point x ofd(ri,tj+1) Deformed reference space point xc(ri,tj+1) Inputting implicit symbolic distance fieldsfsObtaining a symbol distance value;
specifically, the neural radiation field model for expressing colors in the reference space adopts a neural network model, and sequentially comprises the following steps: input layer, nonlinear layer, full connection layer and loss layer.
S6: and sampling rays from a current space corresponding to the frame-by-frame human motion picture by adopting a volume rendering mode, and then sampling points along the rays.
S7: and deforming the sampling point into a reference space according to the forward skin deformation field, obtaining a sampling point symbol distance value according to the implicit symbol distance field, and obtaining a sampling point color value and transparency according to the nerve radiation field.
S8: and obtaining rendering color values corresponding to the frame-by-frame human motion picture according to the color values and the transparencies of all the sampling points along the ray direction.
S9: finding the closest point in the standard grid points corresponding to the human body parametric fitting model for each sampling point in S6, and transferring the sensor label of the standard grid point corresponding to the human body parametric fitting model to each sampling point; and marking the sampling points of the bound inertial sensor as key sampling points, firstly deforming the key sampling points into a reference space, then deforming the key sampling points into a current space corresponding to an adjacent frame through S3 to obtain new coordinates, and recording Euclidean distances between original coordinates and the new coordinates of the key sampling points.
S10: training the dynamic human body reconstruction model formed by the skinning deformation field, the implicit signed distance field, and the neural radiance field.

The difference between the rendered color values obtained in S8 and the color values of the corresponding points in the human body pictures segmented in S1 is taken as loss function 1; a human body silhouette is obtained from the rendered color values of S8, and its difference from the silhouette obtained from the segmented pictures of S1 is taken as loss function 2; the acceleration of each key sampling point is obtained from the Euclidean distance recorded in S9 together with the inertial sensor frame rate and the monocular RGB video shooting frame rate, and its difference from the acceleration exported by the inertial sensor is taken as loss function 3; the weighted sum of loss functions 1, 2, and 3 is taken as the training loss. In addition, the loss function can contain a regularization term, which can employ an Eikonal loss that constrains the implicit signed distance field expressing the reference shape in the reference space. The reconstructed human body is obtained from the trained dynamic human body reconstruction model.
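A hedged sketch of assembling this training loss follows; the loss weights `lam_*`, the L1 form of the color and silhouette terms, and the L2 form of the IMU and Eikonal terms are illustrative choices, not values taken from the patent.

```python
import torch

def total_loss(rendered_rgb, gt_rgb, rendered_mask, gt_mask,
               pred_accel, imu_accel, sdf_grad_norm,
               lam_rgb=1.0, lam_mask=0.1, lam_imu=0.05, lam_eik=0.1):
    loss_rgb  = (rendered_rgb - gt_rgb).abs().mean()     # loss 1: rendered vs. input color
    loss_mask = (rendered_mask - gt_mask).abs().mean()   # loss 2: silhouette difference
    loss_imu  = (pred_accel - imu_accel).pow(2).mean()   # loss 3: IMU acceleration
    loss_eik  = (sdf_grad_norm - 1.0).pow(2).mean()      # Eikonal regularizer on f_s
    return (lam_rgb * loss_rgb + lam_mask * loss_mask
            + lam_imu * loss_imu + lam_eik * loss_eik)
```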
further, the initial estimation of the shape and the posture of the frame-by-frame human body parametric fit model obtained in S2 can be used as a learnable variable in the training process of S10 to be optimized in combination with the dynamic human body reconstruction model.
S11: inputting the new human body parameterized fitting model posture into the dynamic human body reconstruction model trained in S10, and generating the new posture of the reconstructed human body.
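Conceptually, S11 simply reuses the trained forward deformation field with new pose parameters. A minimal sketch, reusing the `forward_lbs` helper from the earlier skinning sketch and an assumed helper `pose_to_bone_transforms` that maps the parameterized model's pose vector to per-bone matrices, could look like this:

```python
def drive_new_pose(x_c_surface, new_pose, pose_to_bone_transforms, skin_weight_net):
    """Deform reconstructed reference-space surface points to a new pose.

    x_c_surface: (N, 3) points extracted from the zero level set of the trained SDF
    new_pose:    pose parameters of the human body parameterized fitting model
    """
    bone_transforms = pose_to_bone_transforms(new_pose)  # (n_b, 4, 4), assumed helper
    return forward_lbs(x_c_surface, bone_transforms, skin_weight_net)
```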
Corresponding to the embodiment of the monocular human body reconstruction method based on the IMU and the forward deformation field, the invention also provides an embodiment of a monocular human body reconstruction device based on the IMU and the forward deformation field.
Referring to fig. 3, the monocular human body reconstruction device based on the IMU and the forward deformation field provided by the embodiment of the present invention includes a memory and one or more processors; executable code is stored in the memory, and when the processors execute the executable code they implement the monocular human body reconstruction method based on the IMU and the forward deformation field of the above embodiment.
The embodiment of the monocular human body reconstruction device based on the IMU and the forward deformation field of the present invention can be applied to any device with data processing capability, such as a computer. The device embodiment may be implemented by software, by hardware, or by a combination of hardware and software. Taking a software implementation as an example, as a logical device it is formed by the processor of the device in which it is located reading the corresponding computer program instructions from nonvolatile memory into memory and running them. In terms of hardware, fig. 3 shows a hardware structure diagram of the device with data processing capability in which the monocular human body reconstruction device based on the IMU and the forward deformation field is located; besides the processor, memory, network interface, and nonvolatile memory shown in fig. 3, the device may also include other hardware according to its actual function, which is not described again here.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement it without inventive effort.
An embodiment of the present invention further provides a computer-readable storage medium, on which a program is stored, and when the program is executed by a processor, the monocular human body reconstruction method based on an IMU and a forward deformation field in the foregoing embodiments is implemented.
The computer readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any data processing capability device described in any of the foregoing embodiments. The computer readable storage medium may also be any external storage device of a device with data processing capabilities, such as a plug-in hard disk, a Smart Media Card (SMC), an SD Card, a Flash memory Card (Flash Card), etc. provided on the device. Further, the computer readable storage medium may include both an internal storage unit and an external storage device of any data processing capable device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the arbitrary data processing-capable device, and may also be used for temporarily storing data that has been output or is to be output.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of the present description to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of one or more embodiments herein. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
The above description is only for the purpose of illustrating the preferred embodiments of the one or more embodiments of the present disclosure, and is not intended to limit the scope of the one or more embodiments of the present disclosure, and any modifications, equivalent substitutions, improvements, etc. made within the spirit and principle of the one or more embodiments of the present disclosure should be included in the scope of the one or more embodiments of the present disclosure.

Claims (10)

1. A monocular human body reconstruction method based on an IMU and a forward deformation field, characterized by comprising the following steps:
s1: collecting a monocular RGB video of human motion with worn inertial sensors, segmenting the human body and background of the video frame by frame, and recording the body positions bound by the inertial sensors;
s2: fitting the monocular RGB video of human motion frame by frame using a pre-trained human body parameterized fitting model to obtain frame-by-frame initial estimates of the shape and pose of the model, and marking sensor labels on the standard grid points corresponding to the model;
s3: establishing a forward skinning deformation field model from points in a reference space to points in the current space corresponding to the frame-by-frame human motion picture, using learnable skinning weights combined with the initial estimation result obtained in S2;
s4: establishing an implicit signed distance field model expressing a reference shape in the reference space;
s5: establishing a neural radiance field model expressing color in the reference space;
s6: sampling rays in the current space corresponding to the frame-by-frame human motion picture in a volume rendering manner, and then sampling points along the rays;
s7: deforming the sampling points into the reference space according to the forward skinning deformation field, obtaining the signed distance values of the sampling points from the implicit signed distance field, and obtaining the color values and transparencies of the sampling points from the neural radiance field;
s8: obtaining rendered color values corresponding to the frame-by-frame human motion picture from the color values and transparencies of all sampling points along each ray direction;
s9: for each sampling point in S6, finding the closest point among the standard grid points corresponding to the human body parameterized fitting model and transferring that grid point's sensor label to the sampling point; recording the sampling points bound to inertial sensors as key sampling points; deforming the key sampling points first into the reference space and then, via S3, into the current space corresponding to an adjacent frame to obtain new coordinates; and recording the Euclidean distances between the original and new coordinates of the key sampling points;
s10: training a dynamic human body reconstruction model formed by the skinning deformation field, the implicit signed distance field, and the neural radiance field, and obtaining a reconstructed human body from the trained dynamic human body reconstruction model;
s11: inputting a new pose of the human body parameterized fitting model into the dynamic human body reconstruction model trained in S10 to generate a new pose of the reconstructed human body.
2. The monocular human body reconstruction method based on the IMU and the forward deformation field of claim 1, wherein the forward skinning deformation field model $D_w$ in S3 is the function:

$$D_w : x_c(r_i, t_j) \mapsto x_d(r_i, t_j)$$

wherein $x_c(r_i, t_j)$ represents a point in the reference space, $x_d(r_i, t_j)$ represents the sampling point at step $t_j$ along sampling ray $r_i$ in the current space corresponding to the frame-by-frame human motion picture, $\{B_k\}_{k=1}^{n_b}$ are the transformation matrices of the bones in the human body parameterized fitting model, and $n_b$ is the number of bones; the skinning deformation formula is:

$$x_d(r_i, t_j) = \sum_{k=1}^{n_b} w_k\big(x_c(r_i, t_j)\big)\, B_k\, x_c(r_i, t_j)$$

wherein $w_k$ is a learnable skinning weight.
3. The method of claim 2, wherein the implicit signed distance field $f_s$ of S4 is the function:

$$f_s : x_c(r_i, t_j) \mapsto (s_{ij}, F_{ij})$$

wherein $s_{ij}$ represents the signed distance value of the reference shape in the reference space and $F_{ij}$ is a feature associated with the implicit signed distance field, which establishes the connection in the reference space between the implicit signed distance field expressing the reference shape and the neural radiance field expressing color.
4. The method of claim 1, wherein in S4 one frame is selected from the frame-by-frame initial estimates of the shape and pose of the human body parameterized fitting model, and the implicit signed distance field expressing the reference shape in the reference space is initialized with the standard grid corresponding to that frame.
5. The method of claim 3, wherein the neural radiance field $f_c$ in S5 is the function:

$$f_c : (x_c(r_i, t_j), r_i, F_{ij}) \mapsto c_{ij}$$

wherein $c_{ij}$ is the color value of $x_c(r_i, t_j)$; according to the discretized volume rendering formula, the color value $C(r_i)$ corresponding to sampling ray $r_i$ is:

$$C(r_i) = \sum_{j=1}^{n} \Big( \prod_{l=1}^{j-1} (1 - \alpha_{il}) \Big)\, \alpha_{ij}\, c_{ij}$$

wherein $n$ is the number of sampling points on the sampling ray and $\alpha_{ij}$ is the transparency corresponding to the sampling point:

$$\alpha_{ij} = \max\!\left( \frac{\Phi_m(s_{ij}) - \Phi_m(s_{i(j+1)})}{\Phi_m(s_{ij})},\ 0 \right)$$

wherein $\Phi_m(x) = (1 + e^{-mx})^{-1}$ is a Sigmoid function, $m$ is a predefined parameter, and $s_{i(j+1)}$ is the signed distance value obtained by deforming the sampling point $x_d(r_i, t_{j+1})$ at step $t_{j+1}$ along sampling ray $r_i$ to the reference space point $x_c(r_i, t_{j+1})$ and inputting it to the implicit signed distance field $f_s$.
6. The method of claim 1, wherein the implicit signed distance field model is a neural network model comprising, in order: an input layer, a nonlinear layer, a fully connected layer, and a loss layer;
the neural radiance field model is a neural network model comprising, in order: an input layer, a nonlinear layer, a fully connected layer, and a loss layer.
7. The method for monocular human body reconstruction based on an IMU and a forward deformation field according to any one of claims 1 to 6, wherein in S10, the difference between the rendered color values obtained in S8 and the color values of the corresponding points in the human body pictures segmented in S1 is taken as loss function 1;
a human body silhouette is obtained from the rendered color values of S8, and its difference from the silhouette obtained from the human body pictures segmented in S1 is taken as loss function 2;
the acceleration of each key sampling point is obtained from the Euclidean distance recorded in S9 together with the inertial sensor frame rate and the video shooting frame rate, and its difference from the acceleration exported by the inertial sensor is taken as loss function 3;
the weighted sum of the three loss functions is taken as the training loss.
8. The method of claim 1, wherein the loss function in S10 further comprises an Eikonal loss function that constrains the implicit signed distance field expressing the reference shape in the reference space.
9. The IMU and forward deformation field-based monocular human body reconstruction method of claim 1, wherein the frame-by-frame initial estimates of the shape and pose of the human body parameterized fitting model obtained in S2 are treated as learnable variables in the training process of S10 and optimized jointly with the dynamic human body reconstruction model.
10. An apparatus for IMU and forward deformation field based monocular human reconstruction, comprising a memory having stored therein executable code and one or more processors, wherein the processors, when executing the executable code, are configured to implement the IMU and forward deformation field based monocular human reconstruction method according to any one of claims 1-9.
CN202210207960.XA 2022-03-04 2022-03-04 Monocular human body reconstruction method and device based on IMU and forward deformation field Active CN114581571B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210207960.XA CN114581571B (en) 2022-03-04 2022-03-04 Monocular human body reconstruction method and device based on IMU and forward deformation field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210207960.XA CN114581571B (en) 2022-03-04 2022-03-04 Monocular human body reconstruction method and device based on IMU and forward deformation field

Publications (2)

Publication Number Publication Date
CN114581571A (en) 2022-06-03
CN114581571B CN114581571B (en) 2024-10-22

Family

ID=81772437

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210207960.XA Active CN114581571B (en) 2022-03-04 2022-03-04 Monocular human body reconstruction method and device based on IMU and forward deformation field

Country Status (1)

Country Link
CN (1) CN114581571B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114863037A (en) * 2022-07-06 2022-08-05 杭州像衍科技有限公司 Single-mobile-phone-based human body three-dimensional modeling data acquisition and reconstruction method and system
CN114863035A (en) * 2022-07-05 2022-08-05 南京理工大学 Implicit representation-based three-dimensional human motion capturing and generating method
CN115147559A (en) * 2022-09-05 2022-10-04 杭州像衍科技有限公司 Three-dimensional human body parameterization representation method and device based on neural implicit function
CN117557762A (en) * 2024-01-11 2024-02-13 武汉大学 Monocular video-based dynamic loose clothing human body reconstruction method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6115052A (en) * 1998-02-12 2000-09-05 Mitsubishi Electric Information Technology Center America, Inc. (Ita) System for reconstructing the 3-dimensional motions of a human figure from a monocularly-viewed image sequence
CN112887698A (en) * 2021-02-04 2021-06-01 中国科学技术大学 High-quality face voice driving method based on nerve radiation field
CN113538667A (en) * 2021-09-17 2021-10-22 清华大学 Dynamic scene light field reconstruction method and device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6115052A (en) * 1998-02-12 2000-09-05 Mitsubishi Electric Information Technology Center America, Inc. (Ita) System for reconstructing the 3-dimensional motions of a human figure from a monocularly-viewed image sequence
CN112887698A (en) * 2021-02-04 2021-06-01 中国科学技术大学 High-quality face voice driving method based on nerve radiation field
CN113538667A (en) * 2021-09-17 2021-10-22 清华大学 Dynamic scene light field reconstruction method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李诗锐; 郝优; 墨瀚林; 李琪; 吕永春; 王向东; 李华: "Fast 3D reconstruction of non-rigid human body motion" (快速非刚体人体运动三维重建), Journal of Computer-Aided Design & Computer Graphics, no. 08, 15 August 2018 (2018-08-15) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114863035A (en) * 2022-07-05 2022-08-05 南京理工大学 Implicit representation-based three-dimensional human motion capturing and generating method
CN114863035B (en) * 2022-07-05 2022-09-20 南京理工大学 Implicit representation-based three-dimensional human motion capturing and generating method
CN114863037A (en) * 2022-07-06 2022-08-05 杭州像衍科技有限公司 Single-mobile-phone-based human body three-dimensional modeling data acquisition and reconstruction method and system
CN114863037B (en) * 2022-07-06 2022-10-11 杭州像衍科技有限公司 Single-mobile-phone-based human body three-dimensional modeling data acquisition and reconstruction method and system
WO2024007478A1 (en) * 2022-07-06 2024-01-11 杭州像衍科技有限公司 Three-dimensional human body modeling data collection and reconstruction method and system based on single mobile phone
US12014463B2 (en) 2022-07-06 2024-06-18 Image Derivative Inc. Data acquisition and reconstruction method and system for human body three-dimensional modeling based on single mobile phone
CN115147559A (en) * 2022-09-05 2022-10-04 杭州像衍科技有限公司 Three-dimensional human body parameterization representation method and device based on neural implicit function
CN115147559B (en) * 2022-09-05 2022-11-29 杭州像衍科技有限公司 Three-dimensional human body parameterization representation method and device based on neural implicit function
CN117557762A (en) * 2024-01-11 2024-02-13 武汉大学 Monocular video-based dynamic loose clothing human body reconstruction method and system

Also Published As

Publication number Publication date
CN114581571B (en) 2024-10-22

Similar Documents

Publication Publication Date Title
CN113706714B (en) New view angle synthesizing method based on depth image and nerve radiation field
Bloesch et al. Codeslam—learning a compact, optimisable representation for dense visual slam
CN109636831B (en) Method for estimating three-dimensional human body posture and hand information
Laskar et al. Camera relocalization by computing pairwise relative poses using convolutional neural network
CN114581571B (en) Monocular human body reconstruction method and device based on IMU and forward deformation field
CN106780543B (en) A kind of double frame estimating depths and movement technique based on convolutional neural networks
Panek et al. Meshloc: Mesh-based visual localization
US12106554B2 (en) Image sequence processing using neural networks
KR20190065287A (en) Prediction of depth from image data using statistical model
CN114450719A (en) Human body model reconstruction method, reconstruction system and storage medium
CN110942512B (en) Indoor scene reconstruction method based on meta-learning
US20230126829A1 (en) Point-based modeling of human clothing
CN111462274A (en) Human body image synthesis method and system based on SMP L model
CN114996814A (en) Furniture design system based on deep learning and three-dimensional reconstruction
Zhang et al. Deep learning-based real-time 3D human pose estimation
CN115018989B (en) Three-dimensional dynamic reconstruction method based on RGB-D sequence, training device and electronic equipment
Yao et al. Neural Radiance Field-based Visual Rendering: A Comprehensive Review
CN115008454A (en) Robot online hand-eye calibration method based on multi-frame pseudo label data enhancement
CN118505878A (en) Three-dimensional reconstruction method and system for single-view repetitive object scene
WO2022139784A1 (en) Learning articulated shape reconstruction from imagery
CN118154770A (en) Single tree image three-dimensional reconstruction method and device based on nerve radiation field
Khan et al. Towards monocular neural facial depth estimation: Past, present, and future
CN117788544A (en) Image depth estimation method based on lightweight attention mechanism
CN114049678B (en) Facial motion capturing method and system based on deep learning
CN113034675B (en) Scene model construction method, intelligent terminal and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant