Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
To this end, an object of the present invention is to provide an automatic generation method for a customized facial blendshape model, which tracks the face in color and depth (RGB-D) sequences with high precision and directly uses the high-precision tracking result to generate the customized facial blendshape model.
Another object of the present invention is to provide an automatic generation apparatus for a customized facial blendshape model.
In order to achieve the above objects, an embodiment of the present invention provides an automatic generation method for a customized facial blendshape model, including:
S1, acquiring an RGB-D image sequence containing a neutral expression of a user, performing non-rigid registration of a three-dimensional face template model using the depth map and facial feature points corresponding to each frame of the RGB-D image sequence, projecting each vertex of the non-rigid registration result into the depth map corresponding to each frame to generate a deformation data set, and deforming the three-dimensional face template model according to the deformation data set;
S2, in the last frame of the RGB-D image sequence, reconstructing the facial details of the non-rigidly registered three-dimensional face model by the Shape from Shading technique, and generating a neutral three-dimensional face model from the deformed three-dimensional face template model and the reconstructed three-dimensional face template model;
S3, processing the neutral three-dimensional face model and the face blendshape template by the Deformation Transfer technique to generate a customized facial blendshape model;
S4, deforming the neutral three-dimensional face model sequentially by the customized facial blendshape model, the Warping Field technique and the Shape from Shading technique, so as to track the face in the RGB-D image sequence and generate a face tracking result;
and S5, updating the customized facial blendshape model according to the face tracking result.
With the automatic generation method of the customized facial blendshape model, the three-dimensional face template model is non-rigidly registered using the depth map and facial feature points corresponding to each frame of the RGB-D image sequence, and is deformed according to the non-rigid registration result and Shape from Shading to generate a neutral three-dimensional face model; the neutral three-dimensional face model and the face blendshape template are processed by Deformation Transfer to generate a customized facial blendshape model; the neutral three-dimensional face model is then deformed sequentially by the customized facial blendshape model, the Warping Field and Shape from Shading to generate a face tracking result, with which the customized facial blendshape model is updated. The face in the color and depth sequence is tracked with high precision, and the high-precision tracking result is used directly to generate the customized facial blendshape model, so that a high-precision customized facial blendshape model is generated automatically and a vivid facial expression model can be produced in real time.
In addition, the automatic generation method for the customized facial blendshape model according to the above embodiment of the present invention may further have the following additional technical features:
Further, in an embodiment of the present invention, acquiring the RGB-D image sequence containing the neutral expression of the user includes:
keeping a neutral expression, the user rotates the head sequentially upward, downward, to the left and to the right, and each frame of the user's expression is captured to form the RGB-D image sequence.
Further, in an embodiment of the present invention, projecting each vertex of the non-rigid registration result into the depth map corresponding to each frame to generate the deformation data set includes:
projecting each vertex of the non-rigid registration result into the depth map corresponding to each frame to obtain depth data, screening the depth data to retain valid depth data, and fusing the valid depth data into an array of the same size as the three-dimensional face template model to generate the deformation data set.
Further, in an embodiment of the present invention, S4 specifically includes:
S41, deforming the neutral three-dimensional face model by the customized facial blendshape model to generate expression coefficients of the customized facial blendshape model;
S42, deforming the neutral three-dimensional face model deformed in S41 by the Warping Field technique;
and S43, deforming the neutral three-dimensional face model deformed in S42 by the Shape from Shading technique to generate a reconstruction result of the current-frame three-dimensional face model.
Further, in an embodiment of the present invention, the face tracking result includes:
the reconstruction result of the current-frame three-dimensional face model and the expression coefficients of the facial blendshape model.
In order to achieve the above objects, an embodiment of another aspect of the present invention provides an automatic generation apparatus for a customized facial blendshape model, including:
a processing module, configured to acquire an RGB-D image sequence containing a neutral expression of a user, perform non-rigid registration of a three-dimensional face template model using the depth map and facial feature points corresponding to each frame of the RGB-D image sequence, project each vertex of the non-rigid registration result into the depth map corresponding to each frame to generate a deformation data set, and deform the three-dimensional face template model according to the deformation data set;
a first generation module, configured to reconstruct, in the last frame of the RGB-D image sequence, the facial details of the non-rigidly registered three-dimensional face model by the Shape from Shading technique, and to generate a neutral three-dimensional face model from the deformed three-dimensional face template model and the reconstructed three-dimensional face template model;
a second generation module, configured to process the neutral three-dimensional face model and the face blendshape template by the Deformation Transfer technique to generate a customized facial blendshape model;
a tracking module, configured to deform the neutral three-dimensional face model sequentially by the customized facial blendshape model, the Warping Field technique and the Shape from Shading technique, so as to track the face in the RGB-D image sequence and generate a face tracking result;
and an updating module, configured to update the customized facial blendshape model according to the face tracking result.
With the automatic generation apparatus of the customized facial blendshape model of the embodiment of the present invention, the three-dimensional face template model is non-rigidly registered using the depth map and facial feature points corresponding to each frame of the RGB-D image sequence, and is deformed according to the non-rigid registration result and Shape from Shading to generate a neutral three-dimensional face model; the neutral three-dimensional face model and the face blendshape template are processed by Deformation Transfer to generate a customized facial blendshape model; the neutral three-dimensional face model is then deformed sequentially by the customized facial blendshape model, the Warping Field and Shape from Shading to generate a face tracking result, with which the customized facial blendshape model is updated. The face in the color and depth sequence is tracked with high precision, and the high-precision tracking result is used directly to generate the customized facial blendshape model, so that a high-precision customized facial blendshape model is generated automatically and a vivid facial expression model can be produced in real time.
In addition, the automatic generation apparatus for the customized facial blendshape model according to the above embodiment of the present invention may further have the following additional technical features:
Further, in an embodiment of the present invention, acquiring the RGB-D image sequence containing the neutral expression of the user includes:
keeping a neutral expression, the user rotates the head sequentially upward, downward, to the left and to the right, and each frame of the user's expression is captured to form the RGB-D image sequence.
Further, in an embodiment of the present invention, projecting each vertex of the non-rigid registration result into the depth map corresponding to each frame to generate the deformation data set includes:
projecting each vertex of the non-rigid registration result into the depth map corresponding to each frame to obtain depth data, screening the depth data to retain valid depth data, and fusing the valid depth data into an array of the same size as the three-dimensional face template model to generate the deformation data set.
Further, in an embodiment of the present invention, the tracking module includes a first deformation unit, a second deformation unit and a third deformation unit;
the first deformation unit is configured to deform the neutral three-dimensional face model by the customized facial blendshape model to generate expression coefficients of the customized facial blendshape model;
the second deformation unit is configured to deform, by the Warping Field technique, the neutral three-dimensional face model deformed in the first deformation unit;
and the third deformation unit is configured to deform, by the Shape from Shading technique, the neutral three-dimensional face model deformed in the second deformation unit, to generate a reconstruction result of the current-frame three-dimensional face model.
Further, in an embodiment of the present invention, the face tracking result includes:
the reconstruction result of the current-frame three-dimensional face model and the expression coefficients of the facial blendshape model.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or to elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are illustrative, are intended to explain the invention, and are not to be construed as limiting the invention.
The following describes a method and an apparatus for automatically generating a customized facial blendshape model according to embodiments of the present invention with reference to the accompanying drawings.
First, the automatic generation method of a customized facial blendshape model proposed according to an embodiment of the present invention will be described with reference to the accompanying drawings.
Fig. 1 is a flowchart of an automatic generation method of a customized facial blendshape model according to an embodiment of the invention.
As shown in Fig. 1, the method for automatically generating the customized facial blendshape model includes the following steps:
Step S1, acquiring an RGB-D image sequence containing a neutral expression of a user, performing non-rigid registration of a three-dimensional face template model using the depth map and facial feature points corresponding to each frame of the RGB-D image sequence, projecting each vertex of the non-rigid registration result into the depth map corresponding to each frame to generate a deformation data set, and deforming the three-dimensional face template model according to the deformation data set.
Further, the user keeps a neutral expression and rotates the head sequentially upward, downward, to the left and to the right, and each frame of the user's expression is captured to form the RGB-D image sequence.
The resolution of the RGB-D image sequence used in the embodiments of the present invention is 640 × 480.
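As an illustration of how such an RGB-D frame feeds the later steps, the following sketch back-projects a 640 × 480 depth map into camera-space three-dimensional points, against which template vertices can be compared. This is not the patent's implementation, and the camera intrinsics fx, fy, cx and cy are illustrative assumptions.

```python
# Minimal sketch (not the patent's implementation): back-project a 640x480
# depth map into camera-space 3D points for later comparison with template
# vertices. The intrinsics fx, fy, cx, cy are illustrative assumptions.
import numpy as np

def depth_to_points(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Convert a (480, 640) depth map in meters to an (H, W, 3) point map.

    Pixels with depth 0 are treated as missing and mapped to NaN.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)
    z[z <= 0] = np.nan                      # invalid measurements
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

if __name__ == "__main__":
    fake_depth = np.full((480, 640), 0.8)   # a flat surface 0.8 m away
    points = depth_to_points(fake_depth)
    print(points.shape, np.nanmean(points[..., 2]))
```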
Further, in an embodiment of the present invention, projecting each vertex of the non-rigid registration result into the depth map corresponding to each frame to generate the deformation data set includes:
projecting each vertex of the non-rigid registration result into the depth map corresponding to each frame to obtain depth data, screening the depth data to retain valid depth data, and fusing the valid depth data into an array of the same size as the three-dimensional face template model to generate the deformation data set.
Specifically, each frame of the RGB-D image sequence is processed to obtain the corresponding depth map and the facial feature points detected in that frame. In each frame, the depth map and the detected facial feature points are used to non-rigidly register the three-dimensional face template model, where the three-dimensional face template model is an existing template model. Each vertex of the non-rigid registration result is then projected into the depth map of that frame, and the depth data lying within a small distance of the vertex are selected as valid data and fused into an array of the same size as the three-dimensional face template model. The fused result serves as the data term for deforming the three-dimensional face template model, i.e., the deformation data set, which is then used to deform the three-dimensional face template model.
It can be understood that the depth map encodes three-dimensional coordinate points; the three-dimensional coordinates of each vertex of the non-rigid registration result are compared with the three-dimensional coordinates in the depth map, and only the depth data lying within a small distance are kept as valid data.
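A minimal sketch of the screening and fusion just described is given below, assuming brute-force nearest-neighbor search, a hypothetical distance threshold max_dist and simple averaging as the fusion rule; a practical implementation would use a spatial acceleration structure, and the patent does not fix these choices.

```python
# Hedged sketch of the per-vertex screening and fusion: for each frame, the
# depth sample nearest to a registered template vertex is kept only if it lies
# within a distance threshold, and valid samples are averaged into an array
# with one slot per template vertex. Threshold and weighting are assumptions.
import numpy as np

def fuse_frame(vertices, point_map, accum, weight, max_dist=0.01):
    """vertices: (N, 3) non-rigidly registered template vertices (camera space).
    point_map: (H, W, 3) back-projected depth of this frame.
    accum, weight: (N, 3) and (N,) running sums, updated in place.
    """
    valid_points = point_map[~np.isnan(point_map[..., 2])]  # flatten valid depth samples
    if valid_points.size == 0:
        return
    for i, v in enumerate(vertices):
        d = np.linalg.norm(valid_points - v, axis=1)
        j = np.argmin(d)
        if d[j] < max_dist:                 # screen: keep only nearby depth data
            accum[i] += valid_points[j]
            weight[i] += 1.0

def fused_targets(accum, weight):
    """Deformation data set: averaged target position per template vertex."""
    out = np.full_like(accum, np.nan)
    ok = weight > 0
    out[ok] = accum[ok] / weight[ok, None]
    return out
```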
Step S2, in the last frame of the RGB-D image sequence, reconstructing the facial details of the non-rigidly registered three-dimensional face model by the Shape from Shading technique, and generating a neutral three-dimensional face model from the deformed three-dimensional face template model and the reconstructed three-dimensional face template model.
Specifically, in the last frame of the RGB-D image sequence, the details of the face in the non-rigidly registered three-dimensional face model are reconstructed by the Shape from Shading technique, and the three-dimensional face template model deformed in step S1 and the three-dimensional face template model reconstructed in step S2 are integrated to generate the neutral three-dimensional face model.
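The patent names Shape from Shading but does not give its formulation. The sketch below illustrates one common variant under a Lambertian assumption with first-order spherical-harmonics lighting, where the lighting is estimated by least squares and each vertex is nudged along its normal toward the observed shading; all function names and the single-step update are assumptions for illustration only.

```python
# Illustrative Shape-from-Shading-style detail refinement, assuming Lambertian
# shading and a first-order spherical-harmonics (SH) lighting model. This is a
# generic sketch, not the patent's exact formulation.
import numpy as np

def sh_basis(normals):
    """First-order SH basis evaluated at unit normals: (N, 4)."""
    n = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    return np.column_stack([np.ones(len(n)), n[:, 0], n[:, 1], n[:, 2]])

def estimate_lighting(normals, albedo, intensity):
    """Least-squares fit of 4 SH lighting coefficients from observed intensities."""
    B = sh_basis(normals) * albedo[:, None]
    return np.linalg.lstsq(B, intensity, rcond=None)[0]

def refine_along_normals(vertices, normals, albedo, intensity, step=1e-4):
    """Push each vertex along its normal toward the observed shading.

    A single gradient-style step; a real implementation would regularize and
    recompute normals iteratively.
    """
    light = estimate_lighting(normals, albedo, intensity)
    predicted = (sh_basis(normals) * albedo[:, None]) @ light
    residual = intensity - predicted        # brighter than predicted -> move outward
    return vertices + step * residual[:, None] * normals
```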
It can be understood that the face in the input color and depth sequence keeps a neutral expression and undergoes only rigid motion, and the three-dimensional reconstruction of the neutral face is completed by deforming the three-dimensional face template model. During reconstruction, the non-rigid registration result of the three-dimensional face template model is used to fuse a more accurate three-dimensional face model; the fused three-dimensional face model in turn yields a better non-rigid registration result, and the two steps are iterated alternately.
In traditional joint reconstruction methods, the reconstructed three-dimensional face mesh does not have a fixed topology. In the embodiment of the invention, the face fused by the above fusion method has the same topology as the face template model.
Step S3, processing the neutral three-dimensional face model and the face blendshape template by the Deformation Transfer technique to generate a customized facial blendshape model.
After the three-dimensional reconstruction of the neutral face is completed, the preliminary initialization of the customized facial blendshape model is carried out by the Deformation Transfer technique.
After the Deformation Transfer technique is applied, a preliminary result of the customized facial blendshape model is obtained.
Specifically, the Deformation Transfer technique takes the reconstructed high-precision neutral face model and the blendshape model of the template as input, and produces the initialization result of the customized facial blendshape model.
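For orientation only, the sketch below is a deliberately simplified stand-in for Deformation Transfer: the actual technique transfers per-triangle deformation gradients and solves a linear system, whereas this version merely copies per-vertex displacements of each template expression onto the reconstructed neutral face, relying on the shared template topology noted above. Array names and shapes are assumptions.

```python
# Simplified stand-in for Deformation Transfer: copy per-vertex displacements
# of each template expression onto the reconstructed neutral face. Valid here
# only because both meshes share the template topology; the real technique
# works on per-triangle deformation gradients.
import numpy as np

def initialize_blendshapes(custom_neutral, template_neutral, template_expressions):
    """custom_neutral:     (N, 3) reconstructed high-precision neutral face.
    template_neutral:      (N, 3) neutral face of the template blendshape model.
    template_expressions:  (K, N, 3) K expression meshes of the template model.
    Returns (K, N, 3) initial customized expression meshes.
    """
    deltas = template_expressions - template_neutral[None]   # per-expression offsets
    return custom_neutral[None] + deltas

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    neutral_t = rng.normal(size=(100, 3))
    exprs_t = neutral_t[None] + 0.01 * rng.normal(size=(5, 100, 3))
    neutral_c = neutral_t + 0.05 * rng.normal(size=(100, 3))
    print(initialize_blendshapes(neutral_c, neutral_t, exprs_t).shape)  # (5, 100, 3)
```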
Step S4, deforming the neutral three-dimensional face model sequentially by the customized facial blendshape model, the Warping Field technique and the Shape from Shading technique, so as to track the face in the RGB-D image sequence and generate a face tracking result.
Further, in an embodiment of the present invention, step S4 includes:
S41, deforming the neutral three-dimensional face model by the customized facial blendshape model to generate expression coefficients of the customized facial blendshape model;
S42, deforming the neutral three-dimensional face model deformed in S41 by the Warping Field technique;
and S43, deforming the neutral three-dimensional face model deformed in S42 by the Shape from Shading technique to generate a reconstruction result of the current-frame three-dimensional face model.
The face tracking result includes the reconstruction result of the current-frame three-dimensional face model and the expression coefficients of the facial blendshape model.
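A minimal sketch of stage S41 under common blendshape conventions follows: expression coefficients constrained to [0, 1] are fit so that the neutral face plus the weighted blendshape offsets matches the per-frame target vertices. The bounded least-squares solver and the [0, 1] range are assumptions; the patent does not specify the solver.

```python
# Sketch of stage S41: fit expression coefficients in [0, 1] so that
# neutral + sum_k w_k * delta_k matches per-frame target vertices (e.g. from
# the depth map). Solver choice and coefficient range are assumptions.
import numpy as np
from scipy.optimize import lsq_linear

def fit_expression_coefficients(neutral, expressions, target):
    """neutral: (N, 3); expressions: (K, N, 3); target: (N, 3).
    Returns (K,) expression coefficients of the customized blendshape model.
    """
    K = expressions.shape[0]
    A = (expressions - neutral[None]).reshape(K, -1).T   # (3N, K) blendshape deltas
    b = (target - neutral).reshape(-1)                   # (3N,) observed offset
    return lsq_linear(A, b, bounds=(0.0, 1.0)).x

def apply_coefficients(neutral, expressions, weights):
    """Deform the neutral face with the fitted coefficients (stage S41 output)."""
    deltas = expressions - neutral[None]
    return neutral + np.tensordot(weights, deltas, axes=1)
```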
Specifically, the face in the input color and depth sequence is tracked: the existing customized facial blendshape model, the Warping Field and Shape from Shading are used to track the face in the input sequence with high precision, and finally the high-precision reconstruction result of the current-frame face model and the expression coefficients of the blendshape model at that moment are obtained.
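The Warping Field stage (S42) can be pictured as an unconstrained per-vertex displacement field that pulls the blendshape-deformed face toward the depth observations while staying spatially smooth. The sketch below uses simple neighbor averaging as the smoothness model; the weights, the smoothing scheme and the function names are illustrative assumptions rather than the patent's formulation.

```python
# Hedged sketch of the Warping Field stage (S42): an unconstrained per-vertex
# displacement field pulls the blendshape-deformed face toward the depth
# observations, smoothed over mesh neighbors so the correction stays coherent.
import numpy as np

def warping_field(vertices, targets, neighbors, data_weight=0.5, smooth_iters=10):
    """vertices: (N, 3) face after blendshape deformation (output of S41).
    targets:     (N, 3) per-vertex target positions (NaN where unobserved).
    neighbors:   list of N index lists, one-ring neighbors of each vertex.
    Returns the refined (N, 3) vertices (input to the S43 shading step).
    """
    disp = np.zeros_like(vertices)
    observed = ~np.isnan(targets[:, 0])
    disp[observed] = data_weight * (targets[observed] - vertices[observed])
    for _ in range(smooth_iters):            # diffuse displacements over the mesh
        smoothed = disp.copy()
        for i, nbrs in enumerate(neighbors):
            if nbrs:
                smoothed[i] = 0.5 * disp[i] + 0.5 * disp[nbrs].mean(axis=0)
        disp = smoothed
    return vertices + disp
```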
The blendshape tracking method used in this embodiment does not restrict the space in which the facial blendshape model may vary, so the blendshape model changes with a higher degree of freedom and a high-precision facial blendshape model can be updated.
Step S5, updating the customized facial blendshape model according to the face tracking result.
Specifically, the high-precision reconstruction result of the face model and the corresponding expression coefficients are used to update the customized facial blendshape model.
The motion of each vertex of the updated customized facial blendshape model is solved separately, and a mask is used to keep the semantics of each expression basis in the blendshape model unchanged.
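One way to realize such a masked per-vertex update is sketched below: for every vertex, a small least-squares system over the tracked frames recovers new blendshape offsets from the high-precision reconstructions and their expression coefficients, and a per-expression vertex mask confines each basis to its semantic region. The regularization toward the previous model and all names are added assumptions to keep the illustration well-posed.

```python
# Sketch of the update step: per-vertex least squares over tracked frames
# recovers new blendshape offsets from the reconstructions and expression
# coefficients; a per-expression mask keeps each basis semantically localized.
import numpy as np

def update_blendshapes(neutral, old_deltas, masks, coeffs, recons, reg=1e-2):
    """neutral:   (N, 3) neutral face.
    old_deltas:   (K, N, 3) current blendshape offsets (expression - neutral).
    masks:        (K, N) in [0, 1], region where each expression may move.
    coeffs:       (F, K) tracked expression coefficients for F frames.
    recons:       (F, N, 3) high-precision per-frame reconstructions.
    Returns updated (K, N, 3) blendshape offsets.
    """
    F, K = coeffs.shape
    N = neutral.shape[0]
    new_deltas = old_deltas.copy()
    # Tikhonov-style system: [W; sqrt(reg) I] x = [residual; sqrt(reg) old]
    A = np.vstack([coeffs, np.sqrt(reg) * np.eye(K)])
    for v in range(N):                        # solve each vertex independently
        b = np.vstack([recons[:, v] - neutral[v], np.sqrt(reg) * old_deltas[:, v]])
        solved = np.linalg.lstsq(A, b, rcond=None)[0]        # (K, 3)
        m = masks[:, v][:, None]
        new_deltas[:, v] = m * solved + (1.0 - m) * old_deltas[:, v]
    return new_deltas
```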
According to the automatic generation method of the customized facial blendshape model provided by the embodiment of the present invention, the three-dimensional face template model is non-rigidly registered using the depth map and facial feature points corresponding to each frame of the RGB-D image sequence, and is deformed according to the non-rigid registration result and Shape from Shading to generate a neutral three-dimensional face model; the neutral three-dimensional face model and the face blendshape template are processed by Deformation Transfer to generate a customized facial blendshape model; the neutral three-dimensional face model is then deformed sequentially by the customized facial blendshape model, the Warping Field and Shape from Shading to generate a face tracking result, with which the customized facial blendshape model is updated. The face in the color and depth sequence is tracked with high precision, and the high-precision tracking result is used directly to generate the customized facial blendshape model, so that a high-precision customized facial blendshape model is generated automatically and a vivid facial expression model can be produced in real time.
Next, the automatic generation apparatus for a customized facial blendshape model proposed according to an embodiment of the present invention is described with reference to the drawings.
Fig. 2 is a schematic structural diagram of an automatic generation apparatus for a customized facial blendshape model according to an embodiment of the invention.
As shown in Fig. 2, the apparatus for automatically generating a customized facial blendshape model includes: a processing module 100, a first generation module 200, a second generation module 300, a tracking module 400, and an updating module 500.
The processing module 100 is configured to acquire an RGB-D image sequence containing a neutral expression of a user, perform non-rigid registration of a three-dimensional face template model using the depth map and facial feature points corresponding to each frame of the RGB-D image sequence, project each vertex of the non-rigid registration result into the depth map corresponding to each frame to generate a deformation data set, and deform the three-dimensional face template model according to the deformation data set.
The first generation module 200 is configured to reconstruct, in the last frame of the RGB-D image sequence, the facial details of the non-rigidly registered three-dimensional face model by the Shape from Shading technique, and to generate a neutral three-dimensional face model from the deformed three-dimensional face template model and the reconstructed three-dimensional face template model.
The second generation module 300 is configured to process the neutral three-dimensional face model and the face blendshape template by the Deformation Transfer technique to generate a customized facial blendshape model.
The tracking module 400 is configured to deform the neutral three-dimensional face model sequentially by the customized facial blendshape model, the Warping Field technique and the Shape from Shading technique, so as to track the face in the RGB-D image sequence and generate a face tracking result.
The updating module 500 is configured to update the customized facial blendshape model according to the face tracking result.
The apparatus can generate a better neutral face reconstruction result, achieve high-precision tracking of the face, and generate a high-precision facial blendshape model.
Further, in an embodiment of the present invention, acquiring the RGB-D image sequence containing the neutral expression of the user includes:
keeping a neutral expression, the user rotates the head sequentially upward, downward, to the left and to the right, and each frame of the user's expression is captured to form the RGB-D image sequence.
Further, in an embodiment of the present invention, projecting each vertex of the non-rigid registration result into the depth map corresponding to each frame to generate the deformation data set includes:
projecting each vertex of the non-rigid registration result into the depth map corresponding to each frame to obtain depth data, screening the depth data to retain valid depth data, and fusing the valid depth data into an array of the same size as the three-dimensional face template model to generate the deformation data set.
Further, in an embodiment of the present invention, the tracking module includes a first deformation unit, a second deformation unit and a third deformation unit;
the first deformation unit is configured to deform the neutral three-dimensional face model by the customized facial blendshape model to generate expression coefficients of the customized facial blendshape model;
the second deformation unit is configured to deform, by the Warping Field technique, the neutral three-dimensional face model deformed in the first deformation unit;
and the third deformation unit is configured to deform, by the Shape from Shading technique, the neutral three-dimensional face model deformed in the second deformation unit, to generate a reconstruction result of the current-frame three-dimensional face model.
Further, in an embodiment of the present invention, the face tracking result includes:
the reconstruction result of the current-frame three-dimensional face model and the expression coefficients of the facial blendshape model.
It should be noted that the foregoing explanation of the embodiment of the automatic generation method of the customized facial blendshape model also applies to the apparatus of this embodiment, and details are not repeated here.
According to the automatic generation apparatus of the customized facial blendshape model provided by the embodiment of the present invention, the three-dimensional face template model is non-rigidly registered using the depth map and facial feature points corresponding to each frame of the RGB-D image sequence, and is deformed according to the non-rigid registration result and Shape from Shading to generate a neutral three-dimensional face model; the neutral three-dimensional face model and the face blendshape template are processed by Deformation Transfer to generate a customized facial blendshape model; the neutral three-dimensional face model is then deformed sequentially by the customized facial blendshape model, the Warping Field and Shape from Shading to generate a face tracking result, with which the customized facial blendshape model is updated. The face in the color and depth sequence is tracked with high precision, and the high-precision tracking result is used directly to generate the customized facial blendshape model, so that a high-precision customized facial blendshape model is generated automatically and a vivid facial expression model can be produced in real time.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, different embodiments or examples, and the features of different embodiments or examples, described in this specification can be combined by those skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.