
CN110956691A - Three-dimensional face reconstruction method, device, equipment and storage medium - Google Patents

Three-dimensional face reconstruction method, device, equipment and storage medium Download PDF

Info

Publication number
CN110956691A
CN110956691A (application CN201911148553.0A / CN201911148553A); granted publication CN110956691B
Authority
CN
China
Prior art keywords
face
dimensional
target
picture
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911148553.0A
Other languages
Chinese (zh)
Other versions
CN110956691B (en)
Inventor
王多民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201911148553.0A priority Critical patent/CN110956691B/en
Publication of CN110956691A publication Critical patent/CN110956691A/en
Application granted granted Critical
Publication of CN110956691B publication Critical patent/CN110956691B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Graphics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Geometry (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The embodiments of the present application disclose a three-dimensional face reconstruction method, apparatus, device, and storage medium, wherein the method includes: when a face picture acquisition instruction is detected, acquiring a two-dimensional picture containing a face; recognizing a target face in the two-dimensional picture based on a preset face recognition strategy, and obtaining position information of the target face; cutting the target face out of the two-dimensional picture based on the position information of the target face to obtain a cut target face picture; inputting the cut target face picture into a target neural network model, which outputs the three-dimensional model parameters of the target face; and driving a target three-dimensional model to perform three-dimensional reconstruction based on those parameters to obtain the three-dimensional face model of the target face. In this way, a high-precision, high-quality face reconstruction result is obtained quickly, and operation is simple and convenient.

Description

Three-dimensional face reconstruction method, device, equipment and storage medium
Technical Field
The present application relates to image processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for reconstructing a three-dimensional face.
Background
Three-dimensional face reconstruction technology is widely used in many fields; the application best known to users is the three-dimensional facial expression package. In the prior art, most methods for generating a three-dimensional facial expression package do so directly from a picture or a video: for example, a video clip is used directly to generate an animated picture of a segment of interest, or keywords related to an expression are extracted from the user's message text and combined with an expression template to generate the expression. These methods involve no interaction with the user when generating the expression package and lack realism and interest. Other methods analyze facial expressions in two-dimensional images to drive the generation of animated expressions, but two-dimensional images cannot drive the animated expressions of three-dimensional models well, so the three-dimensional face reconstruction effect is poor.
Disclosure of Invention
In order to solve the foregoing technical problems, embodiments of the present application are intended to provide a method, an apparatus, a device, and a storage medium for reconstructing a three-dimensional face.
The technical scheme of the application is realized as follows:
in a first aspect, a three-dimensional face reconstruction method is provided, and the method includes:
when a face picture acquisition instruction is detected, acquiring a two-dimensional picture containing a face;
identifying a target face in the two-dimensional picture based on a preset face identification strategy, and acquiring position information of the target face;
cutting a target face in the two-dimensional picture based on the position information of the target face to obtain a cut target face picture;
inputting the cut target face picture into a target neural network model, and outputting three-dimensional model parameters of the target face;
and driving a target three-dimensional model to carry out three-dimensional reconstruction based on the three-dimensional model parameters of the target face to obtain the three-dimensional face model of the target face.
In a second aspect, a three-dimensional face reconstruction apparatus is provided, the apparatus comprising:
the acquisition unit is used for acquiring a two-dimensional picture containing a face when a face picture acquisition instruction is detected;
the detection unit is used for identifying a target face in the two-dimensional picture based on a preset face identification strategy and acquiring the position information of the target face;
the cutting unit is used for cutting the target face in the two-dimensional picture based on the position information of the target face to obtain a cut target face picture;
the reconstruction unit is used for inputting the cut target face picture into a target neural network model and outputting three-dimensional model parameters of the target face; and driving a target three-dimensional model to carry out three-dimensional reconstruction based on the three-dimensional model parameters of the target face to obtain the three-dimensional face model of the target face.
In a third aspect, a three-dimensional face reconstruction device is provided, including: a processor and a memory configured to store a computer program operable on the processor, wherein the processor is configured to perform the steps of the aforementioned method when executing the computer program.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the aforementioned method.
The embodiments of the present application disclose a three-dimensional face reconstruction method, apparatus, device, and storage medium, wherein the method includes: when a face picture acquisition instruction is detected, acquiring a two-dimensional picture containing a face; recognizing a target face in the two-dimensional picture based on a preset face recognition strategy, and obtaining position information of the target face; cutting the target face out of the two-dimensional picture based on the position information of the target face to obtain a cut target face picture; inputting the cut target face picture into a target neural network model, which outputs the three-dimensional model parameters of the target face; and driving a target three-dimensional model to perform three-dimensional reconstruction based on those parameters to obtain the three-dimensional face model of the target face. In this way, a high-precision, high-quality face reconstruction result is obtained quickly, and operation is simple and convenient.
Drawings
Fig. 1 is a schematic flow chart of a three-dimensional face reconstruction method in an embodiment of the present application;
FIG. 2 is a schematic flow chart of a model training method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a three-dimensional face reconstruction device in an embodiment of the present application;
fig. 4 is a schematic structural diagram of a three-dimensional face reconstruction device in an embodiment of the present application.
Detailed Description
So that the manner in which the features and advantages of the present embodiments can be understood in detail, a more particular description of the embodiments, briefly summarized above, is given by reference to the embodiments, some of which are illustrated in the appended drawings.
An embodiment of the present application provides a three-dimensional face reconstruction method, fig. 1 is a schematic flow chart of the three-dimensional face reconstruction method in the embodiment of the present application, and as shown in fig. 1, the method may specifically include:
step 101: when a face picture acquisition instruction is detected, acquiring a two-dimensional picture containing a face;
step 102: identifying a target face in the two-dimensional picture based on a preset face identification strategy, and acquiring position information of the target face;
step 103: cutting a target face in the two-dimensional picture based on the position information of the target face to obtain a cut target face picture;
step 104: inputting the cut target face picture into a target neural network model, and outputting three-dimensional model parameters of the target face;
step 105: and driving a target three-dimensional model to carry out three-dimensional reconstruction based on the three-dimensional model parameters of the target face to obtain the three-dimensional face model of the target face.
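The flow of steps 101 through 105 can be sketched as follows. This is an illustrative stand-in only: the detector, network, and model driver below are hypothetical placeholders returning fixed values, not the components of this application; only the data flow between the five steps is meant to be accurate.

```python
import numpy as np

def detect_target_face(picture):
    """Hypothetical stand-in for the preset face recognition strategy
    (step 102): returns (x, y, w, h) position information of the target face."""
    return (40, 30, 60, 60)

def crop_face(picture, box):
    """Step 103: cut the target face out of the two-dimensional picture."""
    x, y, w, h = box
    return picture[y:y + h, x:x + w]

def predict_3d_params(face_picture):
    """Hypothetical stand-in for the target neural network model (step 104):
    returns face pose {s, R, t}, identity, and expression parameters."""
    return {"pose": {"s": 1.0, "R": np.eye(3), "t": np.zeros(2)},
            "identity": np.zeros(50), "expression": np.zeros(15)}

def reconstruct(params, mean_face):
    """Step 105: drive a target three-dimensional model with the predicted
    parameters (trivially the mean face here, since all offsets are zero)."""
    return mean_face

picture = np.zeros((240, 320, 3), dtype=np.uint8)  # stand-in 2-D picture (step 101)
box = detect_target_face(picture)                  # step 102
face = crop_face(picture, box)                     # step 103
params = predict_3d_params(face)                   # step 104
mesh = reconstruct(params, np.zeros((1000, 3)))    # step 105
```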
The execution subject that builds the three-dimensional face model may be a mobile terminal or a fixed terminal. After acquiring the two-dimensional picture, the terminal obtains the three-dimensional model parameters of the face in the picture using the neural network model, and thereby drives the standard three-dimensional model to perform three-dimensional reconstruction, obtaining the three-dimensional face model.
Here, the face picture acquisition instruction may be an instruction to start building a three-dimensional face model, or a photographing instruction. The two-dimensional picture containing the face is acquired through a camera, which may be any camera capable of capturing two-dimensional pictures, such as a monocular camera, a color camera, or a black-and-white camera. The face picture may be black-and-white or color. For example, the face picture may be captured through the camera of a mobile phone, a standalone camera, or a wearable device.
In some embodiments, the recognizing a target face in the two-dimensional picture based on a preset face recognition policy and acquiring location information of the target face includes: identifying at least one face in the two-dimensional picture based on a preset face identification strategy, and acquiring position information of the at least one face; and screening the target face from the at least one face based on a preset screening strategy to obtain the position information of the target face.
Specifically, the screening strategy includes: determining the number of pixels occupied by each of the at least one face based on its position information; and selecting as the target face any face whose pixel count exceeds a threshold.
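A minimal sketch of this screening strategy, with an arbitrarily chosen illustrative threshold: each face's pixel count is derived from the width and height in its bounding-box position information, and only faces above the threshold are kept as targets.

```python
def screen_target_faces(face_boxes, pixel_threshold=64 * 64):
    """Keep faces whose pixel count (width * height of the bounding box)
    exceeds the threshold; each box is (x, y, width, height)."""
    targets = []
    for box in face_boxes:
        x, y, w, h = box
        if w * h > pixel_threshold:
            targets.append(box)
    return targets

boxes = [(10, 10, 120, 120),   # large face: kept
         (200, 50, 30, 30)]    # small face: filtered out
print(screen_target_faces(boxes))  # -> [(10, 10, 120, 120)]
```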
The cut target face picture is input into the target neural network model, in which a face keypoint detector is provided; the detector identifies N face keypoints in each picture, from which the three-dimensional model parameters of the face are obtained. The neural network model is a lightweight network, so a high-precision, high-quality face reconstruction result is obtained quickly even with limited computing resources.
That is to say, a face can be accurately recognized only when its area is larger than the minimum recognition area; otherwise the face information cannot be accurately identified and three-dimensional reconstruction cannot be performed.
Furthermore, the face picture is cut according to the recognized face position information: the background in the picture is discarded and only the face region is retained. Specifically, a face recognizer is provided in the three-dimensional face reconstruction apparatus to recognize the face position in the picture and cut the face out, obtaining the cut face picture. The cut-out shape may be square, rectangular, oval, and so on.
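The rectangular and oval cut-out shapes mentioned above can be sketched with plain array indexing; the oval variant simply masks out corners falling outside the inscribed ellipse. The fill value and box are illustrative assumptions.

```python
import numpy as np

def crop_rect(picture, box):
    """Rectangular crop: keep only the face region, discarding the background."""
    x, y, w, h = box
    return picture[y:y + h, x:x + w]

def crop_oval(picture, box, fill=0):
    """Oval crop: rectangular cut whose pixels outside the inscribed
    ellipse are replaced by a fill value."""
    face = crop_rect(picture, box).copy()
    h, w = face.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    outside = ((xx - cx) / (w / 2.0)) ** 2 + ((yy - cy) / (h / 2.0)) ** 2 > 1.0
    face[outside] = fill
    return face

img = np.full((100, 100, 3), 255, dtype=np.uint8)  # stand-in picture
rect = crop_rect(img, (20, 20, 40, 60))            # (x, y, w, h)
oval = crop_oval(img, (20, 20, 40, 60))
```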
Specifically, the preset three-dimensional expression template may be a general template stored in an expression library, such as a character template, an animal template, or an animation template, or it may be a template made by the user.
For example, the animation template is driven by using the face pose parameters and the face expression parameters, and the three-dimensional animation template and the three-dimensional data used for training the neural network have the same spatial topology and node semantic information, so that the three-dimensional animation can be driven to the same pose as the current head of the user by using the face pose parameters, and the three-dimensional animation can be driven to the same expression as the current face of the user by using the face expression parameters.
In some embodiments, the method further comprises: acquiring voice information of the target face acquired by a voice acquisition unit; acquiring audio characteristics corresponding to the target three-dimensional model; adjusting the voice information by using the audio features corresponding to the target three-dimensional model to obtain a target audio corresponding to the target face; and storing the three-dimensional face model of the target face and the corresponding target audio frequency.
That is to say, during three-dimensional face reconstruction the user can not only generate a three-dimensional model containing his or her facial information from the user's face and a preset three-dimensional model, but can also combine his or her voice characteristics with an audio template to generate audio carrying those characteristics. The desired effect is thus achieved both visually and aurally, and the user may also choose to change only one of voice and face.
For example, when the user selects the record button, the mobile terminal starts recording the expression display interface in real time while the phone's microphone captures the user's voice; when the user selects stop, the expression recording ends and the three-dimensional face with voice is saved to the expression library.
With this technical scheme, the neural network is trained on two-dimensional pictures representing different facial expressions together with a three-dimensional standard model that has expression capability, so three-dimensional face models with different expressions can be generated by fitting, increasing the realism of the reconstruction. Moreover, because a lightweight neural network is used, a high-precision, high-quality face reconstruction result is obtained quickly even with limited computing resources, and operation is simple and convenient.
On the basis of the foregoing embodiment, a model training method is further provided, fig. 2 is a schematic flow chart of the model training method in the embodiment of the present application, and as shown in fig. 2, the method includes:
step 201: acquiring a training sample set; the training sample set comprises at least one two-dimensional picture of the facial expression;
in practical applications, the method for obtaining the training sample set may include: controlling a camera to acquire at least one two-dimensional picture of the facial expression; and establishing a training sample set by using all the collected two-dimensional pictures. Here, the camera may be any camera capable of acquiring two-dimensional pictures, such as: monocular cameras, color cameras, black and white cameras, and the like. The two-dimensional picture may be a black and white picture or a color picture. The training sample set may be directly downloaded from an avatar library in the network.
In practical application, when a training sample set is established, as many facial expression samples as possible need to be collected, so that the trained neural network model can simulate more expressions.
Specifically, the types of facial expressions in the training sample set include at least one of the following: smile, pursed lips, frown, eyebrow raise, anger, chin left, chin right, chin forward, mouth left, mouth right, chin up, open mouth, puffed cheeks, closed eyes, and sadness.
In some embodiments, the face types in the training sample set include at least one of: race, age, sex, angle, face shape.
That is, when the training sample set is established, in addition to the facial expression, other factors affecting the three-dimensional reconstruction of the face, such as race, age, sex, weight, height, face shape, shooting angle, etc., should be considered.
Two-dimensional pictures of human faces of different ages, different sexes, different angles and different skin colors can be acquired and stored at different angles in different scenes through electronic equipment such as a mobile phone, a camera, wearable equipment and the like with cameras; establishing a training sample set by using two-dimensional pictures acquired by a plurality of electronic devices; and sending the training sample set to a three-dimensional face reconstruction device, so that the three-dimensional face reconstruction device trains the neural network model by using the training sample set.
Step 202: performing face key point detection on the two-dimensional pictures in the training sample set, and determining N two-dimensional key points of the face in the two-dimensional pictures;
here, in order to generate a three-dimensional face image matched with a real face in a two-dimensional picture, the face in the two-dimensional picture needs to be identified, and N pieces of key point information are detected, where the N pieces of key point information can represent face features, such as face expression, face pose, face identity, and the like.
In practice, more keypoints carry more face information, but they also demand more of the processor and raise cost. To balance cost and effect, the number of keypoints N in the embodiments of the present application is an integer greater than 68, such as 90, 106, or 240. Compared with the conventional 68 (or fewer) keypoints, this provides more face information and improves the accuracy of three-dimensional face reconstruction.
In some embodiments, the performing face keypoint detection on the two-dimensional pictures in the training sample set to determine N two-dimensional keypoints of a face in the two-dimensional pictures includes: carrying out face detection and face clipping on the two-dimensional picture to obtain a clipped two-dimensional picture; and performing face key point detection on the cut two-dimensional picture, and determining N two-dimensional key points of the face in the two-dimensional picture.
Specifically, a face recognizer and a face key point detector can be arranged, the face recognizer is used for recognizing one or more face positions in a two-dimensional picture, and the face is cut to obtain a picture only containing the face; and then, identifying the face key points of each picture by using a face key point detector.
Step 203: performing iterative fitting on the human face three-dimensional standard model through a preset optimization algorithm based on the corresponding relation between the N two-dimensional key points and N three-dimensional key points in the human face three-dimensional standard model to obtain standard three-dimensional model parameters of the two-dimensional picture;
Illustratively, a two-dimensional keypoint contains only x-axis and y-axis information, while a three-dimensional keypoint contains x-, y-, and z-axis information. Based on the index of a two-dimensional keypoint (x1, y1) in the two-dimensional picture, the point (x1, y1, z1) in the three-dimensional standard model with matching x- and y-axis information is obtained as the corresponding three-dimensional keypoint.
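The index-based correspondence described above can be sketched with a lookup table. The index values and vertex coordinates here are made-up illustrations: the table maps two-dimensional keypoint n to a vertex of the three-dimensional standard model with the same semantics.

```python
import numpy as np

# Hypothetical semantic index table: 2-D keypoint n in the picture
# corresponds to vertex kp3d_index[n] of the 3-D standard model.
kp3d_index = np.array([5, 2, 7])

# Toy stand-in for the 3-D standard model's vertex coordinates (x, y, z).
model_vertices = np.array([[0.0, 0.0, 0.0],
                           [0.1, 0.2, 0.3],
                           [0.4, 0.5, 0.6],
                           [0.7, 0.8, 0.9],
                           [1.0, 1.1, 1.2],
                           [1.3, 1.4, 1.5],
                           [1.6, 1.7, 1.8],
                           [1.9, 2.0, 2.1]])

# 3-D keypoints corresponding (by index) to the detected 2-D keypoints.
kp3d = model_vertices[kp3d_index]
```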
Illustratively, using the correspondence between the 106 face keypoints of each face picture and the 106 semantically corresponding keypoints in the face three-dimensional standard model, an optimization algorithm iterates until the face three-dimensional standard template is deformed into the shape of the face in the picture. The optimization algorithm takes the following formulas as its target:

S = S̄ + Σ_i α_i S_i + Σ_i β_i B_i

min_{s, R, t, α, β} Σ_n || q_n − p_n ||², with q_n = s · P · R · V_n + t,

until the algorithm converges. Here s is a scaling factor, R is a rotation angle parameter, and t is a translation parameter; together the three form the face pose parameters. V_n is the three-dimensional keypoint coordinate corresponding to two-dimensional keypoint n on the three-dimensional model during the iterative optimization, q_n is the two-dimensional keypoint coordinate after parallel projection (P is the parallel-projection matrix), p_n is the coordinate of two-dimensional keypoint n, S̄ is the average face, α_i is a face identity parameter, S_i is a face identity basis, β_i is a facial expression parameter, and B_i is a facial expression basis.
In some embodiments, the face three-dimensional standard model is a three-dimensional morphable model (3D Morphable Model, 3DMM). Training the neural network with a 3DMM, a three-dimensional standard model with expression capability, allows three-dimensional face models with different expressions to be generated by fitting, increasing the realism of three-dimensional face model reconstruction.
Through steps 201 to 203, the training pictures and the corresponding standard three-dimensional model parameters required for training the neural network model are generated. Here the number of pictures in the training sample set is on the order of millions, and the standard three-dimensional model parameters serve as the ground truth for training.
Step 204: and taking the two-dimensional pictures in the training sample set as input, taking the standard three-dimensional model parameters of the two-dimensional pictures in the training sample set as target output, and training a neural network model to obtain the target neural network model.
Specifically, the three-dimensional model parameters include: face pose parameters, face identity parameters, and facial expression parameters. Accordingly, training the neural network model includes: taking the two-dimensional pictures in the training sample set as input to the neural network model and outputting predicted three-dimensional model parameters; then computing a loss function and adjusting the model to obtain the trained neural network. Here, when computing the model's loss, the three groups of parameters are evaluated separately: while one group's loss is computed from the prediction, the other two groups use their true values (i.e., the standard three-dimensional model parameters). In this way the model converges better.
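A sketch of this group-wise loss computation under assumed toy shapes (the reconstruction function below is a made-up stand-in, with pose reduced to a single scale): the point illustrated is that when one parameter group's loss is computed, the other two groups are substituted with their ground-truth values.

```python
import numpy as np

def reconstruct_keypoints(pose, identity, expression, S_bar, S_id, B_exp):
    """Toy driver: mean face plus identity/expression offsets, scaled by pose."""
    shape = S_bar + S_id * identity + B_exp * expression
    return pose * shape

def groupwise_loss(pred, truth, S_bar, S_id, B_exp):
    """One loss per parameter group; while a group's loss is computed,
    the other two groups use ground truth, which helps convergence."""
    gt_kp = reconstruct_keypoints(truth["pose"], truth["id"], truth["exp"],
                                  S_bar, S_id, B_exp)
    losses = {}
    for group in ("pose", "id", "exp"):
        mixed = dict(truth)            # start from all-ground-truth parameters
        mixed[group] = pred[group]     # swap in only the predicted group
        kp = reconstruct_keypoints(mixed["pose"], mixed["id"], mixed["exp"],
                                   S_bar, S_id, B_exp)
        losses[group] = float(np.mean((kp - gt_kp) ** 2))
    return losses

S_bar = np.ones((5, 3)); S_id = np.ones((5, 3)); B_exp = np.ones((5, 3))
truth = {"pose": 1.0, "id": 0.5, "exp": 0.2}
pred = {"pose": 1.0, "id": 0.4, "exp": 0.2}   # only identity is off
losses = groupwise_loss(pred, truth, S_bar, S_id, B_exp)
```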
In practical application, before the two-dimensional picture is input into the neural network model, the two-dimensional picture can be subjected to face recognition and face cutting, and the cut two-dimensional picture is input into the neural network model.
Here, the trained neural network model can be deployed on any terminal: the terminal captures a two-dimensional picture of the user's face, inputs it into the trained model, and directly outputs the user's three-dimensional reconstruction model, whose expression changes as the user's expression changes.
Based on the three-dimensional face reconstruction method, a specific implementation scenario is given in the embodiment of the application as follows.
The acquisition process of the human face standard three-dimensional model parameters is as follows:
step 1: collecting a two-dimensional picture of a human face;
specifically, the acquisition requirements are as follows: the ages are widely distributed and average, and are covered from 5 years to 80 years; the sex ratio is balanced, and the male and female ratio is kept about 1; the ethnicity is distributed uniformly, east Asian, middle Asian, Caucasian, black people and other people are distributed uniformly, and other people have partial pictures; people with various facial shapes can cover when the face picture is collected. For each person, 73 face poses and 15 expressions are required to be collected.
The face poses include:
    • frontal face (1 pose);
    • roll: left 30, 60, 90 degrees and right 30, 60, 90 degrees (6 poses);
    • pitch: head up 30 and 60 degrees, head down 30 and 60 degrees (4 poses);
    • yaw: left 30 and 60 degrees, right 30 and 60 degrees (4 poses);
    • roll + pitch: 3 left-roll cases x 2 head-up cases, 3 right-roll x 2 head-up, 3 left-roll x 2 head-down, and 3 right-roll x 2 head-down (24 poses);
    • roll + yaw: 3 left-roll x 2 left-yaw, 3 right-roll x 2 right-yaw, 3 left-roll x right 30-degree yaw, and 3 right-roll x left 30-degree yaw (18 poses);
    • yaw + pitch: 2 left-yaw x 2 head-up, 2 right-yaw x 2 head-up, 2 left-yaw x 2 head-down, and 2 right-yaw x 2 head-down (16 poses);
for a total of 73 poses.
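The enumeration above can be checked with a quick tally, with the counts taken directly from the list:

```python
frontal = 1
roll = 6                          # left/right x 30/60/90 degrees
pitch = 4                         # up/down x 30/60 degrees
yaw = 4                           # left/right x 30/60 degrees
roll_pitch = 4 * (3 * 2)          # {left, right roll} x {up, down pitch}
roll_yaw = 2 * (3 * 2) + 2 * 3    # same-side combos plus 30-degree cross combos
yaw_pitch = 4 * (2 * 2)           # {left, right yaw} x {up, down pitch}
total = frontal + roll + pitch + yaw + roll_pitch + roll_yaw + yaw_pitch
print(total)  # -> 73
```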
The facial expressions include: smile, pursed lips, frown, eyebrow raise, anger, chin left, chin right, chin forward, mouth left, mouth right, chin up, open mouth, puffed cheeks, closed eyes, and sadness, 15 expressions in total.
Step 2: face recognition and clipping;
specifically, a face recognizer is arranged, one or more face positions in a two-dimensional picture are recognized by the face recognizer, and the face is cut to obtain the picture only containing the face.
And step 3: detecting two-dimensional key points of the human face;
specifically, a face key point detector is arranged, and 106 personal face key points in each picture are identified by the face key point detector.
And 4, step 4: and (5) performing iterative optimization to obtain standard three-dimensional model parameters.
Specifically, using the correspondence between the 106 face key points of each face picture and the 106 semantically corresponding key points on the three-dimensional standard face model, an optimization algorithm iterates continuously until the three-dimensional standard face template is deformed into the shape of the face in the picture. The optimization algorithm minimizes the following objective:

$$E(s, R, t, \alpha, \beta) = \sum_{n=1}^{106} \left\| x_n - P\!\left( s\, R\, V_n + t \right) \right\|_2^2$$

$$V = \bar{V} + \sum_i \alpha_i S_i + \sum_i \beta_i B_i$$

until the algorithm converges. Here, $s$ is a scaling factor, $R$ is a rotation parameter and $t$ is a translation parameter; together these three form the face pose parameters. $V_n$ is the three-dimensional coordinate, on the three-dimensional model, of the key point corresponding to two-dimensional key point $n$ during the iterative optimization; $P(\cdot)$ denotes parallel projection, so $P(s\,R\,V_n + t)$ is the two-dimensional key point coordinate after parallel projection; $x_n$ is the coordinate of two-dimensional key point $n$; $\bar{V}$ is the average face; $\alpha_i$ are the face identity parameters, $S_i$ the face identity bases, $\beta_i$ the facial expression parameters and $B_i$ the facial expression bases.
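Concretely, the fitting objective can be written as a reprojection-error function (a numpy sketch assuming parallel/orthographic projection onto the x-y plane; the outer optimizer over {s, R, t, α, β}, e.g. Gauss-Newton or L-BFGS, is omitted):

```python
import numpy as np

def parallel_project(points3d: np.ndarray) -> np.ndarray:
    """Parallel (orthographic) projection: keep x and y, drop z."""
    return points3d[:, :2]

def reprojection_error(s, R, t, V, x2d):
    """Sum of squared distances between projected model key points and 2D key points.

    s: scalar scale, R: 3x3 rotation, t: length-2 translation in the image plane,
    V: (N, 3) current 3D key points, x2d: (N, 2) detected 2D key points.
    """
    projected = s * parallel_project(V @ R.T) + t
    return float(np.sum((projected - x2d) ** 2))
```

An optimizer would repeatedly update the pose parameters and the identity/expression coefficients (which regenerate V from the bases) until this error stops decreasing.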
The acquisition process of the neural network model is as follows:
Step 1: a network model is built with TensorFlow, the training data in the training sample set are converted into TFRecords, and a network training pipeline is constructed so that input -> neural network model -> output -> loss function forms a complete chain.
Step 2: the model is trained for 50 epochs, where one epoch runs through the data in the dataset once from beginning to end.
Step 3: the output of the model is the face pose parameters {s, R, t}, the face identity parameters and the facial expression parameters.
Step 4: when the model loss is computed, the three groups of parameters are evaluated separately; while the loss for one group is computed, the other two groups take their ground-truth values (namely the standard three-dimensional model parameters), which helps the model converge better.
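The grouped loss in step 4 can be sketched as follows (a toy example: the reconstruction function and parameter layout are assumptions for illustration, with a scalar scale standing in for the full pose {s, R, t}):

```python
import numpy as np

def reconstruct(scale, alpha, beta, mean, S, B):
    """Toy reconstruction: scaled linear blend of identity and expression bases.
    mean: (M,), S: (M, I) identity bases, B: (M, E) expression bases."""
    return scale * (mean + S @ alpha + B @ beta)

def grouped_losses(pred, gt, mean, S, B):
    """One loss per parameter group: the group under evaluation uses the
    predicted value, while the other two groups use ground truth."""
    names = ("pose", "identity", "expression")
    target = reconstruct(gt["pose"], gt["identity"], gt["expression"], mean, S, B)
    losses = {}
    for name in names:
        mixed = {n: (pred[n] if n == name else gt[n]) for n in names}
        recon = reconstruct(mixed["pose"], mixed["identity"], mixed["expression"],
                            mean, S, B)
        losses[name] = float(np.mean((recon - target) ** 2))
    return losses
```

With this scheme, an error in one predicted group is penalized in isolation, so the gradient for that group is not confounded by errors in the other two.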
After training of the deep neural network model is finished, a cropped face picture can be fed directly as input to generate the corresponding three-dimensional face model. It should be noted that the three-dimensional face model can be generated from a single face picture, so the method is convenient to use and requires no complex user operations; moreover, the deep neural network is a lightweight, fast model that can run in real time on a mobile phone terminal.
The process of utilizing the trained neural network model to carry out three-dimensional face reconstruction is as follows:
Step 1: an expression-making interface is provided in the system input method. When the user opens the expression interface, a "+" button is shown at its lower right corner, indicating that an expression can be added by tapping the button. When the user taps the button to add an expression, the pop-up interface offers an option to make a 3D animated expression.
The trained neural network model can be used for making a 3D animation expression template for the user, and the made template is added into an expression library of a system input method, so that the user can use the self-made 3D expression in the chatting process.
Step 2: the user selects to make a 3D animated expression; a 3D animated expression making interface is displayed, and the front monocular camera of the mobile phone is started.
Step 3: the user selects a three-dimensional animated expression template from the cartoon standard model library (templates can also be downloaded from an application store); alternatively, the user can use the deep neural network to turn a self-portrait into a three-dimensional model, select various stickers (such as Pikachu) to paste onto the generated model, construct a three-dimensional animation model that uses the user's face shape and the selected stickers as texture, and store it in the expression library.
Step 4: after the user selects the three-dimensional animated expression template, the interface displays the selected three-dimensional animated expression in real time, driven by the user's facial expression and head movement.
Step 5: the process of driving the three-dimensional animated expression is as follows: the user may freely make movements and expressions or speak; the front camera captures the user's face image in real time, a face detector locates the face region, the face is cropped, and the cropped face image is fed into the deep neural network for parameter generation, which outputs the face pose parameters, face identity parameters and facial expression parameters. Here, the animated expression is driven by the face pose parameters and the facial expression parameters. Because the three-dimensional animation template and the three-dimensional data used to train the neural network share the same spatial topology and node semantics, the pose parameters can drive the three-dimensional animation to the same pose as the user's current head, and the facial expression parameters can drive it to the same expression as the user's current face.
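The driving step can be sketched with plain blendshape arithmetic (assumed array shapes; this works because, as stated above, the animation template shares the topology and node semantics of the training models):

```python
import numpy as np

def drive_avatar(template, expr_bases, expr_params, s, R, t):
    """Apply expression blendshapes, then the rigid head pose.

    template:    (N, 3) neutral avatar vertices
    expr_bases:  (E, N, 3) expression blendshape offsets
    expr_params: (E,) expression coefficients output by the network
    s, R, t:     scale (scalar), rotation (3x3), translation (3,)
    """
    # blend expression offsets into the neutral template
    deformed = template + np.tensordot(expr_params, expr_bases, axes=1)
    # pose the deformed avatar like the user's head
    return s * (deformed @ R.T) + t
```

Zero expression coefficients and an identity pose leave the template unchanged; nonzero coefficients morph it toward the user's current expression.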
Step 6: when the user selects the recording button, the mobile phone terminal starts to record the expression display interface in real time, and meanwhile, a microphone of the mobile phone is called to store the voice of the user.
Step 7: when the user selects stop, recording of the expression ends, and the expression with sound is stored in the built-in expression library of the system input method.
Step 8: the stored three-dimensional animated expression can be selected through the system input method and sent to a contact's terminal in the chat software.
An embodiment of the present application further provides a three-dimensional face reconstruction device, and as shown in fig. 3, the device includes:
an acquiring unit 301, configured to acquire a two-dimensional picture including a human face when a human face acquiring instruction is detected;
a detection unit 302, configured to identify a target face in the two-dimensional picture based on a preset face identification policy, and acquire position information of the target face;
a clipping unit 303, configured to clip a target face in the two-dimensional picture based on the position information of the target face, so as to obtain a clipped target face picture;
a reconstruction unit 304, configured to input the clipped target face image into a target neural network model, and output three-dimensional model parameters of the target face; and driving a target three-dimensional model to carry out three-dimensional reconstruction based on the three-dimensional model parameters of the target face to obtain the three-dimensional face model of the target face.
In some embodiments, the detection unit 302 is specifically configured to identify at least one face in the two-dimensional picture based on a preset face recognition policy, and obtain position information of the at least one face; and screening the target face from the at least one face based on a preset screening strategy to obtain the position information of the target face.
In some embodiments, the screening strategy comprises: determining the number of pixels occupied by the at least one face based on the position information of the at least one face; and screening, as the target face, a face whose number of occupied pixels is greater than a number threshold.
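A minimal sketch of this screening strategy (boxes are assumed to be (x, y, w, h) tuples from the detector, and the threshold value is illustrative):

```python
def screen_target_faces(boxes, pixel_threshold=10000):
    """Keep faces whose bounding box covers more pixels than the threshold."""
    return [(x, y, w, h) for (x, y, w, h) in boxes if w * h > pixel_threshold]
```

In practice this filters out small background faces so that only the prominent face(s) in the frame are reconstructed.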
In some embodiments, the apparatus further comprises: the voice acquisition unit is used for acquiring the voice information of the target face acquired by the voice acquisition unit;
the voice processing unit is also used for acquiring audio characteristics corresponding to the target three-dimensional model; adjusting the voice information by using the audio features corresponding to the target three-dimensional model to obtain a target audio corresponding to the target face;
and the storage unit is used for storing the three-dimensional face model of the target face and the corresponding target audio.
In some embodiments, the obtaining unit is further configured to obtain a training sample set; the training sample set comprises at least one two-dimensional picture of the facial expression;
the device also includes: the training unit is also used for carrying out face key point detection on the two-dimensional pictures in the training sample set and determining N two-dimensional key points of the face in the two-dimensional pictures; performing iterative fitting on the human face three-dimensional standard model through a preset optimization algorithm based on the corresponding relation between the N two-dimensional key points and N three-dimensional key points in the human face three-dimensional standard model to obtain standard three-dimensional model parameters of the two-dimensional picture; and taking the two-dimensional pictures in the training sample set as input, taking the standard three-dimensional model parameters of the two-dimensional pictures in the training sample set as target output, and training a neural network model to obtain the target neural network model.
In some embodiments, the category of facial expressions in the training sample set comprises at least one of: smile, sipping mouth, frown, eyebrow raise, anger, chin left, chin right, chin forward, mouth left, mouth right, chin up, open mouth, drum cheek, close eyes, and sadness.
In some embodiments, the face types in the training sample set include at least one of: race, age, sex, angle, face shape.
An embodiment of the present application further provides a three-dimensional face reconstruction device, as shown in fig. 4, the device includes: a processor 401 and a memory 402 configured to store a computer program operable on the processor; the processor 401, when running the computer program in the memory 402, realizes the following steps:
when a face picture acquisition instruction is detected, acquiring a two-dimensional picture containing a face;
identifying a target face in the two-dimensional picture based on a preset face identification strategy, and acquiring position information of the target face;
cutting a target face in the two-dimensional picture based on the position information of the target face to obtain a cut target face picture;
inputting the cut target face picture into a target neural network model, and outputting three-dimensional model parameters of the target face;
and driving a target three-dimensional model to carry out three-dimensional reconstruction based on the three-dimensional model parameters of the target face to obtain the three-dimensional face model of the target face.
In some embodiments, the processor 401, when executing the computer program in the memory 402, implements the following steps: identifying at least one face in the two-dimensional picture based on a preset face identification strategy, and acquiring position information of the at least one face; and screening the target face from the at least one face based on a preset screening strategy to obtain the position information of the target face.
In some embodiments, the screening strategy comprises: determining the number of pixels occupied by the at least one face based on the position information of the at least one face; and screening the human face of which the number of the occupied pixels is greater than the number threshold value as a target human face.
In some embodiments, the processor 401, when executing the computer program in the memory 402, further realizes the following steps: acquiring voice information of the target face acquired by a voice acquisition unit; acquiring audio characteristics corresponding to the target three-dimensional model; adjusting the voice information by using the audio features corresponding to the target three-dimensional model to obtain a target audio corresponding to the target face; and storing the three-dimensional face model of the target face and the corresponding target audio frequency.
In some embodiments, the processor 401, when executing the computer program in the memory 402, further realizes the following steps: acquiring a training sample set; the training sample set comprises at least one two-dimensional picture of the facial expression; performing face key point detection on the two-dimensional pictures in the training sample set, and determining N two-dimensional key points of the face in the two-dimensional pictures; performing iterative fitting on the human face three-dimensional standard model through a preset optimization algorithm based on the corresponding relation between the N two-dimensional key points and N three-dimensional key points in the human face three-dimensional standard model to obtain standard three-dimensional model parameters of the two-dimensional picture; and taking the two-dimensional pictures in the training sample set as input, taking the standard three-dimensional model parameters of the two-dimensional pictures in the training sample set as target output, and training a neural network model to obtain the target neural network model.
In some embodiments, the category of facial expressions in the training sample set comprises at least one of: smile, sipping mouth, frown, eyebrow raise, anger, chin left, chin right, chin forward, mouth left, mouth right, chin up, open mouth, drum cheek, close eyes, and sadness.
In some embodiments, the face types in the training sample set include at least one of: race, age, sex, angle, face shape.
Of course, in actual practice, the various components of the device are coupled together by a bus system 403, as shown in FIG. 4. It will be appreciated that the bus system 403 is used to enable communications among the components connected. The bus system 403 includes a power bus, a control bus, and a status signal bus in addition to a data bus. But for clarity of illustration the various buses are labeled as bus system 403 in figure 4.
In practical applications, the processor may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, and a microprocessor. It is understood that the electronic devices for implementing the above processor functions may be other devices, and the embodiments of the present application are not limited in particular.
The Memory may be a volatile Memory (volatile Memory), such as a Random-Access Memory (RAM); or a non-volatile Memory (non-volatile Memory), such as a Read-Only Memory (ROM), a flash Memory (flash Memory), a Hard Disk (HDD), or a Solid-State Drive (SSD); or a combination of the above types of memories and provides instructions and data to the processor.
The embodiment of the application also provides a computer readable storage medium for storing the computer program.
Optionally, the computer-readable storage medium may be applied to any three-dimensional face reconstruction device in the embodiments of the present application, and the computer program enables a computer to execute corresponding processes implemented by a processor in the methods in the embodiments of the present application, which are not described herein again for brevity.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may be separately used as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit. Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method embodiments.
Features disclosed in several of the product embodiments provided in the present application may be combined in any combination to yield new product embodiments without conflict.
The features disclosed in the several method or apparatus embodiments provided in the present application may be combined arbitrarily, without conflict, to arrive at new method embodiments or apparatus embodiments.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A method for reconstructing a three-dimensional face, the method comprising:
when a face picture acquisition instruction is detected, acquiring a two-dimensional picture containing a face;
identifying a target face in the two-dimensional picture based on a preset face identification strategy, and acquiring position information of the target face;
cutting a target face in the two-dimensional picture based on the position information of the target face to obtain a cut target face picture;
inputting the cut target face picture into a target neural network model, and outputting three-dimensional model parameters of the target face;
and driving a target three-dimensional model to carry out three-dimensional reconstruction based on the three-dimensional model parameters of the target face to obtain the three-dimensional face model of the target face.
2. The method according to claim 1, wherein the recognizing a target face in the two-dimensional picture based on a preset face recognition strategy to obtain the position information of the target face comprises:
identifying at least one face in the two-dimensional picture based on a preset face identification strategy, and acquiring position information of the at least one face;
and screening the target face from the at least one face based on a preset screening strategy to obtain the position information of the target face.
3. The method of claim 2, wherein the screening strategy comprises:
determining the number of pixels occupied by the at least one face based on the position information of the at least one face;
and screening the human face of which the number of the occupied pixels is greater than the number threshold value as a target human face.
4. The method of claim 1, further comprising:
acquiring voice information of the target face acquired by a voice acquisition unit;
acquiring audio characteristics corresponding to the target three-dimensional model;
adjusting the voice information by using the audio features corresponding to the target three-dimensional model to obtain a target audio corresponding to the target face;
and storing the three-dimensional face model of the target face and the corresponding target audio frequency.
5. The method of claim 1, further comprising:
acquiring a training sample set; the training sample set comprises at least one two-dimensional picture of the facial expression;
performing face key point detection on the two-dimensional pictures in the training sample set, and determining N two-dimensional key points of the face in the two-dimensional pictures;
performing iterative fitting on the human face three-dimensional standard model through a preset optimization algorithm based on the corresponding relation between the N two-dimensional key points and N three-dimensional key points in the human face three-dimensional standard model to obtain standard three-dimensional model parameters of the two-dimensional picture;
and taking the two-dimensional pictures in the training sample set as input, taking the standard three-dimensional model parameters of the two-dimensional pictures in the training sample set as target output, and training a neural network model to obtain the target neural network model.
6. The method of claim 5, wherein the category of facial expressions in the training sample set comprises at least one of: smile, sipping mouth, frown, eyebrow raise, anger, chin left, chin right, chin forward, mouth left, mouth right, chin up, open mouth, drum cheek, close eyes, and sadness.
7. The method of claim 5, wherein the set of training samples includes at least one of the following face types: race, age, sex, angle, face shape.
8. A three-dimensional face reconstruction apparatus, the apparatus comprising:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a two-dimensional picture containing a human face when a human face acquisition instruction is detected;
the detection unit is used for identifying a target face in the two-dimensional picture based on a preset face identification strategy and acquiring the position information of the target face;
the cutting unit is used for cutting the target face in the two-dimensional picture based on the position information of the target face to obtain a cut target face picture;
the reconstruction unit is used for inputting the cut target face picture into a target neural network model and outputting three-dimensional model parameters of the target face; and driving a target three-dimensional model to carry out three-dimensional reconstruction based on the three-dimensional model parameters of the target face to obtain the three-dimensional face model of the target face.
9. A three-dimensional face reconstruction device, the device comprising: a processor and a memory configured to store a computer program capable of running on the processor,
wherein the processor is configured to perform the steps of the method of any one of claims 1 to 7 when running the computer program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN201911148553.0A 2019-11-21 2019-11-21 Three-dimensional face reconstruction method, device, equipment and storage medium Active CN110956691B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911148553.0A CN110956691B (en) 2019-11-21 2019-11-21 Three-dimensional face reconstruction method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN110956691A true CN110956691A (en) 2020-04-03
CN110956691B CN110956691B (en) 2023-06-06

Family

ID=69977961

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911148553.0A Active CN110956691B (en) 2019-11-21 2019-11-21 Three-dimensional face reconstruction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110956691B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111696179A (en) * 2020-05-06 2020-09-22 广东康云科技有限公司 Method and device for generating cartoon three-dimensional model and virtual simulator and storage medium
CN112509144A (en) * 2020-12-09 2021-03-16 深圳云天励飞技术股份有限公司 Face image processing method and device, electronic equipment and storage medium
CN112581591A (en) * 2021-01-29 2021-03-30 秒影工场(北京)科技有限公司 Adjustable human face picture generation method based on GAN and three-dimensional model parameters
CN113538221A (en) * 2021-07-21 2021-10-22 Oppo广东移动通信有限公司 Three-dimensional face processing method, training method, generating method, device and equipment
CN113689538A (en) * 2020-05-18 2021-11-23 北京达佳互联信息技术有限公司 Video generation method and device, electronic equipment and storage medium
CN113781613A (en) * 2021-09-15 2021-12-10 广州虎牙科技有限公司 Expression driving method and system and computer equipment
CN114549501A (en) * 2022-02-28 2022-05-27 佛山虎牙虎信科技有限公司 Face occlusion recognition method, three-dimensional face processing method, device, equipment and medium
WO2022143645A1 (en) * 2020-12-28 2022-07-07 百果园技术(新加坡)有限公司 Three-dimensional face reconstruction method and apparatus, device, and storage medium
WO2022156532A1 (en) * 2021-01-21 2022-07-28 魔珐(上海)信息科技有限公司 Three-dimensional face model reconstruction method and apparatus, electronic device, and storage medium
CN115187705A (en) * 2022-09-13 2022-10-14 之江实验室 Voice-driven face key point sequence generation method and device

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102999942A (en) * 2012-12-13 2013-03-27 清华大学 Three-dimensional face reconstruction method
CN105390133A (en) * 2015-10-09 2016-03-09 西北师范大学 Tibetan TTVS system realization method
CN105551071A (en) * 2015-12-02 2016-05-04 中国科学院计算技术研究所 Method and system of face animation generation driven by text voice
CN105844276A (en) * 2015-01-15 2016-08-10 北京三星通信技术研究有限公司 Face posture correction method and face posture correction device
CN106503671A (en) * 2016-11-03 2017-03-15 厦门中控生物识别信息技术有限公司 The method and apparatus for determining human face posture
CN108805977A (en) * 2018-06-06 2018-11-13 浙江大学 A kind of face three-dimensional rebuilding method based on end-to-end convolutional neural networks
CN109960986A (en) * 2017-12-25 2019-07-02 北京市商汤科技开发有限公司 Human face posture analysis method, device, equipment, storage medium and program
CN110163953A (en) * 2019-03-11 2019-08-23 腾讯科技(深圳)有限公司 Three-dimensional facial reconstruction method, device, storage medium and electronic device
CN110188590A (en) * 2019-04-09 2019-08-30 浙江工业大学 A kind of shape of face resolving method based on three-dimensional face model
CN110414370A (en) * 2019-07-05 2019-11-05 深圳云天励飞技术有限公司 The recognition methods of face shape of face, device, electronic equipment and storage medium


Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111696179A (en) * 2020-05-06 2020-09-22 广东康云科技有限公司 Method and device for generating cartoon three-dimensional model and virtual simulator and storage medium
CN113689538A (en) * 2020-05-18 2021-11-23 北京达佳互联信息技术有限公司 Video generation method and device, electronic equipment and storage medium
CN113689538B (en) * 2020-05-18 2024-05-21 北京达佳互联信息技术有限公司 Video generation method and device, electronic equipment and storage medium
CN112509144A (en) * 2020-12-09 2021-03-16 深圳云天励飞技术股份有限公司 Face image processing method and device, electronic equipment and storage medium
CN112509144B (en) * 2020-12-09 2024-08-27 深圳云天励飞技术股份有限公司 Face image processing method and device, electronic equipment and storage medium
WO2022143645A1 (en) * 2020-12-28 2022-07-07 百果园技术(新加坡)有限公司 Three-dimensional face reconstruction method and apparatus, device, and storage medium
WO2022156532A1 (en) * 2021-01-21 2022-07-28 魔珐(上海)信息科技有限公司 Three-dimensional face model reconstruction method and apparatus, electronic device, and storage medium
CN112581591A (en) * 2021-01-29 2021-03-30 秒影工场(北京)科技有限公司 Adjustable human face picture generation method based on GAN and three-dimensional model parameters
CN113538221A (en) * 2021-07-21 2021-10-22 Oppo广东移动通信有限公司 Three-dimensional face processing method, training method, generating method, device and equipment
CN113781613A (en) * 2021-09-15 2021-12-10 广州虎牙科技有限公司 Expression driving method and system and computer equipment
CN114549501A (en) * 2022-02-28 2022-05-27 佛山虎牙虎信科技有限公司 Face occlusion recognition method, three-dimensional face processing method, device, equipment and medium
CN115187705A (en) * 2022-09-13 2022-10-14 之江实验室 Voice-driven face key point sequence generation method and device

Also Published As

Publication number Publication date
CN110956691B (en) 2023-06-06

Similar Documents

Publication Publication Date Title
CN110956691A (en) Three-dimensional face reconstruction method, device, equipment and storage medium
CN107886032B (en) Terminal device, smart phone, authentication method and system based on face recognition
CN111464834B (en) Video frame processing method and device, computing equipment and storage medium
CN111742351A (en) Electronic device for generating image including 3D avatar reflecting face motion through 3D avatar corresponding to face and method of operating the same
CN110363133B (en) Method, device, equipment and storage medium for sight line detection and video processing
KR20210119438A (en) Systems and methods for face reproduction
CN111028330A (en) Three-dimensional expression base generation method, device, equipment and storage medium
CN106161939B (en) Photo shooting method and terminal
JP7387202B2 (en) 3D face model generation method, apparatus, computer device and computer program
KR102491140B1 (en) Method and apparatus for generating virtual avatar
WO2022252866A1 (en) Interaction processing method and apparatus, terminal and medium
CN113362263B (en) Method, apparatus, medium and program product for transforming an image of a virtual idol
CN108875539B (en) Expression matching method, device and system and storage medium
CN111108508B (en) Face emotion recognition method, intelligent device and computer readable storage medium
CN113570689B (en) Portrait cartoon method, device, medium and computing equipment
CN112308977B (en) Video processing method, video processing device, and storage medium
CN117252791A (en) Image processing method, device, electronic equipment and storage medium
CN109218615A (en) Image taking householder method, device, terminal and storage medium
CN107544660B (en) Information processing method and electronic equipment
WO2024221856A1 (en) Image generation method, apparatus and device, and storage medium
CN115690281B (en) Role expression driving method and device, storage medium and electronic device
CN111814652A (en) Virtual portrait rendering method, device and storage medium
CN113176827B (en) AR interaction method and system based on expressions, electronic device and storage medium
Wang et al. Real-time burst photo selection using a light-head adversarial network
CN112132107A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant