CN116630147B - Face image editing method based on reinforcement learning
- Publication number: CN116630147B (application number CN202310908009.1A)
- Authority
- CN
- China
- Prior art keywords
- image
- face image
- reinforcement learning
- attribute
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/04—Context-preserving transformations, e.g. by using an importance map
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4046—Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/766—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/98—Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
- G06V10/993—Evaluation of the quality of the acquired pattern
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
Abstract
A face image editing method based on reinforcement learning includes the steps of: acquiring a face image to be edited and extracting a first facial attribute; mapping the face image to be edited through an encoding module to obtain an image hidden variable; acquiring a pre-trained generator; inputting the image hidden variable into the generator to generate a first face image; inputting the first face image into a trained image evaluation model to obtain an evaluation result; inputting the first facial attribute and the evaluation result into a trained reinforcement learning module to generate a second facial attribute; inputting the image hidden variable and the second facial attribute into a continuous normalizing flow module to generate a hidden variable of a target face image; and inputting the hidden variable of the target face image into the generator to generate a second face image. The invention realizes automatic adjustment of facial attributes through reinforcement learning and improves the aesthetic quality of face images.
Description
Technical Field
The invention relates to the technical field of image editing, in particular to a face image editing method based on reinforcement learning.
Background
The pursuit of beauty is part of human nature and an objective need: it satisfies people's emotional needs and brings pleasure. Images are important carriers for conveying information and expressing emotion; the aesthetic appeal of different images varies greatly, and image quality shapes the viewer's experience. Artificial intelligence has developed rapidly in perceiving and evaluating beauty, but there is still ample room for progress in creating beauty.
With the popularity of social applications, people want to upload more attractive personal images to increase their social appeal, and more image-editing applications are being put into practical production and research. The shift from posting unedited photos to retouching and beautifying images before uploading shows that aesthetic expectations keep rising. Existing image beautification software intelligently edits real face images according to templated standards and provides personalized beautification guidance; it is widely needed in daily life, has enormous potential in professional fields such as medical cosmetology, print-advertisement design, and image post-processing, and has a bright outlook.
Therefore, how to select facial attributes according to different aesthetic requirements and edit face images of high aesthetic quality is a problem to be solved by those skilled in the art.
Disclosure of Invention
In view of this, the invention adjusts the underlying facial semantic attributes of the StyleGAN generator by a reinforcement learning method, and then edits the image to obtain a beautified face image that better conforms to human aesthetics.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the face image editing method based on reinforcement learning is characterized by comprising the following steps:
acquiring a face image to be edited and extracting a first face attribute;
mapping the face image to be edited through an encoding module to obtain an image hidden variable;
acquiring a pre-trained generator;
inputting the image hidden variable into the generator to generate a first face image;
selecting an attribute to be edited from the first facial attribute, and inputting a first face image into a trained image evaluation model to obtain an evaluation result;
inputting the attribute to be edited and the evaluation result to a trained reinforcement learning module to generate a second facial attribute;
inputting the image hidden variable and the second facial attribute into a continuous normalizing flow module to generate a hidden variable of a target face image;
and inputting the hidden variable of the target face image into the generator to generate a second face image.
Further, the step of inputting the first face image into a trained image evaluation model to obtain an evaluation result includes:
preprocessing the first face image, inputting the preprocessed first face image into a backbone network for feature extraction to obtain a feature vector;
inputting the feature vector to a channel attention module to obtain a three-dimensional vector, and expanding the three-dimensional vector into a one-dimensional vector after activation and self-adaptive average pooling;
and inputting the one-dimensional vector into a regression network, and outputting an evaluation result.
Further, the training step of the image evaluation model includes:
training a classification network: inputting training data into a backbone network to extract features, and classifying the features through the classification network; during classification training, the loss value is back-propagated through a cross-entropy function, while no gradient is propagated back to the parameters of the regression network;
performing regression training on the data on top of the classification network to extract more aesthetic features, at which point only the regression network is unfrozen, and the parameters of the backbone network and the classification network are frozen.
Further, the training step of the reinforcement learning module specifically includes:
initializing the facial attribute, and generating a plurality of groups of corresponding training images according to the selected attribute;
respectively calculating each group of training images according to a preset reinforcement learning strategy, generating new face attributes, and generating new face images corresponding to the new face attributes;
and evaluating the new face image through the image evaluation model, and updating the gradient according to the evaluation result by adopting a soft policy gradient strategy, iterating until convergence.
Further, the reinforcement learning strategy $\pi^*$ is:

$$\pi^* = \arg\max_{\pi} \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[ r(s_t, a_t) + \alpha \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]$$

wherein $s_t$ is the state vector of the $t$-th iteration, $a_t$ is the action vector of the $t$-th iteration, and the reward $r$ satisfies $0 \le r \le 1$; $\mathcal{H}$ is the entropy, and $\alpha$ is a hyperparameter controlling the relative importance of the entropy term in the objective; $\rho_\pi$ is the state-action distribution induced by the policy.
Further, the soft policy gradient is calculated as:

$$\nabla_\theta J(\theta) = \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[ \nabla_\theta \log \pi_\theta(a_t \mid s_t)\big( Q(s_t, a_t) - b(s_t) - \alpha \log \pi_\theta(a_t \mid s_t) \big) \right]$$

wherein $\alpha$ is a temperature hyperparameter controlling the exploration range, $Q(s_t, a_t)$ is the Q value of the policy $\pi_\theta$, and $b(s_t)$ is a state-dependent baseline; $\theta$ is the learnable parameter of the policy $\pi_\theta$.
Further, the reinforcement learning module comprises a feature extraction unit and a gated recurrent unit;
the feature extraction unit extracts image features from the initial image and inputs them into the gated recurrent unit;
and the end of the hidden layer in the gated recurrent unit is connected to a fully connected layer, and the value of the selected attribute is output through the fully connected layer.
Further, the feature extraction unit is a ResNet18 network.
A neural network model, comprising: an encoder, a generator, an image evaluation network, and a reinforcement learning network;
the encoder converts the image to be edited into a hidden space vector and then carries out image reconstruction through a generator; during reconstruction, the generator generates facial attributes according to the hidden space vector and obtains a reconstructed image;
the image evaluation network evaluates the reconstructed image;
the reinforcement learning network selects among the facial attributes generated in the reconstruction process, optimizes them according to the evaluation result to obtain the optimized facial attribute, and inputs the optimized facial attribute into the generator again to generate a final image.
A face image editing system based on reinforcement learning comprises an image acquisition module, an image editing module and an image generation module;
the image acquisition module is used for acquiring a face image to be edited;
the image editing module is used for selecting editing attributes according to the face image to be edited;
the image generation module is used for performing a preliminary evaluation on the face image to be edited, optimizing the selected editing attribute according to the evaluation result to obtain the optimized facial attribute, and generating an editing result according to the optimized facial attribute.
The invention has the beneficial effects that:
compared with the prior art, the invention discloses a face image editing method based on reinforcement learning, which realizes automatic adjustment of the image face attribute through reinforcement learning and generates an aesthetic high-quality face image conforming to image evaluation; the invention enhances the face image through reinforcement learning, and provides a novel model optimization method of the soft gradient strategy with a self-criticizing training mode in reinforcement learning.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of a face image editing method based on reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an image evaluation process according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a reinforcement learning module according to an embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, an embodiment of the present invention provides a face image editing method based on reinforcement learning, including the steps of:
s1: acquiring a face image to be edited and extracting a first face attribute; mapping the face image to be edited by an encoding module to obtain an image hidden variable; the encoding module adopts a pixel2style2pixel framework, which is hereinafter referred to as pSp. pSp framework is based on a new type of encoder network that generates a series of style vectors. The first facial attribute includes posture, hair volume, beard, age, expression, and the like.
S2: acquiring a pre-trained generator; inputting the image hidden variable into a generator to generate a first face image; specifically, a StyleGAN network model structure is used as a generator to generate an image with a resolution of 1024×1024. After the pSp encoder converts the face image to be edited into the style vector in the hidden space, the generator is used for reconstruction.
S3: and selecting the attribute to be edited from the first facial attribute, and inputting the first face image into the trained image evaluation model to obtain an evaluation result.
In one embodiment, the image evaluation model consists of a backbone network and a regression network: the backbone network extracts image features, and the regression network performs regression on these features and outputs a final value as the evaluation result. Specifically, a channel attention module is placed between the backbone network and the regression network. The evaluation process is as follows: the face image is preprocessed by rotating, cropping, and scaling it to 800×800 resolution centered on the face; image features are extracted with a pre-built neural network and the extracted features are convolved; the convolution result is normalized and passed through an activation function; the activated features are input to an ECA attention module to obtain a three-dimensional tensor, which, after activation and adaptive average pooling, is flattened into a one-dimensional vector; the one-dimensional vector is input into the regression network, which outputs the image quality parameter, i.e. the aesthetic score, ranging from 0 to 1. EfficientNet-B4 can be used as the pre-trained model to extract features of the image to be evaluated, followed by a convolution operation with kernel size 3.
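To make the data flow concrete, the following PyTorch sketch shows one possible assembly of the evaluation model described above. It is a minimal sketch under assumptions: the torchvision EfficientNet-B4 backbone, the 512-channel width after the 3×3 convolution, and this particular ECA implementation are illustrative choices, not the claimed implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b4

class ECA(nn.Module):
    """Efficient Channel Attention: a 1-D convolution over pooled channel descriptors."""
    def __init__(self, kernel_size=3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):                                  # x: (B, C, H, W)
        w = self.pool(x).squeeze(-1).transpose(1, 2)       # (B, 1, C)
        w = torch.sigmoid(self.conv(w)).transpose(1, 2).unsqueeze(-1)
        return x * w                                       # channel-reweighted features

class AestheticModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = efficientnet_b4(weights="IMAGENET1K_V1").features  # pre-trained extractor
        self.conv = nn.Conv2d(1792, 512, kernel_size=3, padding=1)  # convolution with kernel size 3
        self.bn = nn.BatchNorm2d(512)                      # normalization of the convolved result
        self.eca = ECA()                                   # channel attention module
        self.pool = nn.AdaptiveAvgPool2d(1)                # adaptive average pooling
        self.regressor = nn.Sequential(nn.Flatten(), nn.Linear(512, 1), nn.Sigmoid())

    def forward(self, x):                                  # x: (B, 3, 800, 800) face-centered crop
        f = self.backbone(x)                               # backbone feature extraction
        f = torch.relu(self.bn(self.conv(f)))              # normalize, then activate
        f = self.eca(f)                                    # three-dimensional attended tensor
        return self.regressor(self.pool(f))                # aesthetic score in [0, 1]
```

An 800×800 face crop passed through `AestheticModel` yields a single score in [0, 1], matching the evaluation output described above.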
In this embodiment, the training steps of the image evaluation model are:
training a classification network: training data are input into the backbone network to extract features, which are classified through the classification network; during classification training, the loss value is back-propagated through a cross-entropy function, while no gradient is propagated back to the parameters of the regression network. Regression training is then performed on the data on top of the classification network to extract more aesthetic features, at which point only the regression network is unfrozen, and the parameters of the backbone network and the classification network are frozen.
Specifically, the classification network is trained with the aesthetic score binned at a step size of 0.1; during classification training the loss value is back-propagated through a cross-entropy function, and the parameters of the regression network are kept from being updated.
In the classification training, the batch size was set to 32 and the initial learning rate to 0.001; the learning rate was automatically halved whenever the accuracy failed to improve for several consecutive epochs. The Adam optimization algorithm was selected, with the running-average coefficients for the gradient and its square set to (0.98, 0.999) and the weight decay coefficient set to 0.0001. In the regression training, the batch size is set to 64; if the mean squared error does not decrease after multiple training epochs, the learning rate is likewise automatically halved.
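A minimal sketch of this two-stage regime is shown below, continuing the `AestheticModel` sketch above. The data loaders, the ten-class binning of scores at a 0.1 step, and the exact set of frozen parameters are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Adam

model = AestheticModel()                         # from the sketch above
cls_head = nn.Linear(512, 10)                    # scores binned at 0.1 -> 10 classes (assumed)

def trunk(x):                                    # shared path up to the pooled 512-d vector
    f = torch.relu(model.bn(model.conv(model.backbone(x))))
    return model.pool(model.eca(f)).flatten(1)

# Stage 1: classification pre-training; the regression head receives no gradient.
opt1 = Adam(list(model.backbone.parameters()) + list(cls_head.parameters()),
            lr=1e-3, betas=(0.98, 0.999), weight_decay=1e-4)
for imgs, scores in cls_loader:                  # assumed DataLoader, batch size 32
    labels = (scores * 10).clamp(max=9).long()   # bin [0, 1] scores with step 0.1
    loss = F.cross_entropy(cls_head(trunk(imgs)), labels)
    opt1.zero_grad(); loss.backward(); opt1.step()
    # halve the learning rate when accuracy stalls over consecutive epochs (omitted)

# Stage 2: freeze the backbone and classifier; release only the regression head.
for p in list(model.backbone.parameters()) + list(cls_head.parameters()):
    p.requires_grad = False
opt2 = Adam(model.regressor.parameters(), lr=1e-3, betas=(0.98, 0.999), weight_decay=1e-4)
for imgs, scores in reg_loader:                  # assumed DataLoader, batch size 64
    loss = F.mse_loss(model(imgs).squeeze(1), scores)  # halve LR if MSE plateaus (omitted)
    opt2.zero_grad(); loss.backward(); opt2.step()
```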
S4: inputting the attribute to be edited and the evaluation result into the trained reinforcement learning module to generate the second facial attribute. As shown in fig. 3, the reinforcement learning module comprises a feature extraction unit and a gated recurrent unit; the feature extraction unit extracts image features from the initial image and inputs them into the gated recurrent unit; the end of the hidden layer in the gated recurrent unit is connected to a fully connected layer, through which the value of the selected attribute is output; the feature extraction unit is a ResNet18 network.
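The following sketch outlines one plausible form of this module, assuming the attribute values are discretized into bins (`n_bins` is an illustrative choice) and that the GRU proposes one attribute value per step:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class AttributePolicy(nn.Module):
    """ResNet18 encodes the current image; a GRU proposes one attribute value per step."""
    def __init__(self, n_bins=21):               # assumed discretization of attribute values
        super().__init__()
        cnn = resnet18(weights="IMAGENET1K_V1")  # pre-trained on ImageNet
        self.encoder = nn.Sequential(*list(cnn.children())[:-1])  # drop the final FC -> 512-d
        self.gru = nn.GRU(input_size=512, hidden_size=512, batch_first=True)  # hidden size 512
        self.fc = nn.Linear(512, n_bins)         # fully connected layer at the hidden-layer end

    def forward(self, image, h=None):
        feat = self.encoder(image).flatten(1).unsqueeze(1)   # (B, 1, 512)
        out, h = self.gru(feat, h)
        logits = self.fc(out.squeeze(1))         # distribution over values of the selected attribute
        return torch.distributions.Categorical(logits=logits), h
```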
The training step of the reinforcement learning module comprises the following steps:
s41: initializing the facial attributes and generating a plurality of groups of training images according to the selected attributes.
S42: each group of training images computes the value of the corresponding attribute, i.e. the new facial attribute, through the reinforcement learning module. Each group of training images is processed separately under a preset reinforcement learning strategy to generate new facial attributes, and new face images corresponding to the new facial attributes are generated. In reinforcement learning, the agent interacts continuously with the environment, modeled as a Markov decision process $(S, A, P, r, \rho_0, \gamma)$, where $S$ and $A$ are the state and action spaces, $P$ is the state transition probability, $r$ is the reward, $\rho_0$ is the distribution of the initial state $s_0$, and the discount factor $\gamma$ determines how strongly the agent is affected by distant future states. The goal is to learn a stochastic policy $\pi$ such that the expected reward of the trajectory is maximized when actions are taken as $a_t \sim \pi(\cdot \mid s_t)$.
The method comprises the following specific steps:
First, the attribute value corresponding to each selected feature dimension is set to 0. The selected feature dimensions are then explored separately; the exploration process builds several exploration trajectories according to preset values. To this end, 5 initial images are obtained by cloning the initial attributes, each image corresponding to the input of one exploration trajectory; each trajectory explores one attribute, computes a new attribute value through exploration, and generates an image. Finally, image quality evaluation yields 5 scores, and the state-dependent baseline $b(s_t)$ is set to the average of the five scores, so that the reinforcement learning module is updated to raise the probability of trajectories with higher aesthetic scores; the baseline effectively reduces the variance during learning and thus improves the stability of training. In this step, the discount factor $\gamma$ is set to 1. In addition, the agent is rewarded only at the terminal state; intermediate states of the trajectory receive no reward. The reward value is the score of the image after the series of adjustments. Gradient descent is performed with the Adam optimizer at a learning rate of 1e-5, without L2 regularization of the parameters. The ResNet18 module is pre-trained on ImageNet. For the gated recurrent unit, the hidden-state size is 512, the same as the output of the ResNet18 module.
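The exploration step can be sketched as follows; `init_attrs`, `selected_dims`, `generate_image` (wrapping the flow module and the generator), `bin_to_value`, and `aesthetic` are assumed helpers standing in for the modules described above:

```python
import torch

K = 5                                            # five cloned exploration trajectories
attrs = init_attrs.repeat(K, 1)                  # init_attrs: (1, n_attrs) initial attributes
attrs[:, selected_dims] = 0.0                    # selected attribute values start at 0

policy = AttributePolicy()                       # from the sketch above
log_probs, entropies, h = [], [], None
for dim in selected_dims:                        # each trajectory explores one attribute per step
    dist, h = policy(generate_image(attrs), h)   # generate_image: flow module + generator (assumed)
    a = dist.sample()                            # sample an attribute-value bin
    attrs[:, dim] = bin_to_value(a)              # assumed mapping from bin index to value
    log_probs.append(dist.log_prob(a))
    entropies.append(dist.entropy())

# Reward only at the terminal state; discount factor gamma = 1.
scores = aesthetic(generate_image(attrs)).squeeze(1)  # five aesthetic scores
baseline = scores.mean()                         # b(s): mean of the five scores
advantage = scores - baseline                    # self-critical variance reduction
```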
S43: evaluating the new face image through the image evaluation model, and updating the gradient according to the evaluation result with the soft policy gradient strategy, iterating until convergence.
In this embodiment, a maximum entropy reinforcement learning framework is used during reinforcement learning to ensure the randomness of exploration and to prevent premature convergence to a suboptimal strategy. The objective of reinforcement learning is set as the policy:

$$\pi^* = \arg\max_{\pi} \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[ r(s_t, a_t) + \alpha \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]$$

where $S$ and $A$ are, respectively, the set of all possible states the agent may encounter and the set of all possible actions it may generate, and the discount factor satisfies $0 \le \gamma \le 1$. $\mathcal{H}$ is the entropy, and $\alpha$ is a hyperparameter, set to 0.01, that controls the relative importance of the entropy term in the objective.
The soft strategy gradient formula is as follows:
wherein,is a temperature super parameter for controlling the detection range, +.>Is policy +>Value of->Is a state-dependent baseline and can be any function as long as it does not change with motion.
In one embodiment, a self-critical training mode is incorporated into the soft policy gradient update. The Monte Carlo method is used to compute the Q value for reinforcement learning, where the Q value is the aesthetic score given by the image quality assessment model. In this embodiment, the reinforcement learning module uses a batch size of 16, and each gradient update step uses a batch size of 80.
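Continuing the exploration sketch above, one way to realize this self-critical soft policy gradient update with the stated hyperparameters (α = 0.01, Adam, learning rate 1e-5, no L2 regularization) is:

```python
import torch

def soft_pg_loss(log_probs, entropies, scores, baseline, alpha=0.01):
    """One way to realize the soft policy gradient with a self-critical baseline:
    REINFORCE on (Q - b(s) - alpha * log pi), plus an explicit entropy bonus."""
    logp = torch.stack(log_probs).sum(0)         # sum_t log pi_theta(a_t | s_t), per trajectory
    ent = torch.stack(entropies).sum(0)          # summed per-step policy entropies
    soft_adv = (scores - baseline - alpha * logp).detach()  # Q from the Monte Carlo rollout
    return -(logp * soft_adv + alpha * ent).mean()

opt = torch.optim.Adam(policy.parameters(), lr=1e-5, weight_decay=0.0)  # no L2 regularization
loss = soft_pg_loss(log_probs, entropies, scores, baseline)
opt.zero_grad(); loss.backward(); opt.step()
```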
S5: inputting the image hidden variables and the new facial attributes into a continuous normalizing flow (CNF) module to generate the hidden variables of the target face image;
S6: inputting the hidden variables of the target face image into the generator to generate the final face image.
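Putting steps S1-S6 together, an inference-time sketch might look as follows; every module interface (`encoder`, `generator`, `cnf`, `aesthetic`, and the helpers `extract_attributes` and `bin_to_value`) is an assumed stand-in for the components described above, and the argmax selection follows the inference behavior described below:

```python
import torch

@torch.no_grad()
def edit_face(img, selected_dims, encoder, generator, cnf, policy, aesthetic):
    """Steps S1-S6 end to end; every module interface here is an assumed stand-in."""
    w = encoder(img)                             # S1: pSp encoder -> latent style vectors
    first = generator(w)                         # S2: reconstruct the first face image
    _ = aesthetic(first)                         # S3: evaluate the first face image
    attrs, h = extract_attributes(first), None   # assumed facial-attribute extractor
    for dim in selected_dims:                    # S4: at inference, take the argmax action
        dist, h = policy(first, h)
        attrs[:, dim] = bin_to_value(dist.probs.argmax(-1))
    w_target = cnf(w, attrs)                     # S5: continuous normalizing flow -> target latent
    return generator(w_target)                   # S6: final edited face image
```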
Example 2
Based on the same inventive concept, an embodiment of the present invention discloses a neural network model for implementing the image editing method in embodiment 1, comprising: an encoder, a generator, an image evaluation network, and a reinforcement learning network. The encoder converts the image to be edited into hidden space vectors, and image reconstruction is then carried out through the generator; during reconstruction, the generator generates facial attributes according to the hidden space vectors and obtains a reconstructed image. The image evaluation network evaluates the reconstructed image. The reinforcement learning network selects among the facial attributes generated during reconstruction, optimizes them according to the evaluation result to obtain the optimized facial attributes, and inputs the optimized facial attributes into the generator again to generate the final image.
Example 3
Based on the same inventive concept, the embodiment of the invention discloses a face image editing system based on reinforcement learning, which is characterized by comprising an image acquisition module, an image editing module and an image generation module; the image acquisition module is used for acquiring a face image to be edited; the image editing module is used for selecting editing attributes according to the face image to be edited; the image generation module is used for carrying out preliminary evaluation according to the face image to be edited, optimizing the selected editing attribute according to the evaluation result and obtaining the optimized face attribute; and the method is used for generating an editing result according to the optimized facial attribute.
According to the invention, control over the facial attributes of a face image is realized through reinforcement learning, and facial attributes can be independently selected and edited according to different aesthetic requirements to obtain a high-quality face image. In the inference process, instead of sampling from the softmax layer of the reinforcement learning module, the action with the highest probability is selected, and the agent automatically sets values for the selected feature dimensions in sequence to obtain the new facial attributes. The image hidden variables and the new facial attributes are input into the continuous normalizing flow module to generate the hidden variables of the target face image, which are then input into the generator to obtain the edited face image.
In this specification, the embodiments are described in a progressive manner, each embodiment focusing on its differences from the others; for identical or similar parts, the embodiments may be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively brief, and relevant details can be found in the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (7)
1. The face image editing method based on reinforcement learning is characterized by comprising the following steps:
acquiring a face image to be edited and extracting a first face attribute;
mapping the face image to be edited through an encoding module to obtain an image hidden variable;
acquiring a pre-trained generator;
inputting the image hidden variable into the generator to generate a first face image;
selecting an attribute to be edited from the first facial attribute, and inputting a first face image into a trained image evaluation model to obtain an evaluation result;
inputting the attribute to be edited and the evaluation result to a trained reinforcement learning module to generate a second facial attribute;
inputting the image hidden variable and the second facial attribute into a continuous normalizing flow module to generate a hidden variable of a target face image;
inputting the hidden variable of the target face image into the generator to generate a second face image;
the training step of the reinforcement learning module comprises the following specific steps:
initializing the facial attribute, and generating a plurality of groups of corresponding training images according to the selected attribute;
respectively calculating each group of training images according to a preset reinforcement learning strategy, generating new face attributes, and generating new face images corresponding to the new face attributes;
evaluating the new face image through the image evaluation model, and updating the gradient according to the evaluation result by adopting a soft policy gradient strategy, iterating until convergence;
the reinforcement learning strategy $\pi^*$ is:

$$\pi^* = \arg\max_{\pi} \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[ r(s_t, a_t) + \alpha \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]$$

wherein $s_t$ is the state vector of the $t$-th iteration, $a_t$ is the action vector of the $t$-th iteration, and the reward $r$ satisfies $0 \le r \le 1$; $\mathcal{H}$ is the entropy, and $\alpha$ is a hyperparameter controlling the relative importance of the entropy term in the objective; $\rho_\pi$ is the state-action distribution induced by the policy;

the soft policy gradient is calculated as:

$$\nabla_\theta J(\theta) = \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[ \nabla_\theta \log \pi_\theta(a_t \mid s_t)\big( Q(s_t, a_t) - b(s_t) - \alpha \log \pi_\theta(a_t \mid s_t) \big) \right]$$

wherein $\alpha$ is a temperature hyperparameter controlling the exploration range, $Q(s_t, a_t)$ is the Q value of the policy, and $b(s_t)$ is a state-dependent baseline; $\theta$ is the learnable parameter of the policy $\pi_\theta$.
2. The method for editing a face image based on reinforcement learning according to claim 1, wherein the step of inputting the first face image into a trained image evaluation model to obtain an evaluation result comprises:
preprocessing the first face image, inputting the preprocessed first face image into a backbone network for feature extraction to obtain a feature vector;
inputting the feature vector to a channel attention module to obtain a three-dimensional vector, and expanding the three-dimensional vector into a one-dimensional vector after activation and self-adaptive average pooling;
and inputting the one-dimensional vector into a regression network, and outputting an evaluation result.
3. The method for editing a face image based on reinforcement learning according to claim 1, wherein the training step of the image evaluation model comprises:
training a classification network: inputting training data into a backbone network to extract features, and classifying the features through the classification network; during classification training, the loss value is back-propagated through a cross-entropy function, while no gradient is propagated back to the parameters of the regression network;
performing regression training on the data on top of the classification network to extract more aesthetic features, at which point only the regression network is unfrozen, and the parameters of the backbone network and the classification network are frozen.
4. The face image editing method based on reinforcement learning according to claim 1, wherein the reinforcement learning module comprises a feature extraction unit and a gated recurrent unit;
the feature extraction unit extracts image features from the initial image and inputs them into the gated recurrent unit;
and the end of the hidden layer in the gated recurrent unit is connected to a fully connected layer, and the value of the selected attribute is output through the fully connected layer.
5. The face image editing method based on reinforcement learning according to claim 4, wherein the feature extraction unit is a ResNet18 network.
6. A neural network model for implementing the image editing method of any of claims 1-5, comprising: an encoder, a generator, an image evaluation network, and a reinforcement learning network;
the encoder converts the image to be edited into a hidden space vector and then carries out image reconstruction through a generator; during reconstruction, the generator generates facial attributes according to the hidden space vector and obtains a reconstructed image;
the image evaluation network evaluates the reconstructed image;
the reinforcement learning network selects among the facial attributes generated in the reconstruction process, optimizes them according to the evaluation result to obtain the optimized facial attribute, and inputs the optimized facial attribute into the generator again to generate a final image.
7. A face image editing system based on reinforcement learning, which is characterized by adopting the neural network model in claim 6 and comprising an image acquisition module, an image editing module and an image generation module;
the image acquisition module is used for acquiring a face image to be edited;
the image editing module is used for selecting editing attributes according to the face image to be edited;
the image generation module is used for carrying out preliminary evaluation according to the face image to be edited, optimizing the selected editing attribute according to the evaluation result and obtaining the optimized face attribute; and the method is used for generating an editing result according to the optimized facial attribute.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310908009.1A CN116630147B (en) | 2023-07-24 | 2023-07-24 | Face image editing method based on reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310908009.1A CN116630147B (en) | 2023-07-24 | 2023-07-24 | Face image editing method based on reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116630147A CN116630147A (en) | 2023-08-22 |
CN116630147B true CN116630147B (en) | 2024-02-06 |
Family
ID=87617445
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310908009.1A Active CN116630147B (en) | 2023-07-24 | 2023-07-24 | Face image editing method based on reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116630147B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014001610A1 (en) * | 2012-06-25 | 2014-01-03 | Nokia Corporation | Method, apparatus and computer program product for human-face features extraction |
CN111260754A (en) * | 2020-04-27 | 2020-06-09 | 腾讯科技(深圳)有限公司 | Face image editing method and device and storage medium |
CN112800893A (en) * | 2021-01-18 | 2021-05-14 | 南京航空航天大学 | Human face attribute editing method based on reinforcement learning |
CN112907725A (en) * | 2021-01-22 | 2021-06-04 | 北京达佳互联信息技术有限公司 | Image generation method, image processing model training method, image processing device, and image processing program |
CN113221794A (en) * | 2021-05-24 | 2021-08-06 | 厦门美图之家科技有限公司 | Training data set generation method, device, equipment and storage medium |
CN113255551A (en) * | 2021-06-04 | 2021-08-13 | 广州虎牙科技有限公司 | Training, face editing and live broadcasting method of face editor and related device |
WO2022135013A1 (en) * | 2020-12-24 | 2022-06-30 | 百果园技术(新加坡)有限公司 | Facial attribute editing method and system, and electronic device and storage medium |
- 2023-07-24: application CN202310908009.1A filed in CN; granted as patent CN116630147B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN116630147A (en) | 2023-08-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Liu et al. | More control for free! image synthesis with semantic diffusion guidance | |
CN109036465B (en) | Speech emotion recognition method | |
US11354792B2 (en) | System and methods for modeling creation workflows | |
CN106503654A (en) | A kind of face emotion identification method based on the sparse autoencoder network of depth | |
CN111553467B (en) | Method for realizing general artificial intelligence | |
CN113435211B (en) | Text implicit emotion analysis method combined with external knowledge | |
CN108197533A (en) | A kind of man-machine interaction method based on user's expression, electronic equipment and storage medium | |
US11823490B2 (en) | Non-linear latent to latent model for multi-attribute face editing | |
CN108985464A (en) | The continuous feature generation method of face for generating confrontation network is maximized based on information | |
US20220101144A1 (en) | Training a latent-variable generative model with a noise contrastive prior | |
Dogan et al. | Semi-supervised image attribute editing using generative adversarial networks | |
Zhai et al. | Asian female facial beauty prediction using deep neural networks via transfer learning and multi-channel feature fusion | |
CN108846073A (en) | A kind of man-machine emotion conversational system of personalization | |
WO2021226731A1 (en) | Method for imitating human memory to realize universal machine intelligence | |
CN117409109A (en) | Image generation method and data processing method for image generation | |
Chen et al. | CNN-based broad learning with efficient incremental reconstruction model for facial emotion recognition | |
CN117576257A (en) | Method, terminal and storage medium for editing face image through text | |
Ye et al. | Multi-style transfer and fusion of image’s regions based on attention mechanism and instance segmentation | |
Xia et al. | Semantic translation of face image with limited pixels for simulated prosthetic vision | |
Feng et al. | Improved visual story generation with adaptive context modeling | |
WO2021223042A1 (en) | Method for implementing machine intelligence similar to human intelligence | |
CN116630147B (en) | Face image editing method based on reinforcement learning | |
Tsapatsoulis et al. | A fuzzy system for emotion classification based on the MPEG-4 facial definition parameter set | |
KR20190129698A (en) | Electronic apparatus for compressing recurrent neural network and method thereof | |
Liu et al. | Multimodal face aging framework via learning disentangled representation |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |