CN113936318A - Human face image restoration method based on GAN human face prior information prediction and fusion
- Publication number
- CN113936318A (application CN202111218941.9A)
- Authority
- CN
- China
- Prior art keywords
- face
- image
- information
- stage
- generator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Processing (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to a face image restoration method based on GAN face prior information prediction and fusion. The neural network of the method takes a VAE structure as its backbone and comprises two stages. In stage-I, a coarse neural network generates a rough image carrying face structure and content information, while face generation guidance information is obtained by fusing the intermediate features of the face contour, face region and face key points. In stage-II, in order to better account for face structure information, a fine neural network refines the stage-I result: the guidance information is introduced into a second generator to refine the details and structure of the face, and a natural, harmonious and structurally symmetrical face image is finally generated.
Description
Technical Field
The invention relates to the field of image processing, in particular to a human face image restoration method based on GAN human face prior information prediction and fusion.
Background
Image completion is a popular topic in computer vision; it aims to fill the missing parts of an incomplete image with visually plausible content. Face completion is a special case of image completion that aims to repair an occluded face region without being constrained by pose or orientation. However, existing face completion methods rely on only simple facial features, so their results remain unsatisfactory and the repaired regions are easy to detect. Furthermore, boundaries and details near the missing part are often blurred. In particular, for face repair, face region information (structure information, contour information and content information) has not been fully utilized, which leads to unnatural generated faces, for example asymmetric eyebrows or eyes of different sizes. Unlike general image restoration, face restoration requires content, contour and structural information about the target object to achieve natural and realistic output. General image restoration methods, however, focus only on the sharpness of the whole image; they do not consider the particularity of the human face and do not fully explore and utilize its semantic information, so the generated face images are unnatural, blurred and distorted, and lack facial texture details. Under special conditions such as the COVID-19 epidemic, face completion can effectively remove a mask and restore the full face. Face image completion based on deep learning therefore remains a challenging core problem in face repair.
The prior art has the following defects:
1. It is difficult to complete an image with a large missing region
Traditional image completion methods generally rely on background continuity to fill the missing foreground region: similar regions are copied into the missing region to complete the image. Such methods cannot solve face completion with a large missing area, and filling a large missing face region with other face regions is not advisable. In fact, a large missing region under a square mask is harder to complete than an irregular mask or a smaller square mask, because the receptive field of a convolution kernel is square: once the kernel lies entirely inside the missing region, it cannot capture any useful information. For irregular or small missing parts, by contrast, the kernel can still capture useful information from the background within its receptive field. For this reason, some image repair methods verify their effectiveness only on irregular or small square masks, which does not match practical requirements.
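The receptive-field argument above can be illustrated with a small sketch. It is only a toy, not part of the patented method: the mask size, hole position and 3 x 3 kernel are assumptions chosen for illustration, and the helper names are hypothetical.

```python
def square_hole_mask(size, hole_start, hole_size):
    """Build a size x size mask: 1 = known pixel, 0 = missing pixel."""
    mask = [[1] * size for _ in range(size)]
    for r in range(hole_start, hole_start + hole_size):
        for c in range(hole_start, hole_start + hole_size):
            mask[r][c] = 0
    return mask


def known_pixels_in_window(mask, row, col, k):
    """Count known pixels inside the k x k window centred at (row, col)."""
    half = k // 2
    count = 0
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            if 0 <= r < len(mask) and 0 <= c < len(mask[0]):
                count += mask[r][c]
    return count


# Assumed toy setting: 16 x 16 image with an 8 x 8 square hole, 3 x 3 kernel.
mask = square_hole_mask(size=16, hole_start=4, hole_size=8)
center_known = known_pixels_in_window(mask, row=8, col=8, k=3)  # deep inside the hole
border_known = known_pixels_in_window(mask, row=4, col=4, k=3)  # at the hole corner
print(center_known, border_known)
```

A window placed deep inside the hole sees no known pixels at all, while a window at the hole boundary still overlaps the background; a large square hole contains many positions of the first kind.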
2. It is difficult to generate a natural harmonious face from a background image
For a face image with a missing region, the content of the missing region differs greatly from that of the background region, so it is difficult to generate a natural and harmonious face from the background. For example, some image inpainting methods use an attention mechanism to search the background for blocks similar to the missing region, but for every missing block of every image, matching its similarity against the surrounding background blocks takes a long training time and is prone to deforming facial features.
3. The adaptability of the repair network and the correctness of the repair result need to be improved
Face completion focuses on reconstructing face parts that look natural and harmonious. Natural image completion methods and some face completion methods, however, attend only to the sharpness of the whole image or rely on simple facial features; they neither consider the particularity of the face nor fully explore and utilize facial semantic information, so the generated face images are unnatural, blurred and distorted, and lack facial texture details. Face completion remains challenging because it requires generating semantically new pixels for the missing key components while maintaining structural and appearance consistency. Further research is needed to improve the adaptability of the repair network and the correctness of the repair result.
To solve these problems, we propose a new generative adversarial network that, assisted by a network that obtains fused face prior information, can restore faces with large missing areas.
Disclosure of Invention
In view of the defects of the prior art, a face image restoration method based on GAN face prior information prediction and fusion is provided, comprising the following steps:
Step 1: downloading a public face data set, preprocessing the data set, and constructing images $x_\theta$ with the face missing; meanwhile, dividing the data proportionally into a training set, a verification set and a test set;
the face image completion method mainly comprises two stages, stage-I and stage-II, specifically as follows:
Step 2: the coarse neural network model of stage-I comprises a first generator, two encoders and three decoders. First, the image $x_\theta$ constructed in step 1, which lacks face information, is fed into a network whose backbone is a variational autoencoder (VAE) structure, and face contour information $M'_{\theta-f}$, face region information $x'_{\theta-f}$ and face key point information $x'_{\theta-l}$ are obtained by nonlinear reconstruction through the two encoders and three decoders; the face contour information, structure information and content information reconstructed by the VAE network are fused to obtain face prior guidance information, which helps to generate the contour, structure and content of a clear face; the image $x_\theta$ lacking face information is fed into the first generator, and the face prior guidance information is fused into the intermediate layer of the first generator so as to fully explore the face region information and generate a low-resolution face image;
the specific steps are as follows:
Step 21: the face-missing image $x_\theta$ is input sequentially into the two encoders and three decoders, and face contour information $M'_{\theta-f}$, face region information $x'_{\theta-f}$ and face key point information $x'_{\theta-l}$ are obtained by nonlinear reconstruction, so as to extract the contour, region and key point information of the face respectively;
Step 22: face coding feature vectors $z_{\theta-f}$ and $z_{\theta-l}$ are constructed by combining the face contour information, face region information and face key point information; finally, $z_{\theta-f}$ and $z_{\theta-l}$ are fused by feature fusion to construct a face feature expression space and obtain higher-quality face prior guidance information $z_{\theta-M}$;
Step 23: in the stage-I training stage, the face-missing image $x_\theta$ is passed through the first generator, the face prior guidance information is fused into the intermediate layer of the first generator by splicing it with the generator's intermediate feature map, and a low-resolution, natural and symmetrical face image is generated under the action of the prior guidance information;
Step 24: the stage-I neural network model is trained iteratively with the set batch size, and the network parameters of the stage-I generator, encoders and decoders are updated iteratively according to the reconstruction loss function of the face contour information, face content information and face structure information and the face information prediction loss function, completing the training of the stage-I face image completion network;
Step 25: judging whether the set number of verification iterations has been reached; if so, validating the model once and saving it; if not, going to step 26;
Step 26: judging whether the set total number of iterations has been reached; if so, ending the training; otherwise, repeating steps 21 to 25;
Step 3: after the training of the stage-I coarse neural network is finished, the trained stage-I neural network and its parameters are frozen, and the learning and training of the stage-II fine neural network begins; the stage-II network structure mainly comprises a second generator, a global discriminator and a patch discriminator, and the specific steps are as follows:
Step 31: in stage-II, the low-resolution, natural and symmetrical face image generated in stage-I is input into the second generator; meanwhile, to better account for face structure information, the face prior guidance information is also introduced into the intermediate layer of the second generator to refine the details and structure of the face and generate a first face restoration image of higher resolution;
Step 32: the first face restoration image is fed to the two discriminators, and by means of the adversarial principle of the GAN the second generator is driven to generate a high-resolution face restoration image with a symmetrical face structure, wherein the global discriminator judges the distribution consistency of the image as a whole, and the patch discriminator supervises the generated details of the image in each patch;
Step 32: the stage-II face image refinement network is trained iteratively with the set batch size;
Step 33: judging whether the set number of verification iterations has been reached; if so, validating the model once and saving it; if not, going to step 34;
Step 34: judging whether the set total number of iterations has been reached; if so, ending the training; otherwise, repeating steps 31 to 33;
According to a preferred embodiment, in stage-I training, three latent feature discriminators are added as constraints in place of the original KL divergence, which reduces the interference that the large magnitude of the KL divergence causes to the reconstruction loss of the face part; at the same time, the discriminators compete with the encoders, which strengthens the encoders' learning ability and yields more accurate face feature information.
According to a preferred embodiment, the preprocessing method comprises:
constructing four reference standard images to assist face completion, namely a standard face contour image $M_{\theta-f}$, a standard face content image $X_{\theta-f}$, a standard face structure image $Z_{\theta-l}$ and a standard portrait foreground image $X_{FG}$, obtained as follows:
the standard face contour image $M_{\theta-f}$: 68 face key points are extracted from the original face with a face detection and alignment method, and the obtained 41 key points are expanded (dilated) one by one by 3% of the key-point size so that the eyebrow and boundary information of the face is merged into the key points, giving the face contour image $M_{\theta-f}$;
the standard face content image $X_{\theta-f}$: the original image $X_{real}$ is multiplied by the standard face contour image $M_{\theta-f}$ to obtain $X_{\theta-f}$;
the standard face structure image $Z_{\theta-l}$: 41 key points in the face, including the eyes, nose and mouth, are dilated and fused to obtain the face structure image $Z_{\theta-l}$;
the standard portrait foreground image $X_{FG}$: obtained by segmenting the portrait with a Baidu interface.
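The preprocessing above (dilating key points into a mask, then multiplying the original image by that mask) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the dilation radius is assumed to be 3% of the shorter image side, each key point is assumed to grow into a square, and the function names, toy key points and toy image are hypothetical.

```python
def dilate_keypoints(keypoints, height, width, ratio=0.03):
    """Turn (row, col) key points into a binary mask by square dilation.

    Assumption: the radius is ratio * min(height, width), at least 1 pixel.
    """
    radius = max(1, round(ratio * min(height, width)))
    mask = [[0] * width for _ in range(height)]
    for r, c in keypoints:
        for rr in range(max(0, r - radius), min(height, r + radius + 1)):
            for cc in range(max(0, c - radius), min(width, c + radius + 1)):
                mask[rr][cc] = 1
    return mask


def apply_mask(image, mask):
    """Element-wise product of an image and a binary mask of the same size."""
    return [[p * m for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


# Toy data: a constant 32 x 32 "image" and two hypothetical key points.
height, width = 32, 32
keypoints = [(10, 10), (10, 20)]
contour_mask = dilate_keypoints(keypoints, height, width)   # stands in for M_{theta-f}
x_real = [[5] * width for _ in range(height)]               # stands in for X_real
content = apply_mask(x_real, contour_mask)                  # stands in for X_{theta-f}
```

In practice the key points would come from a 68-point face landmark detector and the images would be arrays rather than nested lists; the sketch only shows the order of operations.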
The invention has the beneficial effects that:
1. A two-stage generative adversarial network is used, so that, assisted by the network that obtains fused face information, the face image is restored in stages from coarse to fine; meanwhile, the structural information of the face is fused and its texture enhanced, and a high-resolution face image with realistic details is finally generated.
2. To address the face structure distortion, face asymmetry and face blurring produced by existing face completion methods, a generative adversarial network based on the encoding of face structure, contour and content information is proposed. The method improves the quality of completion for faces with large missing areas and obtains satisfactory results even when the missing region differs greatly from the background content.
3. Using a single generator in a GAN structure increases the learning burden of the generator, and the generated face is sometimes speckled; a VAE-based multi-generator face completion generative adversarial network is therefore proposed: two encoders and three decoders are adopted in stage-I, which reduces the learning burden of the generator and simultaneously captures the interdependent structure, content and contour information in the face image, so as to generate a high-quality face image.
4. The network is constrained with a reconstruction loss and an adversarial loss, which improves network performance, brings the features of the generated image in the network close to those of the corresponding original face image, and further improves the final face completion result.
5. Face semantic information is fused in both the stage-I generator and the stage-II generator, so that it can be fully explored and utilized to guide face restoration; this also strengthens the robustness of the generators and makes the face completion result more stable.
Drawings
FIG. 1 is a flow chart of a method of the facial image restoration network of the present invention;
FIG. 2 is a diagram of a face image restoration network according to the present invention;
FIG. 3 is a diagram of the six kinds of face information used in the present invention; and
FIG. 4 is a graph comparing the experimental results of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings in conjunction with the detailed description. It should be understood that the description is intended to be exemplary only, and is not intended to limit the scope of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.
The invention mainly addresses the problems of unclear, unnatural and asymmetric results in face image restoration. The face image restoration task fills in images that lack face content; it is a special case of image restoration and aims to repair an occluded face region without being constrained by pose or orientation. For face repair, face region information (structure information, contour information and content information) has not been fully utilized, which leads to unnatural generated faces, for example asymmetric eyebrows or eyes of different sizes. The existing face image restoration algorithms therefore urgently need improvement so that they can produce high-quality results.
The following detailed description is made with reference to the accompanying drawings.
Fig. 1 is a flow chart of the method of the face image restoration network of the invention, and Fig. 2 is a structure diagram of the face image restoration network of the invention; the method is described in detail below with reference to Fig. 1 and Fig. 2. The invention provides a face image restoration method based on GAN (generative adversarial network) face prior information prediction and fusion, which comprises the following steps:
Step 1: downloading a public face data set, such as CelebA-HQ, preprocessing the data set, and constructing images $x_\theta$ with the face missing; the training set, verification set and test set are constructed in the conventional proportion of 28:1:1. In one specific embodiment, the CelebA-HQ data set contains 30000 face images and is divided into 3 subsets: 28000 training images, 1000 verification images and 1000 test images.
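The 28:1:1 division can be sketched as below. The shuffling, the fixed seed and the function name are assumptions made for the illustration; the source only states the proportion and the resulting subset sizes.

```python
import random


def split_indices(n_images, ratio=(28, 1, 1), seed=0):
    """Split image indices into train / verification / test by the given ratio."""
    total = sum(ratio)
    n_train = n_images * ratio[0] // total
    n_val = n_images * ratio[1] // total
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)  # assumed: a random but reproducible split
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]      # remainder goes to the test set
    return train, val, test


train, val, test = split_indices(30000)
print(len(train), len(val), len(test))
```

For 30000 images this yields 28000, 1000 and 1000 indices, matching the subset sizes given in the embodiment.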
The face image completion method mainly comprises two stages, stage-I and stage-II, specifically as follows:
Step 2: the coarse neural network model of stage-I comprises a first generator, two encoders and three decoders. First, the image $x_\theta$ constructed in step 1, which lacks face information, is fed into a network whose backbone is a variational autoencoder (VAE) structure, and face contour information $M'_{\theta-f}$, face region information $x'_{\theta-f}$ and face key point information $x'_{\theta-l}$ are obtained by nonlinear reconstruction through the two encoders and three decoders; the face contour information, structure information and content information reconstructed by the VAE network are fused to obtain face prior guidance information, which helps to generate the contour, structure and content of a clear face; the image $x_\theta$ lacking face information is fed into the first generator, and the face prior guidance information is fused into the intermediate layer of the first generator so as to fully explore the face region information and generate a low-resolution face image. The image generated in the first stage is a low-resolution face image with a clearer contour, a more symmetrical structure and more complete content.
The specific steps are as follows:
Step 21: the face-missing image $x_\theta$ is input sequentially into the two encoders and three decoders, and face contour information $M'_{\theta-f}$, face region information $x'_{\theta-f}$ and face key point information $x'_{\theta-l}$ are obtained by nonlinear reconstruction, so as to extract the contour, region and key point information of the face respectively.
Step 22: face coding feature vectors $z_{\theta-f}$ and $z_{\theta-l}$ are constructed by combining the face contour information, face region information and face key point information; finally, $z_{\theta-f}$ and $z_{\theta-l}$ are fused by feature fusion to construct a face feature expression space and obtain higher-quality face prior guidance information $z_{\theta-M}$. The mathematical expression of $z_{\theta-M}$ is as follows:
$$z_{\theta-M} = z_{\theta-f} \oplus z_{\theta-l}$$
where $\oplus$ denotes splicing (concatenation) along the channel dimension. $\theta$ has no practical meaning on its own; it carries meaning only together with $M$, $f$ and $l$: $\theta$-$M$ denotes face contour information, $\theta$-$f$ denotes face region features, and $\theta$-$l$ denotes face key point information. $z_{\theta-f}$ is an intermediate quantity used to learn the distribution of the face features.
Unlike a conventional VAE structure, two encoders and three decoders are used to capture the interdependent structure, content and contour information in the face image. The constructed face coding feature vectors $z_{\theta-f}$ and $z_{\theta-l}$ therefore also contain these three kinds of information at the same time, so that $z_{\theta-M}$ provides higher-quality face prior guidance information.
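The channel splicing that fuses the two face coding feature vectors can be sketched as follows. This is an illustration only: the nested-list layout `[channel][row][col]`, the toy shapes and the function name are assumptions, and a real implementation would use tensors from a deep learning framework.

```python
def concat_channels(a, b):
    """Concatenate two feature maps laid out as [channel][row][col]."""
    same_height = len(a[0]) == len(b[0])
    same_width = len(a[0][0]) == len(b[0][0])
    if not (same_height and same_width):
        raise ValueError("spatial sizes must match for channel concatenation")
    return a + b  # list concatenation stacks the channels


# Toy feature maps: z_f has 1 channel, z_l has 2 channels, both 2 x 2.
z_f = [[[1.0, 2.0], [3.0, 4.0]]]
z_l = [[[0.5, 0.5], [0.5, 0.5]], [[9.0, 9.0], [9.0, 9.0]]]
z_m = concat_channels(z_f, z_l)  # fused guidance, 3 channels of size 2 x 2
```

Concatenation keeps both sources intact and leaves it to the following layers to learn how to combine them, which is one reason it is a common choice for feature fusion.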
In stage-I, an encoder-decoder network with a VAE backbone is constructed. Three latent feature discriminators are added as constraints in place of the original KL divergence, which reduces the interference that the large magnitude of the KL divergence causes to the reconstruction loss of the face part; at the same time, the discriminators compete with the encoders, which strengthens the encoders' learning ability and yields more accurate face feature information.
Step 23: in order to generate a natural, harmonious and symmetrical face image, in the stage-I training stage the face-missing image $x_\theta$ is passed through the first generator, the face prior guidance information is fused into the intermediate layer of the first generator by splicing it with the generator's intermediate feature map, and a low-resolution, natural and symmetrical face image is generated under the action of the prior guidance information.
Step 24: the stage-I neural network model is trained iteratively with the set batch size, and the network parameters of the stage-I generator, encoders and decoders are updated iteratively according to the reconstruction loss function of the face contour information, face content information and face structure information and the face information prediction loss function, completing the training of the stage-I face image completion network.
Step 25: judging whether the set number of verification iterations has been reached; if so, validating the model once and saving it; if not, going to step 26.
Step 26: judging whether the set total number of iterations has been reached; if so, ending the training; otherwise, repeating steps 21 to 25.
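The control flow of steps 21 to 26 can be sketched as a loop with a periodic validation checkpoint. The callbacks, the iteration counts and the interpretation of "verification iterations" as a fixed interval are assumptions for the illustration; the actual training and saving logic is not specified here.

```python
def run_training(total_iters, val_every, train_step, validate_and_save):
    """Iterate training, validating and saving at a fixed interval."""
    for it in range(1, total_iters + 1):
        train_step(it)                # steps 21-24: forward pass and parameter update
        if it % val_every == 0:       # step 25: verification interval reached
            validate_and_save(it)
    # step 26: the loop ends once the total iteration count is reached


# Hypothetical callbacks that only record what happened.
log = []
run_training(
    total_iters=10,
    val_every=4,
    train_step=lambda it: log.append(("train", it)),
    validate_and_save=lambda it: log.append(("validate", it)),
)
validated_at = [it for kind, it in log if kind == "validate"]
```

The same skeleton applies to the stage-II loop (steps 31 to 34), with a different training step.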
Step 3: after the training of the stage-I coarse neural network is finished, the trained stage-I neural network and its parameters are frozen, and the learning and training of the stage-II fine neural network begins; the stage-II network structure mainly comprises a second generator, a global discriminator and a patch discriminator, and the specific steps are as follows:
Step 31: in stage-II, the low-resolution, natural and symmetrical face image generated in stage-I is input into the second generator; meanwhile, to better account for face structure information, the face prior guidance information is also introduced into the intermediate layer of the second generator to refine the details and structure of the face and generate a first face restoration image of higher resolution;
Step 32: the first face restoration image is fed to the two discriminators, and by means of the adversarial principle of the GAN the second generator is driven to generate a high-resolution face restoration image with a symmetrical face structure; the global discriminator judges the distribution consistency of the image as a whole, and the patch discriminator supervises the generated details of the image in each patch.
Unlike a conventional region discriminator, which focuses only on the generated region, the patch discriminator cuts the entire image into a plurality of small patches and then judges whether each patch is real. The global discriminator thus supervises the consistency between the generated region and the background over the whole image, while the patch discriminator serves the specific purpose of restoring texture details. When the two discriminators can no longer distinguish the final restored face image from the original face image, the second generator and the discriminator networks have reached equilibrium, and the second generator can capture the true distribution of the face image data.
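The patch-cutting idea can be sketched as below. It is a simplified illustration: the non-overlapping grid, the patch size and the function name are assumptions, and patch discriminators are often implemented convolutionally (producing one score per receptive-field patch) rather than by explicit cropping.

```python
def split_into_patches(image, patch):
    """Cut a 2-D image (list of rows) into non-overlapping patch x patch tiles."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    patches = []
    for r0 in range(0, height, patch):
        for c0 in range(0, width, patch):
            patches.append([row[c0:c0 + patch] for row in image[r0:r0 + patch]])
    return patches


# Toy 4 x 4 image whose pixel value encodes its position.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(image, patch=2)
```

Each patch would then be scored as real or generated, so that errors in local texture are penalized even when the image as a whole looks consistent.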
Step 32: the stage-II face image refinement network is trained iteratively with the set batch size.
In each training iteration, the discriminator parameters are first trained and updated according to the adversarial loss function and frozen once the update is finished; the generator parameters are then updated according to the adversarial loss function and the reconstruction loss function. The training of the stage-II face image refinement network is completed by alternating these two updates.
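The alternating schedule can be sketched with placeholder modules that only track whether they are trainable. Everything here is hypothetical scaffolding: the `Module` class and its `step` method stand in for real networks and optimizer steps, and holding the generator fixed during the discriminator update is an assumption (standard GAN practice) that the source does not state explicitly.

```python
class Module:
    """Placeholder for a network: tracks a trainable flag and an update count."""

    def __init__(self, name):
        self.name = name
        self.trainable = True
        self.updates = 0

    def step(self):
        if not self.trainable:
            raise RuntimeError(f"{self.name} is frozen and cannot be updated")
        self.updates += 1


def alternate_training(generator, discriminators, n_iters):
    for _ in range(n_iters):
        # 1) Update the discriminators with the adversarial loss.
        generator.trainable = False          # assumed: generator fixed in this phase
        for d in discriminators:
            d.trainable = True
            d.step()
        # 2) Freeze the discriminators, then update the generator with the
        #    adversarial loss plus the reconstruction loss.
        for d in discriminators:
            d.trainable = False
        generator.trainable = True
        generator.step()


g2 = Module("second generator")
d_global = Module("global discriminator")
d_patch = Module("patch discriminator")
alternate_training(g2, [d_global, d_patch], n_iters=3)
```

Freezing the discriminators during the generator update ensures that the generator's gradient step does not also change the networks it is trying to fool.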
Step 33: judging whether the set number of verification iterations has been reached; if so, validating the model once and saving it; if not, going to step 34.
Step 34: judging whether the set total number of iterations has been reached; if so, ending the training; otherwise, repeating steps 31 to 33.
The loss function of the face image restoration method provided by the invention comprises a reconstruction loss function for reconstructing face structure information, a face information prediction loss function, and an adversarial loss function for reconstructing face distribution information, wherein:
the reconstruction loss function constrains the global structure of the generated face image and guides the extraction of the contour, content and structure information of the face image;
the face information prediction loss function guides the extraction of the contour, content and structure information of the face image;
the adversarial loss function restores the detail information of the face image so that the face looks clearer.
The loss function of stage-I comprises a reconstruction loss function and a latent classification loss function, specifically as follows:
where the three terms denote, in order, the reconstruction loss function of the encoder-decoder, the reconstruction loss function of the first generator, and the latent classification loss function.
where $M_{\theta-f}$ denotes the real face contour image, $x_{\theta-f}$ the real face content image and $x_{\theta-l}$ the real face structure image; $M'_{\theta-f}$, $x'_{\theta-f}$ and $x'_{\theta-l}$ denote the generated face contour, content and structure images, respectively. The losses use a cross-entropy term and the $\|\cdot\|_1$ norm constraint, and products are taken element by element. $\lambda_{gen1}$ is a hyperparameter that controls the loss weight, $x_{real}$ denotes the original real image data set, $x_t$ denotes the output of the first generator in stage-I, and $x_{FG}$ denotes the portrait foreground image. The hyperparameter $\eta$ is set to 0.5.
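The stage-I formula itself is not reproduced in this text, so the following is only a sketch of the two building blocks named in the definitions, the $L_1$ norm and the cross-entropy. How the terms are weighted and combined, including the roles of $\lambda_{gen1}$, $\eta$ and $x_{FG}$, is not shown here. Applying the cross-entropy to a contour mask is an assumption made for illustration, and the function names are hypothetical.

```python
import numpy as np


def l1_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute error, i.e. the L1 norm averaged over all elements."""
    return float(np.mean(np.abs(pred - target)))


def binary_cross_entropy(prob: np.ndarray, label: np.ndarray, eps: float = 1e-7) -> float:
    """Pixel-wise binary cross-entropy between probabilities and a 0/1 mask."""
    p = np.clip(prob, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(label * np.log(p) + (1.0 - label) * np.log(1.0 - p)))


# Toy 2x2 example: a real contour mask and a predicted one (assumed values).
real_contour = np.array([[0.0, 1.0], [1.0, 0.0]])
pred_contour = np.array([[0.1, 0.9], [0.8, 0.2]])

rec = l1_loss(pred_contour, real_contour)
bce = binary_cross_entropy(pred_contour, real_contour)
```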
The three classifiers act as latent feature discriminators. They provide a latent feature constraint that replaces the KL divergence constraint, which reduces the interference that the large magnitude of the KL divergence loss causes in the reconstruction of the face region. At the same time, they compete with the encoder, which strengthens the encoder's learning ability and yields more accurate face feature information. The specific discriminator loss is as follows:
where $D_i$ ($i \in \{0, 1, 2\}$) denotes a latent discriminator.
In addition, the adversarial loss of the latent discriminator $D_i$ is defined as follows:
where $z$ denotes a feature vector drawn from the standard normal distribution, $z_{\theta-f}$ and $z_{\theta-l}$ denote the face encoding feature vectors, and $z_{\theta-M}$ denotes the face feature expression space, that is, the face prior guidance information.
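The idea of replacing a KL term with a latent discriminator can be sketched as follows. The exact loss is not reproduced in this text, so the sketch assumes the common adversarial form in which a discriminator separates samples $z \sim N(0, I)$ from encoder outputs and the encoder tries to fool it. The linear discriminator `w`, the shapes and the function names are all hypothetical.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def latent_discriminator_loss(prior_logits, encoded_logits, eps: float = 1e-7) -> float:
    """Discriminator side: prior samples are labeled real, encoder codes fake."""
    p_real = np.clip(sigmoid(prior_logits), eps, 1.0 - eps)
    p_fake = np.clip(sigmoid(encoded_logits), eps, 1.0 - eps)
    return float(-np.mean(np.log(p_real)) - np.mean(np.log(1.0 - p_fake)))


def encoder_adversarial_loss(encoded_logits, eps: float = 1e-7) -> float:
    """Encoder side: push its codes to be scored like prior samples."""
    p_fake = np.clip(sigmoid(encoded_logits), eps, 1.0 - eps)
    return float(-np.mean(np.log(p_fake)))


rng = np.random.default_rng(0)
z_prior = rng.standard_normal((8, 16))            # z ~ N(0, I)
z_encoded = rng.standard_normal((8, 16)) + 2.0    # stand-in for an encoder output
w = rng.standard_normal(16) * 0.1                 # hypothetical linear discriminator

d_loss = latent_discriminator_loss(z_prior @ w, z_encoded @ w)
e_loss = encoder_adversarial_loss(z_encoded @ w)
```

Unlike a KL term, whose value can dominate the reconstruction loss, both quantities here are bounded cross-entropy terms, which is one way to read the motivation given above.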
The loss function of stage-II comprises an adversarial loss and a reconstruction loss, as follows:
where the two terms denote, in order, the local and global adversarial loss function of the generator and the reconstruction loss function of the second generator. The second generator restores the detail information of the face image so that the face looks clearer, specifically as follows:
The reconstruction loss function of the second generator constrains the global structure of the generated face image; its mathematical expression is as follows:
where $x_{rec}$ denotes the high-resolution image generated by the generator in stage-II.
In addition, the adversarial loss function of the discriminator is used to restore the detail information of the face image so that the face looks clearer.
According to a preferred embodiment, the preprocessing method comprises:
obtaining a missing image $x_{\theta}$ by mask occlusion.
Four reference standard images are constructed to assist face completion: a standard face contour image $M_{\theta-f}$, a standard face content image $x_{\theta-f}$, a standard face structure image $x_{\theta-l}$ and a standard portrait foreground image $x_{FG}$. They are obtained as follows:
Standard face contour image $M_{\theta-f}$: 68 face key points are extracted from the original face using a face detection and alignment method, and the obtained 41 key points are dilated one by one by 3% of their size so that the eyebrow and boundary information of the face is merged in, which yields the face contour image $M_{\theta-f}$.
Standard face content image $x_{\theta-f}$: the original image $x_{real}$ is multiplied by the standard face contour image $M_{\theta-f}$ to obtain the standard face content image $x_{\theta-f}$.
Standard face structure image $x_{\theta-l}$: the 41 key points covering the eyes, nose, mouth and other parts of the face are dilated and fused to obtain the face structure image $x_{\theta-l}$.
Standard portrait foreground image $x_{FG}$: obtained by segmenting the acquired portrait using a Baidu interface.
The standard face contour image $M_{\theta-f}$, the standard face content image $x_{\theta-f}$ and the standard face structure image $x_{\theta-l}$ serve as the reference standards against which the network's reconstructions are compared in the loss.
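The preprocessing described above can be sketched with NumPy alone. This is a minimal sketch under several assumptions: the key points are made-up coordinates rather than the output of a landmark detector, the contour mask is a rectangle standing in for the dilated contour, and the 3% dilation is interpreted as a radius of 3% of the image side. None of the names below come from the source.

```python
import numpy as np


def dilate_points(points, size: int, ratio: float = 0.03) -> np.ndarray:
    """Stamp a filled disc of radius ratio*size around each (row, col) key point."""
    radius = max(1, int(round(ratio * size)))
    yy, xx = np.mgrid[0:size, 0:size]
    mask = np.zeros((size, size), dtype=np.float32)
    for r, c in points:
        mask[(yy - r) ** 2 + (xx - c) ** 2 <= radius ** 2] = 1.0
    return mask


size = 256
x_real = np.random.default_rng(0).random((size, size, 3)).astype(np.float32)

# Hypothetical key points; a real pipeline would take them from a landmark detector.
key_points = [(100, 90), (100, 166), (140, 128), (180, 128)]
structure = dilate_points(key_points, size)        # stands in for the structure image

# Hypothetical face mask (1 = face region), standing in for the contour image.
contour = np.zeros((size, size), dtype=np.float32)
contour[60:210, 60:196] = 1.0
content = x_real * contour[..., None]              # content image = x_real * contour

# Occlusion mask (1 = missing pixel) and the resulting masked input image.
hole = np.zeros((size, size), dtype=np.float32)
hole[96:160, 96:160] = 1.0
x_theta = x_real * (1.0 - hole[..., None])
```

The element-wise products mirror the text: the content image keeps only pixels inside the contour, and the missing image keeps only pixels outside the occlusion mask.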
The face image restoration method further comprises testing the trained restoration network: the network input image is processed according to the method in step 1, the trained stage-I and stage-II networks are run in turn according to step 2, and the stage-II generator outputs the test result.
Our model was evaluated on the natural face image data set CelebA-HQ. The CelebA-HQ data set was divided into 28000 training images, 1000 validation images and 1000 test images; the CelebA-HQ face images used are 256 × 256.
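A split with these sizes can be sketched as follows. The text does not say how images were assigned to each subset, so the seeded random shuffle and the file names below are assumptions.

```python
import random


def split_dataset(paths, n_train=28000, n_val=1000, n_test=1000, seed=0):
    """Shuffle file paths deterministically and cut them into three disjoint sets."""
    if len(paths) < n_train + n_val + n_test:
        raise ValueError("not enough images for the requested split")
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split reproducible
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:n_train + n_val + n_test]
    return train, val, test


paths = [f"{i:05d}.jpg" for i in range(30000)]  # hypothetical file names
train, val, test = split_dataset(paths)
```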
Fig. 3 shows six kinds of face information images from the training data set: (a) the original image $x_t$; (b) the cut-out face image $x_{\theta}$; (c) the face contour image $M_{\theta-f}$; (d) the face content image $x_{\theta-f}$; (e) the face structure image $x_{\theta-l}$; (f) the portrait foreground image $x_{FG}$.
In addition, we compare our method with six of the best existing face restoration methods: PM (PatchMatch), GLCIC (Globally and Locally Consistent Image Completion), CA (Contextual Attention), PICNet (Pluralistic Image Completion), PEN (Pyramid-context Encoder Network) and CSA (Coherent Semantic Attention). All methods are compared using the same set of irregular mask data.
First, we qualitatively compared our model with PM, GLCIC, CA, PICNet, PEN and CSA. Fig. 4 shows the results of the different methods on the CelebA-HQ data set; the cut-out region is shown in black in Fig. 4(b). In Fig. 4(c), PM cannot complete the whole face when the missing region differs greatly from its surroundings. In Fig. 4(d), GLCIC completes the whole face, but the inpainted region is too blurred. In Fig. 4(e), the face completed by CA is severely distorted. In Fig. 4(g), PICNet returns a clear face, but the face is not harmonious: PICNet aims to produce a clear image by strengthening the constraint of the discriminator, which destroys the structural consistency of the image and causes distortion. In Fig. 4(h), PEN performs well on the CelebA-HQ data set but does not perform well on low-resolution data sets. In Fig. 4(i), CSA performs very well, close to our method, but it takes a long time to train because it must find similar blocks and compute similarities with the surrounding blocks; in addition, the face images completed by CSA may contain some noise. In contrast, our model achieves a natural and realistic result in Fig. 4(j). The experimental results show that a model with the reconstruction loss, the face information prediction loss and the adversarial loss improves the naturalness and clarity of the generated face images and ultimately produces high-quality face image restoration results.
Table 1. Comparison of objective evaluation indices for the experimental results on the CelebA-HQ data set
To further evaluate the effectiveness of the method, we also performed quantitative comparison experiments. Table 1 shows the quantitative results of the different methods on the CelebA-HQ data set. By exploiting the guidance information and the information integration of the second stage, the proposed method achieves state-of-the-art results compared with the other methods on all three indices (PSNR, SSIM and $L_1$). It also achieves the best generalization performance with large masks. Specifically, our method (Ours) with stage-II and the guidance part achieved a PSNR of 25.823, an SSIM of 0.890 and an $L_1$ of 6.74 on the CelebA-HQ data set. Furthermore, as can be seen from Table 1, PEN produced results comparable to our method in terms of PSNR, SSIM and $L_1$ on the CelebA-HQ data set; however, its completion performance is much poorer than that of our method. Overall, the quantitative results show that our method achieves better performance in terms of PSNR, SSIM and $L_1$ than all other methods.
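For reference, two of the three indices can be computed as in the sketch below. It assumes 8-bit images with a peak value of 255 and an $L_1$ error averaged over pixels; the scale on which the reported $L_1$ value is expressed is not stated in the text, so the numbers produced here are not directly comparable with Table 1. SSIM is omitted because it requires a windowed computation that is better taken from an image-processing library.

```python
import numpy as np


def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means closer to the target."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


def mean_l1(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute pixel error; lower means closer to the target."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    return float(np.mean(np.abs(diff)))


# Toy images: every pixel differs by exactly 10 grey levels.
target = np.full((4, 4), 100, dtype=np.uint8)
pred = np.full((4, 4), 110, dtype=np.uint8)

score_psnr = psnr(pred, target)
score_l1 = mean_l1(pred, target)
```

Casting to `float64` before subtracting matters for `uint8` inputs, which would otherwise wrap around on negative differences.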
It should be noted that the above embodiments are exemplary, and that those skilled in the art, having the benefit of this disclosure, may devise various arrangements that fall within the scope of the disclosure and of the invention. Those skilled in the art should understand that the specification and drawings are illustrative only and do not limit the claims. The scope of the invention is defined by the claims and their equivalents.
Claims (3)
1. A face image restoration method based on GAN face prior information prediction and fusion, characterized by comprising the following steps:
step 1: downloading a public face data set, preprocessing the data set to construct images $x_{\theta}$ with missing face regions, and dividing the data set proportionally into a training set, a validation set and a test set;
step 2: the face image completion method mainly comprises two stages, stage-I and stage-II, with the following steps:
the coarse restoration neural network model of stage-I comprises a first generator, two encoders and three decoders; first, the image $x_{\theta}$ with missing face information constructed in step 1 is fed into a network whose backbone is a variational autoencoder (VAE), and face contour information $M'_{\theta-f}$, face region information $x'_{\theta-f}$ and face key point information $x'_{\theta-l}$ are obtained through nonlinear reconstruction by the two encoders and three decoders; the face contour information, structure information and content information reconstructed by the VAE network are fused to obtain face prior guidance information that helps generate the contour, structure and content of a clear face; the image $x_{\theta}$ with missing face information is fed into the first generator, and the face prior guidance information is fused into the intermediate layer of the first generator so as to fully exploit the face region information and generate a low-resolution face image;
step 21: inputting the missing face image $x_{\theta}$ into the two encoders and the three decoders in sequence, and obtaining face contour information $M'_{\theta-f}$, face region information $x'_{\theta-f}$ and face key point information $x'_{\theta-l}$ through nonlinear reconstruction, thereby extracting the contour, region and key point information of the face respectively;
step 22: constructing the face encoding feature vectors $z_{\theta-f}$ and $z_{\theta-l}$ by combining the face contour information, face region information and face key point information, and finally fusing $z_{\theta-f}$ and $z_{\theta-l}$ through feature fusion to construct the face feature expression space and obtain higher-quality face prior guidance information $z_{\theta-M}$;
step 23: in the stage-I training stage, passing the missing face image $x_{\theta}$ through the first generator, fusing the face prior guidance information into the intermediate layer of the first generator by concatenating it to the intermediate feature map of the first generator, and generating a low-resolution, naturally symmetric face image under the guidance of the prior information;
step 24: iteratively training the stage-I neural network model according to the set batch size for each training step, and iteratively updating the network parameters of the stage-I generator, encoders and decoders according to the reconstruction loss functions of the face contour information, face content information and face structure information and the face information prediction loss function, to complete the training of the stage-I face image completion network;
step 25: judging whether the set number of validation iterations has been reached; if so, validating the model once and saving it; if not, going to step 26;
step 26: judging whether the set total number of iterations has been reached; if so, ending the training; otherwise, repeating steps 21 to 25;
step 3: after the training of the stage-I coarse neural network is finished, freezing the trained stage-I neural network and its parameters, and starting the training of the stage-II fine neural network, wherein the stage-II network structure mainly comprises a second generator, a global discriminator and a patch discriminator, and specifically comprises the following steps:
step 31: in stage-II, inputting the low-resolution, naturally symmetric face image generated in stage-I into the second generator, and, in order to take face structure information better into account, further introducing the face prior guidance information into the intermediate layer of the second generator so as to refine the details and structure of the face and generate a first restored face image with higher resolution;
step 32: feeding the first restored face image into the two discriminators, and, by means of the adversarial idea of the GAN, enabling the second generator to generate a high-resolution restored face image with a symmetric face structure, wherein the global discriminator judges the distribution consistency of the image and the patch discriminator supervises the generated details of the image in each patch;
step 32: iteratively training the stage-II face image refinement network according to the set batch size for each training step;
step 33: judging whether the set number of validation iterations has been reached; if so, validating the model once and saving it; if not, going to step 34;
step 34: judging whether the set total number of iterations has been reached; if so, ending the training; otherwise, repeating steps 31 to 33;
step 4: testing the trained restoration model on the test set.
2. The face image restoration method as claimed in claim 1, wherein in the stage-I training, three latent feature discriminators are added as constraints in place of the original KL divergence, so as to reduce the interference that the large magnitude of the KL divergence causes in the reconstruction loss of the face region, while also competing with the encoder, which strengthens its learning ability and yields more accurate face feature information.
3. The face image restoration method as claimed in claim 2, wherein the preprocessing method comprises:
constructing four reference standard images to assist face completion, namely a standard face contour image $M_{\theta-f}$, a standard face content image $x_{\theta-f}$, a standard face structure image $x_{\theta-l}$ and a standard portrait foreground image $x_{FG}$, which are obtained as follows:
standard face contour image $M_{\theta-f}$: 68 face key points are extracted from the original face using a face detection and alignment method, and the obtained 41 key points are dilated one by one by 3% of their size so that the eyebrow and boundary information of the face is merged in, which yields the face contour image $M_{\theta-f}$;
standard face content image $x_{\theta-f}$: the original image $x_{real}$ is multiplied by the standard face contour image $M_{\theta-f}$ to obtain the standard face content image $x_{\theta-f}$;
standard face structure image $x_{\theta-l}$: the 41 key points covering the eyes, nose, mouth and other parts of the face are dilated and fused to obtain the face structure image $x_{\theta-l}$;
standard portrait foreground image $x_{FG}$: obtained by segmenting the acquired portrait using a Baidu interface.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111218941.9A CN113936318A (en) | 2021-10-20 | 2021-10-20 | Human face image restoration method based on GAN human face prior information prediction and fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113936318A true CN113936318A (en) | 2022-01-14 |
Family
ID=79280510
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111218941.9A Pending CN113936318A (en) | 2021-10-20 | 2021-10-20 | Human face image restoration method based on GAN human face prior information prediction and fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113936318A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114693972A (en) * | 2022-03-29 | 2022-07-01 | 电子科技大学 | Reconstruction-based intermediate domain self-adaptive method |
CN114693972B (en) * | 2022-03-29 | 2023-08-29 | 电子科技大学 | Intermediate domain field self-adaption method based on reconstruction |
CN114913588A (en) * | 2022-06-20 | 2022-08-16 | 电子科技大学 | Face image restoration and recognition method applied to complex scene |
CN114913588B (en) * | 2022-06-20 | 2023-04-25 | 电子科技大学 | Face image restoration and recognition method applied to complex scene |
WO2023245927A1 (en) * | 2022-06-23 | 2023-12-28 | 中国科学院自动化研究所 | Image generator training method and apparatus, and electronic device and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||