
CN112270644A - Face super-resolution method based on spatial feature transformation and cross-scale feature integration - Google Patents

Face super-resolution method based on spatial feature transformation and cross-scale feature integration

Info

Publication number
CN112270644A
Authority
CN
China
Prior art keywords
feature
output
map
face
scale
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011124368.0A
Other languages
Chinese (zh)
Other versions
CN112270644B (en)
Inventor
张凯兵
庄诚
李敏奇
景军锋
卢健
刘薇
陈小改
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liaoning Houfa Xianzhi Technology Co ltd
Original Assignee
Xian Polytechnic University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Polytechnic University filed Critical Xian Polytechnic University
Priority to CN202011124368.0A priority Critical patent/CN112270644B/en
Publication of CN112270644A publication Critical patent/CN112270644A/en
Application granted granted Critical
Publication of CN112270644B publication Critical patent/CN112270644B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4023Scaling of whole images or parts thereof, e.g. expanding or contracting based on decimating pixels or lines of pixels; based on inserting pixels or lines of pixels
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/40Analysis of texture
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face super-resolution method based on spatial feature transformation and cross-scale feature integration, implemented according to the following steps: preprocessing face images to obtain a training set and a test set, and processing the preprocessed face images to generate semantic segmentation probability maps; constructing a generative adversarial network model for training; sequentially inputting the face images of the training set into the constructed generative adversarial network model, setting parameters, and training until convergence; and inputting the face images of the test set into the trained generative adversarial network model to obtain super-resolution reconstructed high-resolution images. The face super-resolution method based on spatial feature transformation and cross-scale feature integration solves the problem that existing methods neglect the texture details of the reconstructed face image.

Description

Face super-resolution method based on spatial feature transformation and cross-scale feature integration
Technical Field
The invention belongs to the technical field of face image recognition, and relates to a face super-resolution method based on spatial feature transformation and cross-scale feature integration.
Background
Existing face-related tasks, such as face recognition, face alignment, expression recognition and three-dimensional face reconstruction, are all built on clear high-resolution face data sets, and their performance degrades markedly when confronted with low-resolution face images. In addition, owing to the inherent limitations of conventional digital imaging devices, captured face images often undergo a series of degradations such as optical blurring and undersampling, so that a visually clear image is difficult to obtain. As an effective image restoration means, image super-resolution can effectively overcome the low image resolution caused by the limited physical resolution of imaging equipment, optical blurring and the like.
Face super-resolution methods fall roughly into two categories: traditional methods based on classical machine learning algorithms, and deep learning methods based on convolutional neural networks. Among them, deep-learning-based super-resolution has attracted attention for its superior reconstruction performance. However, most existing face super-resolution algorithms focus only on reconstructing "tiny faces" of 16 × 16 pixels, a task also known as face hallucination, and neglect the reconstruction of "small faces" such as the 64 × 64 pixel face images that are common in practical applications. As a result, the images produced by these methods can serve face detection but cannot preserve identity consistency with the real face. In addition, these methods usually pursue high peak signal-to-noise ratio and structural similarity, and ignore whether the texture details of the reconstructed face image satisfy the visual perception quality expected by the human eye.
Disclosure of Invention
The invention aims to provide a face super-resolution method based on spatial feature transformation and cross-scale feature integration, which solves the problem that existing methods neglect the texture details of the reconstructed face image.
The technical scheme adopted by the invention is that the face super-resolution method based on spatial feature transformation and cross-scale feature integration is implemented according to the following steps:
step 1, randomly selecting N human face images from a human face data set, and then preprocessing the human face images to generate a training set and a test set;
step 2, adopting a face analysis pre-training model BisNet as a base network for generating a semantic segmentation probability map, and processing the face image preprocessed in the step 1 to generate the semantic segmentation probability map;
step 3, constructing a generative adversarial network model for training, wherein the generative adversarial network model comprises a semantic segmentation probability map intermediate condition generation module, a spatial feature transformation module, a cross-scale feature integration module and a fusion output module which are connected in sequence; a sub-pixel convolution layer for image up-sampling is introduced into the cross-scale feature integration module, and an adversarial loss function and a perceptual loss function are introduced into the generative adversarial network model;
step 4, sequentially inputting the face images of the training set obtained in step 1 into the constructed generative adversarial network model, setting parameters, and training until convergence;
and step 5, inputting the face images of the test set from step 1 into the generative adversarial network model trained in step 4 to obtain the super-resolution reconstructed high-resolution image.
The face data set in the step 1 is a CelebA-HQ face data set.
The preprocessing of the face images in the training set in step 1 is specifically: the images in the training set are down-sampled with a bicubic interpolation algorithm, and the interpolated image I_HR of size 512 × 512 is output as the target image of the training set and the test set; the interpolated image I_HR is then down-sampled 4-fold to 64 × 64 by bicubic interpolation as the training and testing input image I_LR; the input image I_LR is then up-sampled 4-fold to 256 × 256 by bicubic interpolation as the semantic segmentation network input image I_S.
The step 2 specifically comprises the following steps:
the face analysis pre-training model BisNet is used as a base network generated by a semantic segmentation probability map, and the output layer of the face analysis pre-training model BisNet is modified, and the method specifically comprises the following steps: adding a softmax function into an output layer of a face analysis pre-training model BisNet, and inputting the semantic segmentation network input image I obtained in the step 1SInputting the semantic probability output result into a modified face analysis pre-training model BisNet, outputting the semantic probability output result into a pth file, namely a Pythrch model file, and obtaining a semantic segmentation probability map ISeg
The step 4 specifically comprises the following steps:
step 4.1, setting the training parameters, and loading the training and testing input image I_LR, the target image I_HR of the training set and test set, and the semantic segmentation probability map I_Seg into the network input end, namely the input end of the semantic segmentation probability map intermediate condition generation module, which processes the semantic segmentation probability map I_Seg to generate the semantic-information intermediate condition ψ;
step 4.2, the training and testing input image I_LR is passed through one convolution layer to generate a feature map, which serves as the front-layer feature map;
step 4.3, the front-layer feature map and the semantic-information intermediate condition ψ are taken as the input of the spatial feature transformation module, which outputs a feature map F1;
step 4.4, the feature map F1 output in step 4.3 is input into the cross-scale integration module to obtain features at different scales, which are then input into the fusion output module to obtain a super-resolution image, denoted I_SR;
step 4.5, the super-resolution image I_SR and the corresponding interpolated image I_HR are input to the discriminator D_η, and the discrimination information is passed back to the generator G_θ of the generative adversarial network model;
and step 4.6, steps 4.4-4.5 are iterated continuously until the sum of the adversarial loss and the perceptual loss is minimized, and the corresponding parameters are taken as the trained model parameters to obtain the trained generative adversarial network model.
The semantic segmentation probability map intermediate condition generation module comprises five convolution layers connected in sequence: the first convolution layer has 19 input channels, 128 output channels, a 4 × 4 convolution kernel, a stride of 4, and a leaky rectified linear unit with negative slope 0.1; the second convolution layer has 128 input channels, 128 output channels, a 1 × 1 kernel, a stride of 4, and a leaky rectified linear unit with negative slope 0.1; the third convolution layer has 128 input channels, 128 output channels, a 1 × 1 kernel, a stride of 1, and a leaky rectified linear unit with negative slope 0.1; the fourth convolution layer has 128 input channels, 128 output channels, a 1 × 1 kernel and a stride of 1; the last convolution layer has 128 input channels, 32 output channels, a 1 × 1 kernel and a stride of 1, and outputs the intermediate condition containing the semantic information, denoted ψ;
the spatial feature transformation module is composed of 8 residual units with spatial feature transformation layers, and each residual unit is composed of a spatial feature transformation layer, a convolution layer and a nonlinear activation layer.
Step 4.4, inputting the output feature map F1 in step 4.3 into the cross-scale integration module, and obtaining different scale features specifically as follows:
in the cross-scale integration module, the channel dimension of the output feature map F1 is first raised 4-fold by a convolution layer, and F1 is then up-sampled 2-fold by sub-pixel convolution to obtain the feature map F2; meanwhile, the output feature map F1 is enlarged 2-fold by bicubic interpolation and fused with the feature map F2 along the channel dimension to obtain the feature map F3_1, which is passed backwards; the feature map F2 is reduced 2-fold by a convolution with stride 2 and fused with the feature map F1 along the channel dimension to obtain the feature map F3_2, which is passed backwards; F3_1 and F3_2 are input into two residual feature extraction modules, and the output feature maps are denoted F4_1 and F4_2 respectively; the feature map F4_1 is output directly to obtain the feature map F5_2, down-sampled 2-fold by a convolution with stride 2 to obtain the feature map F5_1, and up-sampled 2-fold by bicubic interpolation to obtain the feature map F5_3;
the feature map F4_1 is also up-sampled 2-fold by a second sub-pixel convolution to output the feature map F5; F5 is then output directly to obtain F6_3, down-sampled 2-fold by a convolution with stride 2 to obtain F6_2, and down-sampled 4-fold by a convolution with stride 4 to obtain F6_1;
F4_2 is output directly to obtain F7_1, up-sampled 2-fold by bicubic interpolation to obtain F7_2, and up-sampled 4-fold by bicubic interpolation to obtain F7_3; the small-scale maps F5_1, F6_1 and F7_1 are then feature-fused and input into a feature extraction module composed of 4 residual blocks, and the output feature map is enlarged 4-fold by an interpolation up-sampling module to output the feature map F8_1; similarly, the same-scale feature maps F5_2, F6_2 and F7_2 are feature-fused and input into a residual feature extraction module composed of 4 residual blocks, and the output feature map is enlarged 2-fold by an interpolation up-sampling module to output F8_2; the large-scale maps F5_3, F6_3 and F7_3 are feature-fused and input into a residual feature extraction module composed of 4 residual blocks, and the output feature map is output directly as F8_3.
In step 4.4, the features at different scales are input into the fusion output module, and the reconstructed super-resolution result is obtained as follows:
the feature maps F8_1, F8_2 and F8_3 at different scales are feature-fused, and two convolution layers then reduce the dimensionality step by step to output the reconstructed super-resolution image, denoted I_SR.
The perceptual loss function of step 4.6 is:
L_P = Σ_i ‖φ(I_SR) − φ(I_HR)‖²
the adversarial loss function is:
L_D = Σ_i log(1 − D_η(G_θ(I_LR)))
where φ(I_SR) and φ(I_HR) denote the feature maps extracted from the result image and the target image, respectively, by the pre-trained VGG network, G_θ denotes the generator network, and D_η denotes the discriminator network.
The invention has the beneficial effects that:
(1) By transforming the intermediate features of a single network, the spatial feature transformation layer can reconstruct a high-resolution image with rich semantic regions in a single forward pass.
(2) The reconstruction network uses semantic maps to guide texture recovery for different regions in the high-resolution domain, while using probability maps to capture fine texture details.
(3) The cross-scale feature integration module allows the texture features being propagated to be exchanged across scales, yielding a more effective feature representation and further improving the performance of the super-resolution reconstruction algorithm.
Drawings
FIG. 1 is a comparison graph of the results of example 1-1 in the face super-resolution method of the present invention with spatial feature transformation and cross-scale feature integration;
FIG. 2 is a comparison graph of the results of the face super-resolution method of the present invention in which spatial feature transformation and cross-scale feature integration are performed according to the embodiment 1-2.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The face super-resolution method based on spatial feature transformation and cross-scale feature integration is implemented according to the following steps:
Step 1, randomly selecting N face images from a face data set, and preprocessing them to generate a training set and a test set; specifically: 1000 face images are randomly selected from the CelebA-HQ face data set as the training set and 100 images as the test set; the high-resolution images in the training set are down-sampled with a bicubic interpolation algorithm, and the interpolated image I_HR of size 512 × 512 is output as the target image of the training set and the test set; bicubic interpolation is likewise used to down-sample 4-fold to 64 × 64 as the training and testing input image I_LR; I_LR is then up-sampled again by bicubic interpolation 4-fold to 256 × 256 as the semantic segmentation network input image I_S.
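The preprocessing in step 1 amounts to three bicubic resampling operations per image. A minimal sketch is given below, assuming the CelebA-HQ images are available as files and using Pillow's bicubic resampling; the file handling and the prepare_sample helper are illustrative only and not part of the patent.

from PIL import Image

def prepare_sample(path):
    """Build the three images used for one face photo: target I_HR, network input I_LR,
    and the segmentation-network input I_S (sizes as stated in step 1)."""
    img = Image.open(path).convert("RGB")
    i_hr = img.resize((512, 512), Image.BICUBIC)   # target image I_HR, 512 x 512
    i_lr = i_hr.resize((64, 64), Image.BICUBIC)    # bicubically down-sampled input I_LR, 64 x 64
    i_s = i_lr.resize((256, 256), Image.BICUBIC)   # I_LR up-sampled 4x to 256 x 256 as I_S
    return i_hr, i_lr, i_s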
Step 2, adopting a face analysis pre-training model BisNet as a base network for generating a semantic segmentation probability map, and processing the face image preprocessed in the step 1 to generate the semantic segmentation probability map; the method specifically comprises the following steps:
the face analysis pre-training model BisNet is used as a base network generated by a semantic segmentation probability map, and the output layer of the face analysis pre-training model BisNet is modified, and the method specifically comprises the following steps: adding a softmax function into an output layer of a face analysis pre-training model BisNet, and inputting the semantic segmentation network input image I obtained in the step 1SInputting the semantic probability output result into a modified face analysis pre-training model BisNet, outputting the semantic probability output result into a pth file, namely a Pythrch model file, and obtaining a semantic segmentation probability map ISeg
Step 3, constructing a generative adversarial network model for training, wherein the generative adversarial network model comprises a semantic segmentation probability map intermediate condition generation module, a spatial feature transformation module, a cross-scale feature integration module and a fusion output module which are connected in sequence; a sub-pixel convolution layer for image up-sampling is introduced into the cross-scale feature integration module, and an adversarial loss function and a perceptual loss function are introduced into the generative adversarial network model;
Step 4, sequentially inputting the face images of the training set obtained in step 1 into the constructed generative adversarial network model, setting parameters, and training until convergence;
and step 5, inputting the face images of the test set from step 1 into the generative adversarial network model trained in step 4 to obtain the super-resolution reconstructed high-resolution image.
The step 4 specifically comprises the following steps:
Step 4.1, setting the training parameters, and loading the training and testing input image I_LR, the target image I_HR of the training set and test set, and the semantic segmentation probability map I_Seg into the network input end, namely the input end of the semantic segmentation probability map intermediate condition generation module; this module processes the semantic segmentation probability map I_Seg to generate the semantic-information intermediate condition ψ. The semantic segmentation probability map intermediate condition generation module comprises five convolution layers connected in sequence: the first convolution layer has 19 input channels, 128 output channels, a 4 × 4 convolution kernel, a stride of 4, and a leaky rectified linear unit with negative slope 0.1; the second convolution layer has 128 input channels, 128 output channels, a 1 × 1 kernel, a stride of 4, and a leaky rectified linear unit with negative slope 0.1; the third convolution layer has 128 input channels, 128 output channels, a 1 × 1 kernel, a stride of 1, and a leaky rectified linear unit with negative slope 0.1; the fourth convolution layer has 128 input channels, 128 output channels, a 1 × 1 kernel and a stride of 1; the last convolution layer has 128 input channels, 32 output channels, a 1 × 1 kernel and a stride of 1, and outputs the intermediate condition containing the semantic information, denoted ψ; the structural parameters of this module are shown in Table 1;
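A PyTorch-style sketch of this five-layer condition generation module follows, using the (input channels, output channels, kernel size, stride) values of Table 1. Note that the description text gives a stride of 4 for the second layer where Table 1 lists 1; the stride arguments below follow Table 1 and should be adjusted if the text is taken as authoritative.

import torch.nn as nn

# Sketch of the semantic-condition generation module (Table 1 parameters).
condition_net = nn.Sequential(
    nn.Conv2d(19, 128, kernel_size=4, stride=4),   # Conv_1
    nn.LeakyReLU(negative_slope=0.1),
    nn.Conv2d(128, 128, kernel_size=1, stride=1),  # Conv_2
    nn.LeakyReLU(negative_slope=0.1),
    nn.Conv2d(128, 128, kernel_size=1, stride=1),  # Conv_3
    nn.LeakyReLU(negative_slope=0.1),
    nn.Conv2d(128, 128, kernel_size=1, stride=1),  # Conv_4
    nn.LeakyReLU(negative_slope=0.1),
    nn.Conv2d(128, 32, kernel_size=1, stride=1),   # Conv_out -> 32-channel condition psi
)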
Step 4.2, the training and testing input image I_LR is passed through one convolution layer to generate a feature map, which serves as the front-layer feature map;
Step 4.3, the front-layer feature map and the semantic-information intermediate condition ψ are taken as the input of the spatial feature transformation module, which outputs a feature map F1. The spatial feature transformation module is composed of 8 residual units with spatial feature transformation layers; each residual unit consists of a spatial feature transformation layer, a convolution layer and a nonlinear activation layer, with the structure shown in Table 2. The spatial feature transformation layer takes the feature map of the preceding layer and the semantic-information intermediate condition ψ as input, generates a pair of modulation parameters (γ, β) through two internal groups of convolutions, and realizes a spatial affine transformation of the feature map through multiplication and addition;
the mathematical description is as follows:
SFT(F|γ,β)=γ⊙F+β
where F denotes the feature map, whose dimensions are consistent with those of γ and β, and ⊙ denotes element-wise multiplication of the entries at corresponding positions of the matrices.
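A minimal sketch of such a spatial feature transformation layer is given below. The two convolution branches play the roles of Scale_Conv0/1 and Shift_Conv0/1 from Table 2; the channel counts, kernel sizes and the assumption that ψ has the same spatial resolution as the feature map are illustrative, since Table 2 is only reproduced as an image.

import torch.nn as nn

class SFTLayer(nn.Module):
    """SFT(F | gamma, beta) = gamma * F + beta, with (gamma, beta) predicted
    from the semantic condition psi by two small convolution branches."""
    def __init__(self, feat_ch=64, cond_ch=32):
        super().__init__()
        self.scale = nn.Sequential(                     # analogue of Scale_Conv0 / Scale_Conv1
            nn.Conv2d(cond_ch, cond_ch, 1), nn.LeakyReLU(0.1),
            nn.Conv2d(cond_ch, feat_ch, 1),
        )
        self.shift = nn.Sequential(                     # analogue of Shift_Conv0 / Shift_Conv1
            nn.Conv2d(cond_ch, cond_ch, 1), nn.LeakyReLU(0.1),
            nn.Conv2d(cond_ch, feat_ch, 1),
        )

    def forward(self, feat, psi):
        gamma = self.scale(psi)          # spatial scaling parameter
        beta = self.shift(psi)           # spatial shifting parameter
        return gamma * feat + beta       # element-wise affine transformation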
Step 4.4, the feature map F1 output in step 4.3 is input into the cross-scale integration module to obtain features at different scales, which are then input into the fusion output module to obtain a super-resolution image, denoted I_SR. In the cross-scale integration module, the channel dimension of the output feature map F1 is first raised 4-fold by a convolution layer, and F1 is then up-sampled 2-fold by sub-pixel convolution to obtain the feature map F2; meanwhile, the output feature map F1 is enlarged 2-fold by bicubic interpolation and fused with the feature map F2 along the channel dimension to obtain the feature map F3_1, which is passed backwards; the feature map F2 is reduced 2-fold by a convolution with stride 2 and fused with the feature map F1 along the channel dimension to obtain the feature map F3_2, which is passed backwards. F3_1 and F3_2 are input into two residual feature extraction modules, whose residual block structure is shown in Table 3, and the output feature maps are denoted F4_1 and F4_2 respectively. The feature map F4_1 is output directly to obtain the feature map F5_2, down-sampled 2-fold by a convolution with stride 2 to obtain the feature map F5_1, and up-sampled 2-fold by bicubic interpolation to obtain the feature map F5_3;
the feature map F4_1 is also up-sampled 2-fold by a second sub-pixel convolution to output the feature map F5; F5 is then output directly to obtain F6_3, down-sampled 2-fold by a convolution with stride 2 to obtain F6_2, and down-sampled 4-fold by a convolution with stride 4 to obtain F6_1;
F4_2 is output directly to obtain F7_1, up-sampled 2-fold by bicubic interpolation to obtain F7_2, and up-sampled 4-fold by bicubic interpolation to obtain F7_3. The small-scale maps F5_1, F6_1 and F7_1 are then feature-fused and input into a feature extraction module composed of 4 residual blocks, and the output feature map is enlarged 4-fold by an interpolation up-sampling module to output the feature map F8_1; similarly, the same-scale feature maps F5_2, F6_2 and F7_2 are feature-fused and input into a residual feature extraction module composed of 4 residual blocks, and the output feature map is enlarged 2-fold by an interpolation up-sampling module to output F8_2; the large-scale maps F5_3, F6_3 and F7_3 are feature-fused and input into a residual feature extraction module composed of 4 residual blocks, and the output feature map is output directly as F8_3, where the structure of the residual blocks is shown in Table 3;
the feature maps F8_1, F8_2 and F8_3 at different scales are feature-fused, and two convolution layers then reduce the dimensionality step by step to output the reconstructed super-resolution image, denoted I_SR;
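The full F1–F8 pipeline involves many branches; the sketch below illustrates only the core cross-scale exchange step on two scales (sub-pixel up-sampling, bicubic resizing, strided-convolution downsizing, and channel fusion). The channel count of 64 and the 1 × 1 fusion convolutions are illustrative choices, not taken from the patent tables.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossScaleExchange(nn.Module):
    """Simplified sketch of one cross-scale exchange step: a low-resolution branch is
    lifted with sub-pixel convolution, each branch is resized to the other's scale by
    bicubic interpolation or a strided convolution, and the branches are fused along
    the channel dimension."""
    def __init__(self, ch=64):
        super().__init__()
        self.expand = nn.Conv2d(ch, ch * 4, 3, padding=1)       # raise channels 4x ...
        self.shuffle = nn.PixelShuffle(2)                        # ... then sub-pixel upsample 2x -> F2
        self.down = nn.Conv2d(ch, ch, 3, stride=2, padding=1)    # stride-2 conv to shrink F2
        self.fuse_hi = nn.Conv2d(ch * 2, ch, 1)                  # channel fusion at the large scale
        self.fuse_lo = nn.Conv2d(ch * 2, ch, 1)                  # channel fusion at the small scale

    def forward(self, f1):
        f2 = self.shuffle(self.expand(f1))                                   # sub-pixel 2x branch
        f1_up = F.interpolate(f1, scale_factor=2, mode="bicubic", align_corners=False)
        f3_1 = self.fuse_hi(torch.cat([f1_up, f2], dim=1))                   # large-scale fusion
        f3_2 = self.fuse_lo(torch.cat([self.down(f2), f1], dim=1))           # small-scale fusion
        return f3_1, f3_2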
Step 4.5, the super-resolution image I_SR and the corresponding interpolated image I_HR are input to the discriminator D_η, and the discrimination information is passed back to the generator G_θ of the generative adversarial network model;
Step 4.6, steps 4.4-4.5 are iterated continuously until the sum of the adversarial loss and the perceptual loss is minimized, and the corresponding parameters are taken as the trained model parameters to obtain the trained generative adversarial network model, wherein the perceptual loss function is:
L_P = Σ_i ‖φ(I_SR) − φ(I_HR)‖²
the adversarial loss function is:
L_D = Σ_i log(1 − D_η(G_θ(I_LR)))
where φ(I_SR) and φ(I_HR) denote the feature maps extracted from the result image and the target image, respectively, by the pre-trained VGG network, G_θ denotes the generator network, and D_η denotes the discriminator network.
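A sketch of the two losses follows, using a truncated pre-trained VGG network as the feature extractor φ. The specific VGG variant and cut-off layer, and the squared-L2 distance between feature maps, are assumptions; the patent only specifies a "pre-trained Vgg network".

import torch
from torchvision.models import vgg19

# phi: a truncated pre-trained VGG network (variant and cut-off layer are assumptions).
phi = vgg19(pretrained=True).features[:35].eval()
for p in phi.parameters():
    p.requires_grad_(False)

def perceptual_loss(i_sr, i_hr):
    """L_P = sum_i || phi(I_SR) - phi(I_HR) ||^2 over the batch."""
    return (phi(i_sr) - phi(i_hr)).pow(2).sum()

def adversarial_loss(d_eta, g_theta, i_lr):
    """L_D = sum_i log(1 - D_eta(G_theta(I_LR))), as written above
    (assumes the discriminator output is already a probability in (0, 1))."""
    return torch.log(1.0 - d_eta(g_theta(i_lr))).sum()

# Total generator objective with the weights stated in the next paragraph
# (perceptual weight 1, adversarial weight 1e-4):
# loss = 1.0 * perceptual_loss(i_sr, i_hr) + 1e-4 * adversarial_loss(d_eta, g_theta, i_lr)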
In the invention, the amount of training data per step, i.e. the batch size, is set to 16, 3000 training rounds are set, the perceptual loss weight is set to 1, and the adversarial loss weight is set to 10^-4; training is then started, and after training finishes, the parameters obtained in the last round are saved to a model file. In the invention, after all training samples have been traversed for 3000 rounds, the total loss on the validation set is essentially unchanged, indicating that training can be ended.
TABLE 1
Conv_1 | LeakyRelu    (19, 128, 4, 4)
Conv_2 | LeakyRelu    (128, 128, 1, 1)
Conv_3 | LeakyRelu    (128, 128, 1, 1)
Conv_4 | LeakyRelu    (128, 128, 1, 1)
Conv_out              (128, 32, 1, 1)
TABLE 2
(Table 2, listing the layer parameters of the residual unit with spatial feature transformation layer, is reproduced as an image in the original publication.)
As shown in Table 2, SFT is the spatial feature transformation layer; Scale_Conv0 and Scale_Conv1 are two convolution layers learned to produce the scaling parameter γ, and Shift_Conv0 and Shift_Conv1 are two convolution layers learned to produce the shifting parameter β. The parameters in parentheses denote, from left to right, the number of input feature maps, the number of output feature maps, the convolution kernel size and the stride of the layer.
TABLE 3
Conv (64,64,3,1,1)
Relu
Conv (64,64,3,1,1)
As shown in Table 3, the module is composed of a convolution layer, an activation layer and a convolution layer; the parameters in parentheses denote, from left to right, the number of input feature maps, the number of output feature maps, the convolution kernel size and the stride of the layer.
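A sketch of this residual block is given below. The identity skip connection, and reading the trailing 1 in (64, 64, 3, 1, 1) as padding=1 so the spatial size is preserved, are assumptions not spelled out in Table 3.

import torch.nn as nn

class ResidualBlock(nn.Module):
    """Table 3 block: Conv(64, 64, 3, 1) -> ReLU -> Conv(64, 64, 3, 1),
    wrapped with an identity skip connection."""
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, kernel_size=3, stride=1, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)   # residual connection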
Examples
To generate the face semantic segmentation probability maps more conveniently and to compare image details more easily, the experiments adopt the high-definition face data set CelebA-HQ, from which a portion of face images is randomly selected to compare results under 4× magnification. In addition, to better quantify image quality and make the scores agree more closely with human visual perception, the invention compares not only PSNR (peak signal-to-noise ratio) and SSIM (structural similarity), but also the learned perceptual image patch similarity (LPIPS) and the perceptual index (PI) proposed by Ma et al. The PSNR, SSIM, LPIPS and PI values obtained with existing advanced methods such as MSRN (multi-scale residual network), EDSR (enhanced deep residual super-resolution network), SRFBN (super-resolution feedback network), SFTGAN (spatial feature transform network) and ESRGAN (enhanced super-resolution generative adversarial network), and with the method of the present invention, are as follows:
(The quantitative comparison tables, listing PSNR, SSIM, LPIPS and PI values for MSRN, EDSR, SRFBN, SFTGAN, ESRGAN and the method of the present invention, are reproduced as images in the original publication.)
By comparison, the method of the invention outperforms the other methods in both subjective visual quality, as shown in Figs. 1 and 2, and objective evaluation indices. In particular, compared with the more advanced ESRGAN (enhanced super-resolution generative adversarial network), it achieves almost the same performance while using only 4,604,262 parameters, versus 16,697,987 parameters for ESRGAN.

Claims (9)

1. The face super-resolution method based on spatial feature transformation and cross-scale feature integration is characterized by comprising the following steps:
step 1, randomly selecting N human face images from a human face data set, and then preprocessing the human face images to generate a training set and a test set;
step 2, adopting a face analysis pre-training model BisNet as a base network for generating a semantic segmentation probability map, and processing the face image preprocessed in the step 1 to generate the semantic segmentation probability map;
step 3, constructing a generative adversarial network model for training, wherein the generative adversarial network model comprises a semantic segmentation probability map intermediate condition generation module, a spatial feature transformation module, a cross-scale feature integration module and a fusion output module which are connected in sequence; a sub-pixel convolution layer for image up-sampling is introduced into the cross-scale feature integration module, and an adversarial loss function and a perceptual loss function are introduced into the generative adversarial network model;
step 4, sequentially inputting the face images of the training set obtained in step 1 into the constructed generative adversarial network model, setting parameters, and training until convergence;
and step 5, inputting the face images of the test set from step 1 into the generative adversarial network model trained in step 4 to obtain the super-resolution reconstructed high-resolution image.
2. The face super-resolution method based on spatial feature transformation and cross-scale feature integration according to claim 1, wherein the face data set in step 1 is a CelebA-HQ face data set.
3. The face super-resolution method based on spatial feature transformation and cross-scale feature integration according to claim 1, wherein the preprocessing of the face images in the training set in step 1 is specifically: the images in the training set are down-sampled with a bicubic interpolation algorithm, and the interpolated image I_HR of size 512 × 512 is output as the target image of the training set and the test set; the interpolated image I_HR is then down-sampled 4-fold to 64 × 64 by bicubic interpolation as the training and testing input image I_LR; the input image I_LR is then up-sampled 4-fold to 256 × 256 by bicubic interpolation as the semantic segmentation network input image I_S.
4. The face super-resolution method based on spatial feature transformation and cross-scale feature integration according to claim 3, wherein the step 2 specifically comprises:
the face analysis pre-training model BisNet is used as a base network generated by a semantic segmentation probability map, and the output layer of the face analysis pre-training model BisNet is modified, and the method specifically comprises the following steps: adding a softmax function into an output layer of a face analysis pre-training model BisNet, and inputting the semantic segmentation network input image I obtained in the step 1SInputting the semantic probability output result into a modified face analysis pre-training model BisNet, outputting the semantic probability output result into a pth file, namely a Pythrch model file, and obtaining a semantic segmentation probability map ISeg
5. The face super-resolution method based on spatial feature transformation and cross-scale feature integration according to claim 4, wherein the step 4 specifically comprises:
step 4.1, setting the training parameters, and loading the training and testing input image I_LR, the target image I_HR of the training set and test set, and the semantic segmentation probability map I_Seg into the network input end, namely the input end of the semantic segmentation probability map intermediate condition generation module, which processes the semantic segmentation probability map I_Seg to generate the semantic-information intermediate condition ψ;
step 4.2, the training and testing input image I_LR is passed through one convolution layer to generate a feature map, which serves as the front-layer feature map;
step 4.3, the front-layer feature map and the semantic-information intermediate condition ψ are taken as the input of the spatial feature transformation module, which outputs a feature map F1;
step 4.4, the feature map F1 output in step 4.3 is input into the cross-scale integration module to obtain features at different scales, which are then input into the fusion output module to obtain a super-resolution image, denoted I_SR;
step 4.5, the super-resolution image I_SR and the corresponding interpolated image I_HR are input to the discriminator D_η, and the discrimination information is passed back to the generator G_θ of the generative adversarial network model;
and step 4.6, steps 4.4-4.5 are iterated continuously until the sum of the adversarial loss and the perceptual loss is minimized, and the corresponding parameters are taken as the trained model parameters to obtain the trained generative adversarial network model.
6. The face super-resolution method based on spatial feature transformation and cross-scale feature integration according to claim 5, wherein the semantic segmentation probability map intermediate condition generation module comprises five convolution layers connected in sequence: the first convolution layer has 19 input channels, 128 output channels, a 4 × 4 convolution kernel, a stride of 4, and a leaky rectified linear unit with negative slope 0.1; the second convolution layer has 128 input channels, 128 output channels, a 1 × 1 kernel, a stride of 4, and a leaky rectified linear unit with negative slope 0.1; the third convolution layer has 128 input channels, 128 output channels, a 1 × 1 kernel, a stride of 1, and a leaky rectified linear unit with negative slope 0.1; the fourth convolution layer has 128 input channels, 128 output channels, a 1 × 1 kernel and a stride of 1; the last convolution layer has 128 input channels, 32 output channels, a 1 × 1 kernel and a stride of 1, and outputs the intermediate condition containing the semantic information, denoted ψ;
the spatial feature transformation module is composed of 8 residual error units with spatial feature transformation layers, and each residual error unit is composed of a spatial feature transformation layer, a convolution layer and a nonlinear activation layer.
7. The face super-resolution method based on spatial feature transformation and cross-scale feature integration according to claim 6, wherein the step 4.4 inputs the feature map F1 output in step 4.3 into the cross-scale integration module, and the obtaining of different scale features specifically comprises:
in the cross-scale integration module, the channel dimension of the output feature map F1 is first raised 4-fold by a convolution layer, and F1 is then up-sampled 2-fold by sub-pixel convolution to obtain the feature map F2; meanwhile, the output feature map F1 is enlarged 2-fold by bicubic interpolation and fused with the feature map F2 along the channel dimension to obtain the feature map F3_1, which is passed backwards; the feature map F2 is reduced 2-fold by a convolution with stride 2 and fused with the feature map F1 along the channel dimension to obtain the feature map F3_2, which is passed backwards; F3_1 and F3_2 are input into two residual feature extraction modules, and the output feature maps are denoted F4_1 and F4_2 respectively; the feature map F4_1 is output directly to obtain the feature map F5_2, down-sampled 2-fold by a convolution with stride 2 to obtain the feature map F5_1, and up-sampled 2-fold by bicubic interpolation to obtain the feature map F5_3;
the feature map F4_1 is also up-sampled 2-fold by a second sub-pixel convolution to output the feature map F5; F5 is then output directly to obtain F6_3, down-sampled 2-fold by a convolution with stride 2 to obtain F6_2, and down-sampled 4-fold by a convolution with stride 4 to obtain F6_1;
F4_2 is output directly to obtain F7_1, up-sampled 2-fold by bicubic interpolation to obtain F7_2, and up-sampled 4-fold by bicubic interpolation to obtain F7_3; the small-scale maps F5_1, F6_1 and F7_1 are then feature-fused and input into a feature extraction module composed of 4 residual blocks, and the output feature map is enlarged 4-fold by an interpolation up-sampling module to output the feature map F8_1; similarly, the same-scale feature maps F5_2, F6_2 and F7_2 are feature-fused and input into a residual feature extraction module composed of 4 residual blocks, and the output feature map is enlarged 2-fold by an interpolation up-sampling module to output F8_2; the large-scale maps F5_3, F6_3 and F7_3 are feature-fused and input into a residual feature extraction module composed of 4 residual blocks, and the output feature map is output directly as F8_3.
8. The face super-resolution method based on spatial feature transformation and cross-scale feature integration according to claim 7, wherein in step 4.4, features of different scales are input to the fusion output module, and the obtained super-resolution result after reconstruction specifically comprises:
the feature maps F8_1, F8_2 and F8_3 at different scales are feature-fused, and two convolution layers then reduce the dimensionality step by step to output the reconstructed super-resolution image, denoted I_SR.
9. The face super-resolution method based on spatial feature transformation and cross-scale feature integration according to claim 8, wherein the perceptual loss function in step 4.6 is:
L_P = Σ_i ‖φ(I_SR) − φ(I_HR)‖²
the adversarial loss function is:
L_D = Σ_i log(1 − D_η(G_θ(I_LR)))
where φ(I_SR) and φ(I_HR) denote the feature maps extracted from the result image and the target image, respectively, by the pre-trained VGG network, G_θ denotes the generator network, and D_η denotes the discriminator network.
CN202011124368.0A 2020-10-20 2020-10-20 Face super-resolution method based on spatial feature transformation and trans-scale feature integration Active CN112270644B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011124368.0A CN112270644B (en) 2020-10-20 2020-10-20 Face super-resolution method based on spatial feature transformation and trans-scale feature integration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011124368.0A CN112270644B (en) 2020-10-20 2020-10-20 Face super-resolution method based on spatial feature transformation and trans-scale feature integration

Publications (2)

Publication Number Publication Date
CN112270644A true CN112270644A (en) 2021-01-26
CN112270644B CN112270644B (en) 2024-05-28

Family

ID=74338729

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011124368.0A Active CN112270644B (en) 2020-10-20 2020-10-20 Face super-resolution method based on spatial feature transformation and trans-scale feature integration

Country Status (1)

Country Link
CN (1) CN112270644B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949707A (en) * 2021-02-26 2021-06-11 西安电子科技大学 Cross-mode face image generation method based on multi-scale semantic information supervision
CN113128624A (en) * 2021-05-11 2021-07-16 山东财经大学 Graph network face recovery method based on multi-scale dictionary
CN113177882A (en) * 2021-04-29 2021-07-27 浙江大学 Single-frame image super-resolution processing method based on diffusion model
CN113240792A (en) * 2021-04-29 2021-08-10 浙江大学 Image fusion generation type face changing method based on face reconstruction
CN113298740A (en) * 2021-05-27 2021-08-24 中国科学院深圳先进技术研究院 Image enhancement method and device, terminal equipment and storage medium
CN113538307A (en) * 2021-06-21 2021-10-22 陕西师范大学 Synthetic aperture imaging method based on multi-view super-resolution depth network
CN113643687A (en) * 2021-07-08 2021-11-12 南京邮电大学 Non-parallel many-to-many voice conversion method fusing DSNet and EDSR network
CN113723414A (en) * 2021-08-12 2021-11-30 中国科学院信息工程研究所 Mask face shelter segmentation method and device
CN113850813A (en) * 2021-09-16 2021-12-28 太原理工大学 Unsupervised remote sensing image semantic segmentation method based on spatial resolution domain self-adaption
CN115174620A (en) * 2022-07-01 2022-10-11 北京博数嘉科技有限公司 Intelligent tourism comprehensive service system and method
CN117611442A (en) * 2024-01-19 2024-02-27 第六镜科技(成都)有限公司 Near infrared face image generation method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107154023A (en) * 2017-05-17 2017-09-12 电子科技大学 Face super-resolution reconstruction method based on generation confrontation network and sub-pix convolution
WO2017210690A1 (en) * 2016-06-03 2017-12-07 Lu Le Spatial aggregation of holistically-nested convolutional neural networks for automated organ localization and segmentation in 3d medical scans
WO2019109524A1 (en) * 2017-12-07 2019-06-13 平安科技(深圳)有限公司 Foreign object detection method, application server, and computer readable storage medium
CN110136063A (en) * 2019-05-13 2019-08-16 南京信息工程大学 A kind of single image super resolution ratio reconstruction method generating confrontation network based on condition
CN111027575A (en) * 2019-12-13 2020-04-17 广西师范大学 Semi-supervised semantic segmentation method for self-attention confrontation learning
CN111080645A (en) * 2019-11-12 2020-04-28 中国矿业大学 Remote sensing image semi-supervised semantic segmentation method based on generating type countermeasure network
KR20200080970A (en) * 2018-12-27 2020-07-07 포항공과대학교 산학협력단 Semantic segmentation method of 3D reconstructed model using incremental fusion of 2D semantic predictions
CN111695455A (en) * 2020-05-28 2020-09-22 西安工程大学 Low-resolution face recognition method based on coupling discrimination manifold alignment

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017210690A1 (en) * 2016-06-03 2017-12-07 Lu Le Spatial aggregation of holistically-nested convolutional neural networks for automated organ localization and segmentation in 3d medical scans
CN107154023A (en) * 2017-05-17 2017-09-12 电子科技大学 Face super-resolution reconstruction method based on generation confrontation network and sub-pix convolution
WO2019109524A1 (en) * 2017-12-07 2019-06-13 平安科技(深圳)有限公司 Foreign object detection method, application server, and computer readable storage medium
KR20200080970A (en) * 2018-12-27 2020-07-07 포항공과대학교 산학협력단 Semantic segmentation method of 3D reconstructed model using incremental fusion of 2D semantic predictions
CN110136063A (en) * 2019-05-13 2019-08-16 南京信息工程大学 A kind of single image super resolution ratio reconstruction method generating confrontation network based on condition
CN111080645A (en) * 2019-11-12 2020-04-28 中国矿业大学 Remote sensing image semi-supervised semantic segmentation method based on generating type countermeasure network
CN111027575A (en) * 2019-12-13 2020-04-17 广西师范大学 Semi-supervised semantic segmentation method for self-attention confrontation learning
CN111695455A (en) * 2020-05-28 2020-09-22 西安工程大学 Low-resolution face recognition method based on coupling discrimination manifold alignment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DONGLI WANG, ET AL.,: ""TwinsAdvNet:Adversarial Learning for Semantic Segmentation"", 《2019 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING(GLOBALSIP)》 *
LI ANG: "Development and Application of an Image Super-Resolution System Based on Adversarial Neural Networks and Semantic Segmentation", Cable TV Technology, no. 11, pages 28 - 33 *
ZHAO ZENGSHUN; GAO HANXU; SUN QIAN; TENG SHENGHUA; CHANG FALIANG; DAPENG OLIVER WU: "Recent Advances in Generative Adversarial Networks: Theoretical Framework, Derived Models and Applications", Journal of Chinese Computer Systems, no. 12 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949707B (en) * 2021-02-26 2024-02-09 西安电子科技大学 Cross-modal face image generation method based on multi-scale semantic information supervision
CN112949707A (en) * 2021-02-26 2021-06-11 西安电子科技大学 Cross-mode face image generation method based on multi-scale semantic information supervision
CN113177882A (en) * 2021-04-29 2021-07-27 浙江大学 Single-frame image super-resolution processing method based on diffusion model
CN113240792A (en) * 2021-04-29 2021-08-10 浙江大学 Image fusion generation type face changing method based on face reconstruction
CN113128624A (en) * 2021-05-11 2021-07-16 山东财经大学 Graph network face recovery method based on multi-scale dictionary
WO2022247232A1 (en) * 2021-05-27 2022-12-01 中国科学院深圳先进技术研究院 Image enhancement method and apparatus, terminal device, and storage medium
CN113298740A (en) * 2021-05-27 2021-08-24 中国科学院深圳先进技术研究院 Image enhancement method and device, terminal equipment and storage medium
CN113538307A (en) * 2021-06-21 2021-10-22 陕西师范大学 Synthetic aperture imaging method based on multi-view super-resolution depth network
CN113643687B (en) * 2021-07-08 2023-07-18 南京邮电大学 Non-parallel many-to-many voice conversion method integrating DSNet and EDSR networks
CN113643687A (en) * 2021-07-08 2021-11-12 南京邮电大学 Non-parallel many-to-many voice conversion method fusing DSNet and EDSR network
CN113723414A (en) * 2021-08-12 2021-11-30 中国科学院信息工程研究所 Mask face shelter segmentation method and device
CN113723414B (en) * 2021-08-12 2023-12-15 中国科学院信息工程研究所 Method and device for dividing mask face shielding object
CN113850813A (en) * 2021-09-16 2021-12-28 太原理工大学 Unsupervised remote sensing image semantic segmentation method based on spatial resolution domain self-adaption
CN113850813B (en) * 2021-09-16 2024-05-28 太原理工大学 Spatial resolution domain self-adaption based unsupervised remote sensing image semantic segmentation method
CN115174620A (en) * 2022-07-01 2022-10-11 北京博数嘉科技有限公司 Intelligent tourism comprehensive service system and method
CN115174620B (en) * 2022-07-01 2023-06-16 北京博数嘉科技有限公司 Intelligent comprehensive travel service system and method
CN117611442A (en) * 2024-01-19 2024-02-27 第六镜科技(成都)有限公司 Near infrared face image generation method

Also Published As

Publication number Publication date
CN112270644B (en) 2024-05-28

Similar Documents

Publication Publication Date Title
CN112270644B (en) Face super-resolution method based on spatial feature transformation and trans-scale feature integration
CN110211045B (en) Super-resolution face image reconstruction method based on SRGAN network
CN110033410B (en) Image reconstruction model training method, image super-resolution reconstruction method and device
Luo et al. Lattice network for lightweight image restoration
CN112037131A (en) Single-image super-resolution reconstruction method based on generation countermeasure network
CN111932461A (en) Convolutional neural network-based self-learning image super-resolution reconstruction method and system
CN112561799A (en) Infrared image super-resolution reconstruction method
CN113538246A (en) Remote sensing image super-resolution reconstruction method based on unsupervised multi-stage fusion network
CN113781308A (en) Image super-resolution reconstruction method and device, storage medium and electronic equipment
CN116664397B (en) TransSR-Net structured image super-resolution reconstruction method
CN113469884A (en) Video super-resolution method, system, equipment and storage medium based on data simulation
CN112163998A (en) Single-image super-resolution analysis method matched with natural degradation conditions
CN115880158A (en) Blind image super-resolution reconstruction method and system based on variational self-coding
Liu et al. Learning cascaded convolutional networks for blind single image super-resolution
CN116468605A (en) Video super-resolution reconstruction method based on time-space layered mask attention fusion
Chen et al. Image denoising via deep network based on edge enhancement
CN115115514A (en) Image super-resolution reconstruction method based on high-frequency information feature fusion
CN116188272B (en) Two-stage depth network image super-resolution reconstruction method suitable for multiple fuzzy cores
CN114066729A (en) Face super-resolution reconstruction method capable of recovering identity information
CN116703725A (en) Method for realizing super resolution for real world text image by double branch network for sensing multiple characteristics
CN116797456A (en) Image super-resolution reconstruction method, system, device and storage medium
CN113674154B (en) Single image super-resolution reconstruction method and system based on generation countermeasure network
CN115936983A (en) Method and device for super-resolution of nuclear magnetic image based on style migration and computer storage medium
Tian et al. Retinal fundus image superresolution generated by optical coherence tomography based on a realistic mixed attention GAN
CN117745541A (en) Image super-resolution reconstruction method based on lightweight mixed attention network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240131

Address after: 518000 1002, Building A, Zhiyun Industrial Park, No. 13, Huaxing Road, Henglang Community, Longhua District, Shenzhen, Guangdong Province

Applicant after: Shenzhen Wanzhida Technology Co.,Ltd.

Country or region after: China

Address before: 710048 Shaanxi province Xi'an Beilin District Jinhua Road No. 19

Applicant before: XI'AN POLYTECHNIC University

Country or region before: China

TA01 Transfer of patent application right
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20240419

Address after: 117000, Building 155, Pingshan Road, Mingshan District, Benxi City, Liaoning Province, China, 1-4-5

Applicant after: Rao Jinbao

Country or region after: China

Address before: 518000 1002, Building A, Zhiyun Industrial Park, No. 13, Huaxing Road, Henglang Community, Longhua District, Shenzhen, Guangdong Province

Applicant before: Shenzhen Wanzhida Technology Co.,Ltd.

Country or region before: China

GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240619

Address after: 117000, Building 11, Zijin Road, Mingshan District, Benxi City, Liaoning Province, China, 3-4 to 12

Patentee after: Sui Jiaoyang

Country or region after: China

Address before: 117000, Building 155, Pingshan Road, Mingshan District, Benxi City, Liaoning Province, China, 1-4-5

Patentee before: Rao Jinbao

Country or region before: China

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20240717

Address after: 117000 Detai Street, Pingshan District, Benxi City, Liaoning Province

Patentee after: Liaoning Houfa Xianzhi Technology Co.,Ltd.

Country or region after: China

Address before: 117000, Building 11, Zijin Road, Mingshan District, Benxi City, Liaoning Province, China, 3-4 to 12

Patentee before: Sui Jiaoyang

Country or region before: China

TR01 Transfer of patent right