WO2022057837A1 - Image processing and portrait super-resolution reconstruction and model training method, apparatus, electronic device and storage medium - Google Patents
Image processing and portrait super-resolution reconstruction and model training method, apparatus, electronic device and storage medium
- Publication number
- WO2022057837A1 (PCT/CN2021/118591)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- training
- resolution
- super
- processing
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
Definitions
- the present application relates to the technical field of computer vision, and in particular, to a method, apparatus, electronic device, and storage medium for image processing, super-resolution reconstruction of portraits, and model training.
- Image super-resolution reconstruction, or image super-resolution restoration, refers to the process of restoring a given low-resolution image or image sequence into a corresponding high-resolution image through specific processing. It is widely used in fields where video or image quality needs to be improved, such as video image processing, medical imaging, remote sensing imaging, and video surveillance.
- Image super-resolution reconstruction technology is also widely used in many fields such as face recognition, big data analysis, and security, where it is of great help in achieving portrait restoration, portrait recognition, and matching.
- In super-resolution reconstruction of a human portrait, for example, the usual approach is to reconstruct the entire image. Because this approach does not focus on the information that is more important to human perception, the reconstructed image often fails to meet actual needs.
- Embodiments of the present application provide an image processing and model training method, apparatus, electronic device, and storage medium, so as to improve the processing speed while ensuring the reconstruction effect.
- Embodiments of the present application also provide a portrait super-resolution reconstruction method, a model training method, an apparatus, an electronic device, and a readable storage medium, which can improve the recognizability of the obtained super-resolution image and meet user requirements.
- Some embodiments of the present application provide an image processing method, which may include:
- acquiring an image to be processed;
- inputting the image to be processed into an image reconstruction model, and using the feature extraction network of the image reconstruction model to perform multi-scale feature extraction on the image to be processed and expand image channels to obtain a reconstructed feature map;
- enlarging the reconstructed feature map by using the sub-pixel convolution layer of the image reconstruction model to obtain a reconstructed image.
- the feature extraction network may include a convolutional layer, a plurality of cascaded blocks, and a plurality of first convolutional layers, where the cascaded blocks and the first convolutional layers are arranged alternately; the feature extraction network may adopt a global cascade structure;
- the step of using the feature extraction network of the image reconstruction model to perform multi-scale feature extraction on the to-be-processed image to obtain a reconstructed feature map may include:
- the output of the last first convolutional layer is used as the reconstructed feature map.
- the number of the concatenated blocks may be 3 to 5, and the number of the first convolutional layers may be 3 to 5.
- the cascaded block may include a plurality of residual blocks and a plurality of second convolutional layers, where the residual blocks and the second convolutional layers are arranged alternately; the cascaded block may adopt a local cascade structure;
- the step of using the cascaded blocks to perform multi-scale feature extraction and outputting an intermediate feature map may include:
- learning residual features by using the residual blocks to obtain residual feature maps;
- the input of the cascaded block and the outputs of all residual blocks before the Nth second convolutional layer are channel-stacked, and the stacked result is input to the Nth second convolutional layer for convolution processing;
- the output of the last second convolutional layer is used as the intermediate feature map.
- the number of the residual blocks may be 3 to 5
- the number of the second convolutional layers may be 3 to 5.
- the residual block may include a grouped convolutional layer, a third convolutional layer and a fourth convolutional layer; the grouped convolutional layer adopts the ReLU activation function, the grouped convolutional layer and the third convolutional layer are connected to form a residual path, and the residual block may adopt a local skip connection structure;
- the step of using the residual block to learn residual features to obtain a residual feature map may include:
- the input of the residual block is used as the input of the grouped convolution layer, and features are extracted through the residual path;
- feature fusion is performed between the input of the residual block and the output of the third convolutional layer, and the fused result is input to the fourth convolutional layer for convolution processing to output the residual feature map.
- the step of using the sub-pixel convolution layer of the image reconstruction model to amplify the reconstructed feature map to obtain a reconstructed image may include: using the sub-pixel convolution layer to adjust pixel positions in the reconstructed feature map to obtain the reconstructed image.
- Other embodiments of the present application provide an image reconstruction model training method, which may include:
- acquiring training samples, where the training samples include low-resolution images and high-resolution images, and the low-resolution images are obtained by down-sampling the high-resolution images;
- inputting the low-resolution image into a pre-built image reconstruction model, where the image reconstruction model includes a feature extraction network and a sub-pixel convolution layer;
- using the feature extraction network to perform multi-scale feature extraction on the low-resolution image and expand image channels to obtain a training feature map;
- using the sub-pixel convolution layer to amplify the training feature map to obtain a training reconstructed image;
- performing back-propagation training on the image reconstruction model based on the training reconstructed image, the high-resolution image, and a preset objective function to obtain a trained image reconstruction model.
- In some embodiments, the objective function may be an L2 loss function;
- back-propagation training is performed on the image reconstruction model based on the training reconstructed image, the high-resolution image and the L2 loss function to adjust the parameters of the image reconstruction model until a preset training completion condition is reached, so as to obtain the trained image reconstruction model.
- the image reconstruction model training method may further include:
- the trained image reconstruction model is pruned to preserve long-range cascade connections and delete short-range cascade connections.
- the method may further include:
- a self-subtracting mean operation is performed on the low-resolution image to highlight the texture details of the low-resolution image.
- the method may further include: performing flip-symmetry processing on the low-resolution image to obtain at least one processed low-resolution image;
- the step of inputting the low-resolution image into a pre-built image reconstruction model may include: inputting the at least one processed low-resolution image into the image reconstruction model;
- the step of using the feature extraction network to perform multi-scale feature extraction on the low-resolution image to obtain a training feature map may include: using the feature extraction network to perform multi-scale feature extraction on the at least one processed low-resolution image to obtain at least one auxiliary feature map, performing reverse flip-symmetry processing on the at least one auxiliary feature map, and averaging the processed auxiliary feature maps to obtain the training feature map.
- Still other embodiments of the present application also provide an image processing apparatus, and the apparatus may include:
- an image acquisition module which can be configured to acquire an image to be processed
- the first execution module may be configured to input the image to be processed into an image reconstruction model, and use the feature extraction network of the image reconstruction model to perform multi-scale feature extraction on the to-be-processed image and expand image channels to obtain a reconstructed feature map;
- the second execution module may be configured to use the sub-pixel convolution layer of the image reconstruction model to amplify the reconstructed feature map to obtain a reconstructed image.
- Still other embodiments of the present application also provide an image reconstruction model training apparatus, the apparatus may include:
- a sample acquisition module which can be configured to acquire training samples, where the training samples include low-resolution images and high-resolution images, and the low-resolution images are obtained by down-sampling the high-resolution images;
- a first processing module which can be configured to input the low-resolution image into a pre-built image reconstruction model, the image reconstruction model including a feature extraction network and a sub-pixel convolution layer;
- the second processing module can be configured to use the feature extraction network to perform multi-scale feature extraction on the low-resolution image and expand image channels to obtain a training feature map;
- a third processing module may be configured to use the sub-pixel convolutional layer to amplify the training feature map to obtain a training reconstructed image
- the fourth processing module may be configured to perform back-propagation training on the image reconstruction model based on the training reconstructed image, the high-resolution image and a preset objective function to obtain a trained image reconstruction model.
- In the image processing and model training methods, apparatuses, electronic devices, and storage media provided by the embodiments of the present application, an image to be processed is acquired and input into an image reconstruction model.
- the image reconstruction model includes a feature extraction network and sub-pixel convolution.
- The feature extraction network is used to extract multi-scale features of the image to be processed and expand the image channels to obtain the reconstructed feature map, and the sub-pixel convolution layer is then used to enlarge the reconstructed feature map to obtain the reconstructed image. Since the feature extraction network can extract multi-scale features and expand image channels, a better reconstruction effect can be obtained without increasing the depth of the network.
- Because the enlargement is performed at the end, the front part of the model processes small-sized images, which greatly reduces the amount of calculation and the number of parameters, thus improving the processing speed while ensuring the reconstruction effect.
- Some embodiments of the present application provide a method for super-resolution reconstruction of a portrait, which may include:
- performing key point detection on an image to be processed by using a pre-built reconstruction model to obtain face key points;
- performing super-resolution reconstruction processing according to the face key points and image features obtained based on the image to be processed, to obtain image high-frequency information;
- performing restoration processing on the image to be processed by using the image high-frequency information to obtain a super-resolution image corresponding to the image to be processed.
- the super-resolution reconstruction process is performed using the image processing method described above.
- the key point detection, super-resolution reconstruction processing and restoration processing may include multiple rounds of iterative processing, and the to-be-processed image is an unprocessed to-be-processed image, or the super-resolution image obtained after the key point detection, super-resolution reconstruction processing and restoration processing of the previous round of iteration.
- there may be a plurality of face key points, and the step of performing restoration processing on the to-be-processed image by using the image high-frequency information to obtain a super-resolution image corresponding to the to-be-processed image may include:
- processing the to-be-processed image by using a pre-built portrait cognitive model, and outputting position information of each face key point;
- performing restoration processing on the to-be-processed image based on the position information of each face key point and the image high-frequency information, to obtain the super-resolution image corresponding to the to-be-processed image.
- the step of performing restoration processing on the to-be-processed image based on the position information of each face key point and the image high-frequency information to obtain the super-resolution image corresponding to the to-be-processed image may include:
- performing restoration processing on the corresponding face key points in the to-be-processed image according to each face key point, its corresponding position information, and the image high-frequency information.
- the reconstructed model may include a discriminator and a generation network, and the generation network is obtained after training with training samples under the supervision of the trained discriminator.
- the face key points may include the contours of the left eye, the right eye, the nose, the mouth and the chin.
- Other embodiments of the present application provide a method for training a portrait super-resolution reconstruction model, which may include:
- acquiring training samples and target samples corresponding to the training samples;
- performing key point detection on the training samples by using a constructed generation network to obtain training key points;
- performing super-resolution reconstruction processing and restoration processing based on the training key points and the training samples to obtain an output image;
- comparing the output image with the target sample, adjusting the network parameters of the generation network based on the comparison result, and continuing training until a reconstructed model is obtained when a first preset condition is satisfied.
- In some embodiments, the step of comparing the output image with the target sample, adjusting the network parameters of the generation network based on the comparison result, and continuing training until the reconstructed model is obtained when a first preset condition is satisfied may include:
- the reconstruction model further includes a discriminator, and the discriminator is used to supervise the training of the generation network, and the method may further include:
- the parameters of the discriminator are adjusted until the trained discriminator is obtained when the second preset condition is satisfied.
- the step of performing network parameter adjustment on the generation network according to the discrimination information and the comparison result, and continuing training until the reconstructed model is obtained when a first preset condition is satisfied, may include:
- constructing a third loss function based on the discriminator's discrimination information for the output image, and constructing a fourth loss function based on the image difference between the output image and the target sample obtained by the pre-built portrait cognitive model;
- the reconstructed model is obtained when the function value satisfies the first preset condition.
- Still other embodiments of the present application provide a portrait super-resolution reconstruction apparatus, which may include:
- the detection module can be configured to use the pre-built reconstruction model to perform key point detection on the image to be processed to obtain face key points;
- a processing module which can be configured to perform super-resolution reconstruction processing according to the face key points and image features obtained based on the to-be-processed image to obtain high-frequency image information
- the restoration module may be configured to perform restoration processing on the to-be-processed image by using the high-frequency information of the image to obtain a super-resolution image corresponding to the to-be-processed image.
- the apparatus for super-resolution reconstruction of a human portrait may further include the above-mentioned image processing apparatus, and the image processing apparatus may be configured to perform the super-resolution reconstruction processing.
- Still other embodiments of the present application provide a portrait super-resolution reconstruction model training device, which may include:
- an acquisition module which can be configured to acquire training samples and target samples corresponding to the training samples
- a key point obtaining module which can be configured to perform key point detection on the training sample by using the constructed generation network to obtain training key points
- an output image obtaining module which can be configured to perform super-resolution reconstruction processing and restoration processing based on the training key points and the training samples to obtain an output image
- a training module which can be configured to compare the output image with the target sample, adjust the network parameters of the generation network based on the comparison result, and continue training until a reconstructed model is obtained when the first preset condition is satisfied.
- the electronic device may include: one or more processors; and one or more storage media storing machine-executable instructions that, when executed by the one or more processors, cause the one or more processors to implement the image processing method according to some embodiments, or the image reconstruction model training method according to other embodiments, or the portrait super-resolution reconstruction method, or the portrait super-resolution reconstruction model training method.
- Still other embodiments of the present application provide a computer-readable storage medium storing machine-executable instructions that, when executed, implement the image processing method according to some embodiments, or the image reconstruction model training method according to other embodiments, or the portrait super-resolution reconstruction method according to still other embodiments, or the portrait super-resolution reconstruction model training method according to still other embodiments.
- In the portrait super-resolution reconstruction method, key point detection is first performed on the image to be processed by using the pre-built reconstruction model to obtain the face key points; super-resolution reconstruction processing is then performed according to the face key points and the image features obtained based on the image to be processed, to obtain the image high-frequency information.
- In this way, the super-resolution reconstruction of the image combines face key point detection with face restoration, and the recognizability of the obtained super-resolution image is improved, which meets the needs of users in practical applications.
- FIG. 1 shows an application scenario diagram of the image processing method provided by the embodiment of the present application.
- FIG. 2 shows a schematic flowchart of an image processing method provided by an embodiment of the present application.
- FIG. 3 shows an example diagram of an image reconstruction model provided by an embodiment of the present application.
- FIG. 4 shows an example diagram of a cascaded block provided by an embodiment of the present application.
- FIG. 5 shows an example diagram of a residual block provided by an embodiment of the present application.
- FIG. 6 shows another example diagram of an image reconstruction model provided by an embodiment of the present application.
- FIG. 7 shows an image processing result presentation diagram provided by an embodiment of the present application.
- FIG. 8 is a flowchart of a method for super-resolution reconstruction of a portrait provided by an embodiment of the present application.
- FIG. 9 is a schematic diagram of a processing flow of a method for super-resolution reconstruction of a portrait provided by an embodiment of the present application.
- FIG. 10 is another schematic diagram of a processing flow of the method for super-resolution reconstruction of a portrait provided by an embodiment of the present application.
- FIG. 11 is a flowchart of a method for obtaining a super-resolution image in the method for super-resolution reconstruction of a portrait provided by an embodiment of the present application.
- FIG. 12 is another schematic diagram of a processing flow of the method for super-resolution reconstruction of a portrait provided by an embodiment of the present application.
- FIG. 13 shows a schematic flowchart of an image reconstruction model training method provided by an embodiment of the present application.
- FIG. 14 is a flowchart of a method for training a portrait super-resolution reconstruction model provided by an embodiment of the present application.
- FIG. 15 is one of the flowcharts of a method for obtaining a reconstructed model in the method for training a super-resolution reconstruction model of a portrait provided by an embodiment of the present application.
- FIG. 16 is the second flowchart of a method for obtaining a reconstructed model in the method for training a super-resolution reconstruction model of a portrait provided by an embodiment of the present application.
- FIGS. 17(a) to 17(c) are schematic diagrams of output images obtained by the interpolation processing method, the method without the discriminator, and the method with the discriminator, respectively.
- FIG. 18 shows a schematic block diagram of an image processing apparatus provided by an embodiment of the present application.
- FIG. 19 shows a schematic block diagram of an apparatus for training an image reconstruction model provided by an embodiment of the present application.
- FIG. 20 shows a schematic block diagram of an electronic device provided by an embodiment of the present application.
- FIG. 21 is a structural block diagram of an electronic device provided by an embodiment of the present application.
- FIG. 22 is a block diagram of functional modules of an apparatus for super-resolution reconstruction of a portrait provided by an embodiment of the present application.
- FIG. 23 is a block diagram of functional modules of an apparatus for training a super-resolution reconstruction model of a portrait provided by an embodiment of the present application.
- Icons 10-electronic equipment; 11-processor; 12-memory; 13-bus; 20-first terminal; 30-second terminal; 40-network; 50-server; 100-image processing device; 110-image acquisition module; 120-first execution module; 130-second execution module; 200-model training device; 210-sample acquisition module; 220-first processing module; 230-second processing module; 240-third processing module; 250 - Fourth processing module.
- 2110-storage medium 2120-processor; 2130-machine executable instructions; 131-portrait super-resolution reconstruction device; 1311-detection module; 1312-processing module; 1313-restoration module; 132-model training device; 1321-acquisition module; 1322-key point acquisition module; 1323-output image acquisition module; 1324-training module; 140-communication interface.
- FIG. 1 shows an application scenario diagram of the image processing method provided by the embodiments of the present application, including a first terminal 20, a second terminal 30, a network 40 and a server 50; the first terminal 20 and the second terminal 30 are each connected to the server 50 through the network 40.
- the first terminal 20 and the second terminal 30 may be mobile terminals, and various application programs (Application, App) may be installed on the mobile terminals, for example, a video playing App, an instant messaging App, a video/image capturing App, a shopping App, and the like.
- the network 40 may be a wide area network or a local area network, or a combination of the two, using a wireless link for data transmission.
- the first terminal 20 and the second terminal 30 may be any mobile terminals having a screen display function, for example, a smart phone, a notebook computer, a tablet computer, a desktop computer, a smart TV, and the like.
- the first terminal 20 may upload the video file or picture to the server 50, and the server 50 may store the video file or picture after receiving the video file or picture uploaded by the first terminal 20.
- the second terminal 30 can request the video file or picture from the server 50 , and the server 50 can return the video file or picture to the second terminal 30 .
- During transmission, the video file or picture is compressed, so the resolution of the video file or picture is low.
- After receiving the video file or picture, the second terminal 30 can process it in real time by using the image processing method provided in the embodiments of the present application to obtain a high-resolution video or picture and display it in the display interface of the second terminal 30, thereby improving the user's picture quality experience.
- the image processing method provided by the embodiment of the present application may be integrated into a video playback App or a gallery App of the second terminal 30 as a functional plug-in.
- the first terminal 20 may be the mobile terminal of the host, and the second terminal 30 may be the mobile terminal of the viewer.
- The first terminal 20 can upload the live video to the server 50, and the server 50 can store the live video.
- When the second terminal 30 requests the live video, the server 50 can return the live video to the second terminal 30.
- The second terminal 30 can then process the live video in real time by using the image processing method provided in the embodiments of the present application to obtain and display a high-resolution live video, so that the audience can watch the live broadcast clearly.
- the image processing method provided in this embodiment of the present application can be applied to a mobile terminal.
- Although the above description takes the application to the second terminal 30 as an example, it should be understood that the image processing method can also be applied to the first terminal 20.
- The specifics can be determined according to the actual application scenario, which is not limited here.
- FIG. 2 shows a schematic flowchart of an image processing method provided by an embodiment of the present application.
- the image processing method may include the following steps:
- the image to be processed may be a picture displayed on the mobile terminal that needs super-resolution reconstruction to improve its quality, or a video frame in a video stream, for example, a video frame of a low-resolution video obtained by the second terminal 30 from the server 50.
- The mobile terminal can perform super-resolution reconstruction directly when receiving a low-resolution picture or low-resolution video file; alternatively, it can first display the low-resolution picture or video file in the display interface and perform super-resolution reconstruction only after the user performs a resolution switching operation. For example, when a low-resolution video is received, it is played first, and when the user switches the resolution from "standard definition" to "ultra high definition", super-resolution reconstruction is then performed.
- Step S102: Input the image to be processed into an image reconstruction model, and use the feature extraction network of the image reconstruction model to perform multi-scale feature extraction on the image to be processed and expand image channels to obtain a reconstructed feature map.
- the to-be-processed image is input into the image reconstruction model for super-resolution reconstruction.
- the image reconstruction model includes a feature extraction network and a sub-pixel convolution layer.
- the feature extraction network is used to extract multi-scale features of the image to be processed and expand the image channels, and the sub-pixel convolution layer is used to enlarge the reconstructed feature map output by the feature extraction network.
- Multi-scale feature extraction refers to extracting feature information at different levels by means of global cascade and local cascade.
- feature extraction can be performed step by step from the bottom layer to the high layer, or the bottom layer information can be directly transferred to the high layer.
- An image channel refers to one or more color channels after an image is divided according to color components.
- an image can be divided into a single-channel image, a three-channel image and a four-channel image according to the image channel.
- a single-channel image means that each pixel in the image is represented by only one value, such as a grayscale image;
- a three-channel image means that each pixel in the image is represented by three values, such as an RGB color image; a four-channel image is a three-channel image plus a transparency (alpha) channel.
- Extending image channels means increasing the number of channels in the image without changing the size of the image.
- For example, the input of the feature extraction network is an image of H × W × C, where H × W is the size of the input image and C is the number of channels of the input image; the output is a feature map of H × W × r²C, where H × W is the spatial size of the output and r²C is the number of channels of the output.
- The sub-pixel convolution layer, also known as PixelShuffle, is a convolutional layer that can efficiently obtain high-resolution feature maps. Compared with hand-designed upsampling filters such as bilinear or bicubic samplers, a sub-pixel convolution layer can be trained to learn more complex upsampling operations while reducing the overall computation time.
- The main function of the sub-pixel convolution layer is to combine feature maps of r² channels into a new upsampled result of size (r×H) × (r×W) × C; that is, an output image of rH × rW × C is obtained, completing the r-fold enlargement from the input feature map to the output image.
- The working process of the sub-pixel convolution layer can be as follows: first, each original low-resolution pixel is divided into an r × r grid of small cells; then, according to certain rules, the values at the corresponding positions of the r × r input feature maps are used to fill these cells; the recombination process is completed by filling the cells of every low-resolution pixel in the same way.
- a sub-pixel convolutional layer may be used to adjust pixel positions in the reconstructed feature map to obtain a reconstructed image.
- For example, if the reconstructed feature map output by the feature extraction network is H × W × r²C, the sub-pixel convolution layer adjusts the pixel positions to obtain a reconstructed image of rH × rW × C, completing the r-fold magnification.
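- As an illustration, the following minimal sketch (assuming PyTorch, which provides this operation as `nn.PixelShuffle`) shows the H × W × r²C to rH × rW × C rearrangement described above; the tensor sizes are illustrative only:

```python
import torch
import torch.nn as nn

r = 2                                  # magnification factor
x = torch.randn(1, 3 * r ** 2, 8, 8)   # reconstructed feature map: r^2*C channels (C=3), 8x8
pixel_shuffle = nn.PixelShuffle(r)     # rearranges each r^2-channel group into an r x r cell grid
y = pixel_shuffle(x)
print(y.shape)                         # torch.Size([1, 3, 16, 16]) -> C channels, rH x rW
```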
- the sub-pixel convolutional layer can support multiple magnification sizes.
- For example, a 4× magnification operation can be accomplished with a combination of two 2× sub-pixel convolution layers, and a combination of a 2× and a 3× sub-pixel convolution layer accomplishes a 6× magnification operation.
- Existing super-resolution reconstruction algorithms typically interpolate to high resolution first and then make corrections, whereas the image reconstruction model in the embodiments of the present application places the enlarging sub-pixel convolution layer at the end, which ensures that the feature extraction network at the front of the model processes small-sized images and greatly reduces the amount of computation and the number of parameters.
- Step S102 will be described in detail below.
- the feature extraction network includes a convolutional layer, multiple cascaded blocks and multiple first convolutional layers; the cascaded blocks and the first convolutional layers are arranged alternately, and the feature extraction network adopts a global cascade structure.
- The global cascade structure refers to the left fast channel and the right fast channel in FIG. 3.
- The output of each cascaded block can be sent directly, through the left fast channel, to every first convolutional layer after that cascaded block.
- The right fast channel can feed the output of the convolutional layer directly to each first convolutional layer.
- The transfer here refers to the stacking of channels, not the addition of data.
- the feature extraction network of the image reconstruction model is used to perform multi-scale feature extraction on the image to be processed and expand the image channel to obtain the reconstructed feature map, which may include:
- performing multi-scale feature extraction by using the cascaded blocks, and outputting intermediate feature maps;
- The convolutional layer and the first convolutional layers can expand the image channels, and the convolutional layer, the cascaded blocks and the first convolutional layers can all extract features.
- the channel stacking of the initial feature map and the intermediate feature map refers to combining the channels of the initial feature map and the channels of the intermediate feature map.
- For example, if the initial feature map has 4 channels and the intermediate feature map has 8 channels, the stacked feature map has 12 channels; in other words, each pixel in the initial feature map is represented by 4 values, each pixel in the intermediate feature map is represented by 8 values, and each pixel in the channel-stacked feature map is represented by 12 values.
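- The channel stacking described above corresponds to a concatenation along the channel dimension; a minimal sketch (assuming PyTorch) with the 4-channel and 8-channel example:

```python
import torch

initial = torch.randn(1, 4, 32, 32)       # initial feature map: 4 channels
intermediate = torch.randn(1, 8, 32, 32)  # intermediate feature map: 8 channels

# Channel stacking combines the channels; it does not add the data element-wise.
stacked = torch.cat([initial, intermediate], dim=1)
print(stacked.shape)                      # torch.Size([1, 12, 32, 32])
```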
- The structure of the cascaded block is shown in FIG. 4. The cascaded block includes multiple residual blocks and multiple second convolutional layers, the residual blocks and the second convolutional layers are arranged alternately, and the cascaded block adopts a local cascade structure.
- The local cascade structure refers to the left fast channel and the right fast channel in FIG. 4.
- The output of each residual block can be sent directly, through the left fast channel, to every second convolutional layer after that residual block, and the right fast channel can feed the input of the cascaded block directly to each second convolutional layer.
- The transmission here refers to the stacking of channels, not the addition of data.
- the multi-scale feature extraction is performed by using the cascaded block, and the way of outputting the intermediate feature map may include:
- the input of the cascaded block and the outputs of all residual blocks before the Nth second convolutional layer are channel-stacked, and the stacked result is input into the Nth second convolutional layer for convolution processing;
- the second convolutional layer can expand the image channel, and the residual block and the second convolutional layer can extract features.
- the process of channel stacking the input of the cascaded block and the output of the residual block is similar to the above-mentioned process of channel stacking the initial feature map and the intermediate feature map, and will not be repeated here.
- the structure of the residual block is shown in Figure 5.
- The residual block may include a grouped convolutional layer, a third convolutional layer and a fourth convolutional layer; the grouped convolutional layer adopts the ReLU activation function, the grouped convolutional layer and the third convolutional layer are connected to form a residual path, and the residual block adopts a local skip connection structure.
- the local skip connection structure refers to the fusion of the input of the residual block and the output of the residual path to learn residual features.
- the residual feature is learned by using the residual block to obtain the residual feature map, which may include:
- Feature fusion is performed between the input of the residual block and the output of the third convolutional layer; the fused result is input to the fourth convolutional layer for convolution processing, and the residual feature map is output.
- the third convolutional layer and the fourth convolutional layer can expand the image channel, and the grouped convolutional layer can extract features.
- The grouped convolution (Group Convolution) layer groups the input feature maps, and each group is then convolved separately. Compared with regular convolution, grouped convolution reduces the number of model parameters, thereby increasing the processing speed of the model.
- The number of grouped convolutional layers and the number of groups into which each grouped convolutional layer divides the input feature maps can be flexibly selected by the user according to actual needs; for example, 2 grouped convolutional layers with 3 groups each.
- The types of the convolutional layer and the first, second, third and fourth convolutional layers are not limited in this embodiment; for example, 1×1 pointwise convolution, depthwise convolution, etc. may be used and flexibly adjusted according to actual needs.
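- The following is a minimal PyTorch sketch of such a residual block; the channel count, kernel sizes, and the use of element-wise addition for the feature fusion are illustrative assumptions, not values fixed by the present application:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int = 12, groups: int = 3):
        super().__init__()
        # Residual path: two grouped convolutional layers with ReLU, then the "third" conv.
        self.grouped = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=groups),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, groups=groups),
            nn.ReLU(inplace=True),
        )
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv4 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = self.conv3(self.grouped(x))  # features extracted along the residual path
        fused = x + residual                     # local skip connection ("feature fusion")
        return self.conv4(fused)                 # the "fourth" conv outputs the residual feature map

print(ResidualBlock()(torch.randn(1, 12, 32, 32)).shape)  # torch.Size([1, 12, 32, 32])
```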
- The expressiveness of the image reconstruction model increases with the complexity of the global cascade or the local cascade; that is, the greater the number of cascaded blocks and first convolutional layers in the feature extraction network, or the greater the number of residual blocks and second convolutional layers in the cascaded block, the more expressive the image reconstruction model.
- However, the more complex the network structure, the slower the calculation speed. Therefore, in order to improve the processing speed while ensuring the reconstruction effect, the number of each module should not be too large.
- the number of concatenated blocks and the first convolutional layer in the feature extraction network can both be 3 to 5, and the number of residual blocks and the second convolutional layer in the concatenated block can both be 3 to 5,
- the number of grouped convolutional layers in the residual block can be 2 to 4.
- For example, the feature extraction network can be set to include 3 cascaded blocks and 3 first convolutional layers, each cascaded block includes 3 residual blocks and 3 second convolutional layers, and each residual block includes 2 grouped convolutional layers.
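- The global cascade wiring described above can be sketched as follows (assuming PyTorch); the cascaded blocks are abbreviated here to single 3×3 convolutions, the 1×1 fusion convolutions stand in for the first convolutional layers, and the channel counts are illustrative only:

```python
import torch
import torch.nn as nn

class CascadedFeatureExtractor(nn.Module):
    def __init__(self, in_ch: int = 3, feat_ch: int = 16, num_blocks: int = 3):
        super().__init__()
        self.head = nn.Conv2d(in_ch, feat_ch, 3, padding=1)   # initial convolutional layer
        self.blocks = nn.ModuleList(
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1) for _ in range(num_blocks)
        )
        # The Nth fusion ("first") conv sees the head output plus N block outputs.
        self.fusions = nn.ModuleList(
            nn.Conv2d(feat_ch * (i + 2), feat_ch, 1) for i in range(num_blocks)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [self.head(x)]
        out = feats[0]
        for block, fusion in zip(self.blocks, self.fusions):
            feats.append(block(out))               # cascaded block output
            out = fusion(torch.cat(feats, dim=1))  # channel stacking of all earlier outputs
        return out

print(CascadedFeatureExtractor()(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 16, 32, 32])
```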
- In addition, parameter sharing can be set for the modules in the cascaded block, that is, the multiple residual blocks share parameters and the multiple second convolutional layers share parameters, so that the image reconstruction model is further lightened and the processing speed is further improved.
- The left and middle pictures are reconstructed images obtained by the image processing method provided in the embodiments of the present application, where the left picture does not share parameters and the middle picture shares parameters; the right picture is the reconstructed image obtained by the bicubic interpolation (Bicubic) algorithm.
- FIG. 8 shows a schematic flowchart of a method for super-resolution reconstruction of a portrait.
- the detailed steps of the portrait super-resolution reconstruction method are introduced as follows.
- Step S110: Use a pre-built reconstruction model to perform key point detection on the image to be processed to obtain face key points.
- Step S120: Perform super-resolution reconstruction processing according to the face key points and the image features obtained based on the to-be-processed image, to obtain image high-frequency information.
- Step S130: Perform restoration processing on the to-be-processed image by using the image high-frequency information to obtain a super-resolution image corresponding to the to-be-processed image.
- The image to be processed may be an image with low definition; in such images the face is often unclear, which creates obstacles for tasks such as image recognition and image matching.
- the image to be processed may be a face image collected by a monitoring device, or a face image obtained by taking a screenshot of a web page, or a host's face image collected during a live broadcast, and so on.
- the constructed reconstruction model may be used to first perform key point detection on the image to be processed to obtain the face key points.
- The obtained face key points may include the left eye, right eye, nose, mouth and chin contour. Based on these face key points, the outline of the face can be roughly sketched, and the parts of the face most important to human cognition are covered.
- After the face key points are obtained by key point detection, super-resolution reconstruction processing is performed by combining the face key points with the obtained image features, and the image high-frequency information is obtained.
- The image high-frequency information mainly embodies the information at the edges and contours in the image, while the slowly changing grayscale within the contours is the low-frequency information.
- Because the image high-frequency information reflects the regions of relatively rapid change, it is very important for reconstructing the image.
- the image processing method according to the present application described in conjunction with FIG. 2 may be used to perform super-resolution reconstruction processing of the image, so as to obtain the high-frequency information of the image.
- The image high-frequency information obtained in the exemplary embodiments of the present application is local information in the face image, so it needs to be restored back into the to-be-processed image; restoration processing is thus performed on the to-be-processed image to obtain the super-resolution image corresponding to the to-be-processed image.
- The portrait super-resolution reconstruction method detects the face key points, obtains the image high-frequency information by using the face key points and the image features, and then uses the image high-frequency information to restore the image to be processed, thereby improving the recognizability of the obtained super-resolution image and meeting the needs of users in practical applications.
- the above-mentioned key point detection, super-resolution reconstruction processing and restoration processing may include multiple rounds of iterative processing.
- the above image to be processed may be an unprocessed image to be processed, or a super-resolution image obtained after the key point detection, super-resolution reconstruction processing and restoration processing in the previous iteration.
- In the first round, the above key point detection, super-resolution reconstruction processing and restoration processing yield this round's super-resolution image SR Face (Super Resolution Face). Then, on the basis of the obtained super-resolution image, the above key point detection, super-resolution reconstruction processing and restoration processing are performed again to obtain the super-resolution image after the second round of iteration. Following this processing logic, the final super-resolution image is obtained after multiple iterations once certain requirements are met.
- For the input image (Input), the image high-frequency information is first obtained, and the first-round super-resolution image Face SR1 is then obtained from the image high-frequency information and Input.
- Key point detection is then performed on Face SR1 to obtain the corresponding face key points Face Points 1.
- From these, the image high-frequency information is obtained again, and the second-round super-resolution image Face SR2 is obtained from the image high-frequency information and Face SR1.
- Processing continues in this way, and the final super-resolution image Face SR N is obtained after N iterations (N is the preset number of iterations at which processing stops, or the image obtained after N iterations meets the preset requirements).
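- The recursive loop can be summarized with the following sketch, in which `detect_keypoints`, `reconstruct_hf` and `restore` are hypothetical stand-ins for the key point detection, super-resolution reconstruction, and restoration stages described above:

```python
from typing import Any, Callable

def portrait_super_resolution(image: Any,
                              detect_keypoints: Callable,
                              reconstruct_hf: Callable,
                              restore: Callable,
                              num_iters: int) -> Any:
    """Recursive refinement: Input -> Face SR1 -> Face SR2 -> ... -> Face SR N."""
    for _ in range(num_iters):
        keypoints = detect_keypoints(image)         # key point detection on this round's input
        hf_info = reconstruct_hf(image, keypoints)  # reconstruction yields image high-frequency info
        image = restore(image, keypoints, hf_info)  # restoration yields this round's SR image
    return image                                    # Face SR N
```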
- the image obtained by the previous round of processing is used as the detection object to perform multiple loop processing in a recursive manner, which can continuously improve the quality of the obtained super-resolution image.
- model parameters in multiple loops can be shared, thereby making the model more lightweight and providing support for applying the model to devices with weak processing capabilities, such as mobile terminals.
- Moreover, within a certain range, the network width (that is, the number of feature extraction channels) can be increased in preference to the network depth (that is, the number of network layers); combined with the recursive processing method, this improves the recognition accuracy of the model.
- the detected face key points include a plurality of face key points, and the image high-frequency information is obtained based on the face key points and image features to restore the image to be processed.
- the above-mentioned restoration processing can be implemented by the following steps:
- Step S131: Process the image to be processed by using a pre-built portrait cognitive model, and output the position information of each face key point.
- Step S132: Perform restoration processing on the corresponding face key points in the to-be-processed image according to each face key point, its corresponding position information and the image high-frequency information, to obtain the super-resolution image corresponding to the to-be-processed image.
- A neural network model can be constructed, for example, a convolutional neural network (Convolutional Neural Network, CNN) model or the like.
- Multiple training samples can be collected, wherein each training sample contains a face image, and the face key points in each face image carry position information, and the position information can be the position of each face key point in the face area.
- the face area can also be mapped into the coordinate system, and the coordinate value of the key point of the face in the coordinate system is used as its position information.
- The constructed neural network model is trained with the training samples to obtain a portrait cognitive model that meets the requirements.
- By using the portrait cognitive model, the position information of the face key points in the to-be-processed image LR Face, such as the left eye, right eye, nose, mouth and chin contour, can be identified.
- Restoration processing is then performed using the obtained position information of the face key points and the high-frequency information of the corresponding face key points contained in the image high-frequency information, to obtain the final super-resolution image SR Face.
- Obtaining the position information of each face key point with the portrait cognitive model allows the restoration to be performed accurately at the corresponding position of each face key point in the image, avoiding displaced restoration of the face key points.
- The specific restoration requirements of different face key points often differ. For example, for the eyes it is hoped that the restored eyes are brighter, while for the chin contour it may be desirable that the restored contour is more clearly defined.
- Therefore, the restoration attribute corresponding to each face key point may be obtained first, where the restoration attribute describes the different requirements of the restoration processing mentioned above. Then, restoration processing is performed on the image to be processed according to the position information, the restoration attribute and the image high-frequency information of each face key point, and the corresponding super-resolution image is obtained.
- In this way, each face key point can be restored independently based on its own position information and restoration attributes, which not only satisfies the specific restoration requirements of different face key points, but also allows the reconstruction model to process them in parallel based on group convolution, greatly reducing the processing time.
- the super-resolution reconstruction process is implemented by using a reconstruction model constructed and trained in advance.
- the model training method provided in the embodiments of the present application can be applied to any electronic device with an image processing function, for example, a server, a mobile terminal, a general-purpose computer, or a special-purpose computer.
- FIG. 13 shows a schematic flowchart of an image reconstruction model training method provided by an embodiment of the present application.
- the model training method may include the following steps:
- training samples include low-resolution images and high-resolution images, and the low-resolution images are obtained by down-sampling the high-resolution images.
- The training samples here form a dataset: a large number of high-resolution images (for example, images whose resolution is higher than a preset value) are obtained as original samples, and these high-resolution images can be various types of pictures or video frames, for example, frames of a high-definition live video in a live streaming scene.
- down-sampling is performed on the original samples, that is, down-sampling is performed on each high-resolution image according to the same method to obtain training samples.
- the way of downsampling processing can be bicubic interpolation or the like.
- Step S202: Input the low-resolution image into a pre-built image reconstruction model, where the image reconstruction model includes a feature extraction network and a sub-pixel convolution layer.
- steps S203-S204 are similar to the processing procedures of steps S102-S103, and are not repeated here.
- The objective function may be an L2 loss function, also called the mean square error (Mean Square Error, MSE) function, which is a type of regression loss function.
- The curve of the L2 loss function is smooth, continuous and everywhere differentiable, which is convenient for the gradient descent algorithm; and as the error decreases, the gradient also decreases, which is conducive to convergence, so that even with a fixed learning rate the function converges to the minimum quickly.
- Back-propagation training can be performed on the image reconstruction model based on the training reconstructed image, the high-resolution image and the L2 loss function, so as to adjust the parameters of the image reconstruction model until a preset training completion condition is reached, and the trained image reconstruction model is obtained.
- The training completion condition can be that the number of iterations reaches a set value (for example, 2000), or that the L2 loss function converges to a minimum, etc.; it is not limited here and can be set according to actual needs.
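- A minimal training-loop sketch (assuming PyTorch; the optimizer choice, learning rate, and iteration budget are illustrative, not values specified by the present application):

```python
import torch
import torch.nn as nn

def train(model: nn.Module, loader, max_iters: int = 2000, lr: float = 1e-4) -> nn.Module:
    model.train()
    criterion = nn.MSELoss()                     # L2 (mean square error) objective
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    step = 0
    while step < max_iters:                      # completion condition: iteration count reaches a set value
        for lr_img, hr_img in loader:            # low-resolution input, high-resolution target
            pred = model(lr_img)                 # training reconstructed image
            loss = criterion(pred, hr_img)
            optimizer.zero_grad()
            loss.backward()                      # back-propagation
            optimizer.step()
            step += 1
            if step >= max_iters:
                break
    return model
```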
- In addition, the trained image reconstruction model can be pruned according to requirements and test results: long-range cascade connections are retained and short-range cascade connections are deleted, reducing excessive intermediate skips and making the model more lightweight.
- In some embodiments, the low-resolution image can be preprocessed before being input into the image reconstruction model, and the preprocessing can be a self-subtracting mean operation on the image. Therefore, before step S202, the model training method may further include:
- performing a self-subtracting mean operation on the low-resolution image to highlight the texture details of the low-resolution image.
- The self-subtracting mean operation can leave the foreground of the image unprocessed while subtracting the mean pixel value of the background from each background pixel, thereby enhancing the contrast between the background and foreground parts and highlighting texture details.
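- A sketch of this preprocessing, assuming a binary foreground mask is already available (how the mask is obtained is not specified here):

```python
import numpy as np

def subtract_background_mean(image: np.ndarray, fg_mask: np.ndarray) -> np.ndarray:
    """Subtract the background's own mean from background pixels; leave the foreground untouched."""
    out = image.astype(np.float32).copy()
    background = ~fg_mask                      # fg_mask: True where a pixel belongs to the foreground
    out[background] -= out[background].mean()  # self-subtracting mean over the background only
    return out
```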
- In some embodiments, in order to let the feature extraction network extract more features, the image can also be preprocessed with flip-symmetry operations before being input into the model, and the model outputs are then reverse-flipped and averaged, thereby reducing the deviation of some feature layers or parameters caused by anisotropy. Therefore, before step S202, the model training method may further include:
- performing flip-symmetry processing on the low-resolution image to obtain at least one processed low-resolution image; inputting the at least one processed low-resolution image into the image reconstruction model, and using the feature extraction network to perform multi-scale feature extraction on the at least one processed low-resolution image to obtain at least one auxiliary feature map; performing reverse flip-symmetry processing on the at least one auxiliary feature map, and averaging the processed auxiliary feature maps to obtain the training feature map.
- For example, for an n × n low-resolution image, rotate it clockwise 3 times, 90° each time, so that 4 images of n × n are obtained (including the original); input the 4 images into the image reconstruction model so that the feature extraction network outputs 4 auxiliary feature maps; then rotate the corresponding 3 auxiliary feature maps counterclockwise by 90°, 180° and 270°; finally, average the 4 processed auxiliary feature maps pixel-wise to obtain the final training feature map.
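- A sketch (assuming PyTorch) of this rotate-process-unrotate-average step, where `extract_features` is a hypothetical stand-in for the feature extraction network:

```python
from typing import Callable

import torch

def rotation_ensemble(extract_features: Callable, lr_img: torch.Tensor) -> torch.Tensor:
    # lr_img: (N, C, n, n) with a square spatial size; k=-k rotates one way, k=+k undoes it.
    rotated = [torch.rot90(lr_img, k=-k, dims=(2, 3)) for k in range(4)]
    feats = [extract_features(img) for img in rotated]           # four auxiliary feature maps
    restored = [torch.rot90(f, k=k, dims=(2, 3)) for k, f in enumerate(feats)]
    return torch.stack(restored).mean(dim=0)                     # pixel-wise average -> training feature map
```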
- In practice, the low-resolution image can first undergo the self-subtracting mean operation and then the flip-symmetry processing, or the order can be reversed; this can be set flexibly according to actual needs and is not limited here.
- In addition, a new model can be trained on the basis of a trained model. For example, when training 3× and 4× magnification models, if the 2× magnification model has already been trained, the parameters of the 2× model can be used as the initial parameters of the 3× and 4× models, and training then continues from there.
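- Such a warm start can be sketched as follows (assuming PyTorch; `build_model` and the checkpoint path are hypothetical placeholders):

```python
import torch

state_2x = torch.load("model_x2.pth", map_location="cpu")  # parameters of the trained 2x model
model_4x = build_model(scale=4)                            # hypothetical constructor for the 4x model
# Copy over the shared layers; strict=False skips magnification-specific layers that differ.
model_4x.load_state_dict(state_2x, strict=False)
```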
- FIG. 14 shows a schematic flowchart of a method for training a portrait super-resolution reconstruction model provided in an embodiment of the present application.
- the method for training a portrait super-resolution reconstruction model includes:
- Step S2100: Acquire training samples and target samples corresponding to the training samples.
- Step S2200: Use the constructed generation network to perform key point detection on the training samples to obtain training key points.
- Step S2300: Perform super-resolution reconstruction processing and restoration processing based on the training key points and the training samples to obtain an output image.
- Step S2400: Compare the output image with the target sample, adjust the network parameters of the generation network based on the comparison result, and continue training until a reconstructed model is obtained when a first preset condition is satisfied.
- In the method for training a portrait super-resolution reconstruction model provided by this embodiment, key point detection is performed on the training samples and the model is trained based on the training key points in combination with the image features of the training samples, so that the reconstruction accuracy of the obtained reconstruction model can be improved.
- In some embodiments, a plurality of training samples are collected in advance, and each training sample may be a sample image containing a face image of lower definition.
- The target sample corresponding to a training sample is the sample that meets the requirements, that is, the high-definition sample expected to be obtained after the training sample is processed.
- In some embodiments, the pre-built generation network may be a recurrent recursive network, and the process of using the generation network to perform key point detection, super-resolution reconstruction processing and restoration processing on the training samples can be found in the above description. After processing, the generation network can output the output images corresponding to the training samples.
- The target sample serves as the comparison standard for the processing quality of the generation network. By comparing the differences between the output image and the target sample, the generation network can be continuously trained according to the comparison result, and when the difference between the output image and the target sample is reduced enough to meet the requirements, the reconstructed model is obtained.
- In some embodiments, the samples input to the generation network may be preprocessed, for example by mean-subtraction, so as to bring out the details of the image texture and improve the effect of subsequent processing and recognition.
- On this basis, the preprocessed samples can also be subjected to the flip-symmetry operation before being input to the generation network, and the output results of each network layer of the generation network can be subjected to reverse flip-symmetry processing and averaged. In this way, the deviation of certain network layers or parameters caused by anisotropy can be reduced.
- In the process of training and testing the generation network, the network can be pruned according to the requirements and the test results, so as to retain the earlier recursion rounds that have a greater impact on the results; training then continues on this basis, thereby improving the reconstruction accuracy of the resulting generation network. The peak signal-to-noise ratio and structural similarity of subsequently processed images can also be greatly improved.
- In some embodiments of the present application, a loss function may be constructed to supervise the training of the generation network.
- Referring to FIG. 15, the above step S2400 of the method for training a portrait super-resolution reconstruction model of the present application can be implemented in the following manner:
- Step S2410, constructing a first loss function based on the difference between the pixel information of the output image and the pixel information of the target sample;
- Step S2420, constructing a second loss function based on the difference between each face key point in the output image and the corresponding face key point in the target sample;
- Step S2430, comparing the output image with the target sample, adjusting the network parameters of the generation network based on the comparison result, and continuing training until the weighted function value of the first loss function and the second loss function satisfies the first preset condition, at which point the reconstructed model is obtained.
- In the embodiments according to the present application, the first loss function and the second loss function may be constructed to comprehensively evaluate the training of the generation network. The first loss function evaluates from the perspective of pixel differences between images. In addition, considering that the images undergo face key point detection and that the face key points are particularly important for portrait reconstruction, a second loss function constructed from the difference information between face key points is added.
- The first loss function represents the overall pixel-level Euclidean distance between the output image of the generation network and the target sample (that is, the desired output effect), and the second loss function represents the Euclidean distance between each face key point detected by the generation network and the corresponding face key point in the target sample (the desired output effect).
- The above-mentioned first loss function and second loss function are weighted and combined to jointly serve as the loss function of the generation network.
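- For illustration, such a weighted two-term loss might be sketched as below; the use of mean squared error as a stand-in for the Euclidean distances and the weight values are assumptions, not values given in the application:

```python
import torch.nn.functional as F

def generator_loss(out_img, tgt_img, out_pts, tgt_pts, w_pix=1.0, w_kpt=0.5):
    """Weighted sum of the first (pixel-level) and second (key-point) losses."""
    pixel_loss = F.mse_loss(out_img, tgt_img)      # overall pixel-level distance
    keypoint_loss = F.mse_loss(out_pts, tgt_pts)   # detected vs. target landmarks
    return w_pix * pixel_loss + w_kpt * keypoint_loss
```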
- During the training of the generation network, comparing the output image with the target sample amounts to calculating the function value of the comprehensive loss function comprising the first loss function and the second loss function, and the reconstructed model is obtained when the obtained function value satisfies the first preset condition.
- The first preset condition may be that the value of the loss function no longer decreases and thus converges, or that the value of the loss function falls below a preset value. Alternatively, the training may be stopped and the reconstructed model obtained when the number of iterations reaches a preset maximum.
- Using the first loss function constructed from the difference in pixel information and the second loss function based on the differences between face key points to supervise and judge the training of the reconstruction model can improve the cognitive quality of the super-resolution images obtained when the reconstruction model is subsequently applied to reconstruction processing.
- By applying the reconstruction model pre-built and obtained from the generation network as described above to the reconstruction of the to-be-processed image, the cognitive quality of the obtained super-resolution image can be improved.
- As can be seen from the above, the reconstruction model in the method for training a portrait super-resolution reconstruction model according to the present application includes a generation network, which is a pre-trained model that can process low-definition images and output the corresponding super-resolution images.
- In a possible implementation, in order to further improve the reconstruction effect of the obtained reconstruction model, the reconstruction model may further include a discriminator, and the discriminator may be used to supervise the training of the generation network. Therefore, in this implementation, the generation network is obtained after training with training samples under the supervision of the trained discriminator.
- In some implementations, the method for training a portrait super-resolution reconstruction model according to the present application further comprises the following steps:
- constructing a discriminator and using the discriminator to perform discrimination processing on the output image; and adjusting the parameters of the discriminator according to the obtained discrimination results until the trained discriminator is obtained when a second preset condition is satisfied.
- The main principle of the discriminator is to discriminate real images (that is, high-resolution images that meet the requirements) as real as far as possible (for example, outputting a discrimination result of 1), and to discriminate the output images of the generation network as fake as far as possible (for example, outputting a discrimination result of 0). In this way, the generation network can be supervised and continuously trained until the discriminator finally judges the output image of the generation network to be real. That is, the discriminator acts as the supervisor of the generation network to continuously optimize its training.
- When the discriminator is used as a supervisor to optimize the generation network, the discriminator itself must first be trained so that it can discriminate accurately. A loss function of the discriminator may be pre-built, and this loss function may be composed of the discriminator's discrimination information on the output image of the generation network and its discrimination information on the target sample.
- The training process of the discriminator is the process of minimizing the above loss function. When the value of the loss function no longer decreases and thus converges, it can be determined that the training of the discriminator satisfies the second preset condition; the trained discriminator is thereby obtained and can then be fixed.
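- One discriminator training step under these conventions (real samples toward 1, generated samples toward 0) might be sketched as follows; the binary cross-entropy formulation is an assumption, since the application only describes the target discrimination results:

```python
import torch
import torch.nn.functional as F

def discriminator_step(discriminator, optimizer, real_images, fake_images):
    """Push the discriminator's output toward 1 for real high-resolution
    samples and toward 0 for generation-network outputs."""
    optimizer.zero_grad()
    real_logits = discriminator(real_images)
    fake_logits = discriminator(fake_images.detach())   # detach: do not update the generator here
    loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) +
            F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    loss.backward()
    optimizer.step()
    return loss.item()
```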
- In the embodiments according to the present application, a discriminator is added to the reconstruction model to form an adversarial network comprising the discriminator and the generation network, which can further improve the reconstruction effect of the obtained reconstruction model.
- In a possible implementation, when a discriminator is added to the reconstruction model to form an adversarial network, the training and adjustment of the generation network may incorporate the discrimination information of the discriminator.
- Referring to FIG. 16, step S2400 in the method for training a portrait super-resolution reconstruction model of the present application may include the following sub-steps:
- Step S2410', inputting the output image into the trained discriminator to obtain discrimination information;
- Step S2420', comparing the output image with the target sample to obtain a comparison result;
- Step S2430', adjusting the network parameters of the generation network according to the discrimination information and the comparison result, and then continuing training until the reconstructed model is obtained when the first preset condition is satisfied.
- According to the above embodiment, when the discriminator is added, the difference between the output image and the target sample and the discriminator's discrimination information on the output image can be combined to adjust the training of the generation network.
- In some embodiments according to the present application, the loss functions may be constructed in the following manner, and the reconstruction model is trained using the constructed loss functions:
- constructing a first loss function based on the difference between the pixel information of the output image and the pixel information of the target sample; constructing a second loss function based on the difference between each face key point in the output image and the corresponding face key point in the target sample; constructing a third loss function based on the discriminator's discrimination information on the output image, and constructing a fourth loss function based on the image difference between the output image and the target sample obtained by the pre-built portrait cognitive model;
- adjusting the network parameters of the generation network according to the discrimination information and the comparison result and then continuing training, until the function value obtained by weighting the first loss function, the second loss function, the third loss function and the fourth loss function satisfies the first preset condition, at which point the reconstructed model is obtained.
- The influence of the difference between the output image and the target sample on the adjustment of the generation network can be represented by the first loss function and the second loss function, and the influence of the discriminator's discrimination information on the output image on the training adjustment of the generation network can be represented by the third loss function. In addition, in order to strengthen the human-eye cognition of the obtained super-resolution image, a fourth loss function constructed from the image difference between the output image and the target sample as obtained by the portrait cognitive model can also be added.
- In this embodiment, the first loss function is constructed based on the difference between the pixel information of the output image and the pixel information of the corresponding target sample, and the second loss function is constructed from the differences between the face key points in the output image and the corresponding face key points in the target sample. Since the purpose of constructing the discriminator to supervise the training of the generation network is to make the output image of the generation network finally be judged as real by the discriminator, the third loss function is constructed from the discriminator's discrimination information on the output image. The fourth loss function is constructed from the difference in facial features between the output image and the target sample as obtained by the portrait cognitive model.
- The finally obtained loss function of the generation network can be obtained by a weighted combination of the above first loss function, second loss function, third loss function and fourth loss function.
- Therefore, the network parameters of the generation network can be adjusted according to the discrimination information of the discriminator and the comparison result between the output image and the target sample, after which training continues. This is in essence the process of training, adjusting, and calculating the function value of the combined loss function described above; when the function value obtained by weighting the first, second, third and fourth loss functions satisfies the first preset condition, the trained reconstruction model is obtained.
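- For illustration, the weighted four-term combination might be sketched as follows; all weights, the MSE/BCE formulations, and the tensor names are assumptions rather than values given in the application:

```python
import torch
import torch.nn.functional as F

def combined_generator_loss(out_img, tgt_img, out_pts, tgt_pts,
                            disc_logits, out_feats, tgt_feats,
                            w1=1.0, w2=0.5, w3=1e-3, w4=0.1):
    loss1 = F.mse_loss(out_img, tgt_img)           # first: pixel-information difference
    loss2 = F.mse_loss(out_pts, tgt_pts)           # second: face-key-point difference
    loss3 = F.binary_cross_entropy_with_logits(    # third: drive the discriminator toward "real"
        disc_logits, torch.ones_like(disc_logits))
    loss4 = F.mse_loss(out_feats, tgt_feats)       # fourth: portrait-cognitive-model feature gap
    return w1 * loss1 + w2 * loss2 + w3 * loss3 + w4 * loss4
```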
- In this embodiment, adding a discriminator to supervise the training of the generation network can improve the human-eye cognition of the obtained output images, yielding images of higher definition. Referring to FIGS. 17(a) to 17(c): FIG. 17(a) is an image obtained after conventional interpolation processing, FIG. 17(b) is the image obtained by the implementation of the present application without the discriminator, and FIG. 17(c) is the image obtained by the implementation of the present application with the discriminator added.
- As can be seen from the figures, the images obtained under the solution of the present application have significantly higher definition and a better effect than those of the conventional interpolation processing method, and among them, the image obtained with the discriminator is clearer in terms of human visual perception than the image obtained without the discriminator.
- FIG. 18 shows a schematic block diagram of an image processing apparatus 100 provided by an embodiment of the present application.
- The image processing apparatus 100 is applied to a mobile terminal, and includes an image acquisition module 110, a first execution module 120 and a second execution module 130.
- the image acquisition module 110 may be configured to acquire images to be processed.
- the first execution module 120 may be configured to input the image to be processed into the image reconstruction model, and use the feature extraction network of the image reconstruction model to perform multi-scale feature extraction on the image to be processed and expand image channels to obtain a reconstructed feature map.
- In an alternative embodiment, the feature extraction network includes a convolutional layer, a plurality of concatenated blocks and a plurality of first convolutional layers; the concatenated blocks and the first convolutional layers are arranged alternately, and the feature extraction network adopts a global cascade structure;
- The first execution module 120 may be specifically configured to: input the image to be processed into the convolutional layer for convolution processing to obtain an initial feature map; take the initial feature map as the input of the first concatenated block, and the output of the (N-1)th first convolutional layer as the input of the Nth concatenated block; perform multi-scale feature extraction with the concatenated blocks and output intermediate feature maps; channel-stack the initial feature map and the intermediate feature maps output by each concatenated block before the Nth first convolutional layer, and input the stacked result into the Nth first convolutional layer for convolution processing; and take the output of the last first convolutional layer as the reconstructed feature map.
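- A structural sketch of this global cascade is given below; the channel count, block count, 1×1 fusion kernels and class names are illustrative guesses, and the concatenated block is reduced to a placeholder that the following paragraphs elaborate:

```python
import torch
import torch.nn as nn

def concatenated_block(channels):
    # Placeholder for the locally cascaded block described below; a full
    # version would repeat the same stack-and-fuse pattern internally.
    return nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True))

class GlobalCascadeExtractor(nn.Module):
    def __init__(self, channels=64, n_blocks=3):
        super().__init__()
        self.entry = nn.Conv2d(3, channels, 3, padding=1)
        self.blocks = nn.ModuleList(concatenated_block(channels) for _ in range(n_blocks))
        # The i-th fusion layer sees the initial map plus (i + 1) intermediate maps.
        self.fusions = nn.ModuleList(
            nn.Conv2d(channels * (i + 2), channels, 1) for i in range(n_blocks))

    def forward(self, x):
        initial = self.entry(x)                      # initial feature map
        stacked, out = [initial], initial
        for block, fusion in zip(self.blocks, self.fusions):
            stacked.append(block(out))               # intermediate feature map
            out = fusion(torch.cat(stacked, dim=1))  # channel stacking + convolution
        return out                                   # reconstructed feature map
```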
- In an alternative embodiment, the concatenated block includes multiple residual blocks and multiple second convolutional layers; the residual blocks and the second convolutional layers are arranged alternately, and the concatenated block adopts a local cascade structure;
- The manner in which the first execution module 120 performs multi-scale feature extraction using the concatenated blocks and outputs the intermediate feature map may include: taking the input of the concatenated block as the input of the first residual block, and the output of the (N-1)th second convolutional layer as the input of the Nth residual block; learning residual features with the residual blocks to obtain residual feature maps; channel-stacking the input of the concatenated block and the output of each residual block before the Nth second convolutional layer, and inputting the stacked result into the Nth second convolutional layer for convolution processing; and taking the output of the last second convolutional layer as the intermediate feature map.
- In an alternative embodiment, the residual block includes a grouped convolutional layer, a third convolutional layer and a fourth convolutional layer; the grouped convolutional layer adopts the ReLU activation function, the grouped convolutional layer and the third convolutional layer are connected to form a residual path, and the residual block adopts a local skip connection structure;
- The manner in which the first execution module 120 learns residual features with the residual block to obtain the residual feature map may include: taking the input of the residual block as the input of the grouped convolutional layer and extracting features through the residual path; performing feature fusion on the input of the residual block and the output of the third convolutional layer; and, after fusion, inputting the result into the fourth convolutional layer for convolution processing and outputting the residual feature map.
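- A hedged sketch of such a residual block follows, assuming element-wise addition for the "feature fusion" and an illustrative group count; none of these values come from the application:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels=64, groups=4):
        super().__init__()
        # Residual path: grouped convolutions with ReLU, then the "third" convolution.
        self.residual_path = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=groups),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, groups=groups),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.fourth_conv = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        fused = x + self.residual_path(x)   # local skip connection ("feature fusion")
        return self.fourth_conv(fused)      # residual feature map
```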
- the second execution module 130 may be configured to use the sub-pixel convolution layer of the image reconstruction model to amplify the reconstructed feature map to obtain a reconstructed image.
- the second execution module 130 may be specifically configured to: use a sub-pixel convolution layer to adjust the pixel positions in the reconstructed feature map to obtain a reconstructed image.
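- A minimal sketch of this sub-pixel amplification step using PixelShuffle, with illustrative channel counts:

```python
import torch
import torch.nn as nn

r, out_channels = 2, 3
upsampler = nn.Sequential(
    nn.Conv2d(64, out_channels * r * r, 3, padding=1),  # expand channels to r*r*C
    nn.PixelShuffle(r),                                 # (B, r*r*C, H, W) -> (B, C, r*H, r*W)
)
feature_map = torch.randn(1, 64, 32, 32)    # a 64-channel reconstructed feature map
reconstructed = upsampler(feature_map)      # shape: (1, 3, 64, 64)
```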
- FIG. 19 shows a schematic block diagram of an image reconstruction model training apparatus 200 provided by an embodiment of the present application.
- The model training apparatus 200 is applied to any electronic device with an image processing function, and may include a sample acquisition module 210, a first processing module 220, a second processing module 230, a third processing module 240 and a fourth processing module 250.
- the sample acquisition module 210 may be configured to acquire training samples, where the training samples include low-resolution images and high-resolution images, and the low-resolution images are obtained by down-sampling the high-resolution images.
- the first processing module 220 may be configured to input the low-resolution image into a pre-built image reconstruction model, where the image reconstruction model includes a feature extraction network and a sub-pixel convolutional layer.
- the second processing module 230 may be configured to use a feature extraction network to perform multi-scale feature extraction on the low-resolution image and expand image channels to obtain a training feature map.
- the third processing module 240 may be configured to use a sub-pixel convolutional layer to amplify the training feature map to obtain a training reconstructed image.
- the fourth processing module 250 may be configured to perform back-propagation training on the image reconstruction model based on the training reconstructed image, the high-resolution image and the preset objective function to obtain a trained image reconstruction model.
- In an alternative embodiment, the objective function is an L2 loss function;
- the fourth processing module 250 may be specifically configured to: perform back-propagation training on the image reconstruction model based on the training reconstructed image, the high-resolution image and the L2 loss function, so as to adjust the parameters of the image reconstruction model until a preset training completion condition is reached, thereby obtaining the trained image reconstruction model.
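- One such back-propagation step with the L2 (MSE) objective might be sketched as follows; the optimizer choice is an assumption:

```python
import torch.nn.functional as F

def training_step(model, optimizer, low_res, high_res):
    """Drive the model's reconstruction of the low-resolution input toward
    the high-resolution ground truth with the L2 objective."""
    optimizer.zero_grad()
    loss = F.mse_loss(model(low_res), high_res)  # L2 loss
    loss.backward()                              # back-propagation
    optimizer.step()
    return loss.item()
```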
- In an alternative embodiment, the first processing module 220 may also be configured to: prune the trained image reconstruction model, so as to retain the long-range cascade connections and delete the short-range ones.
- the first processing module 220 may also be configured to: perform flip symmetry processing on the low-resolution image to obtain at least one processed low-resolution image.
- the second processing module 230 may be specifically configured to: input at least one processed low-resolution image into the image reconstruction model.
- The third processing module 240 may be specifically configured to: use the feature extraction network to perform multi-scale feature extraction on the at least one processed low-resolution image to obtain at least one auxiliary feature map; perform reverse flip-symmetry processing on the at least one auxiliary feature map; and average the results after the reverse flip-symmetry processing to obtain the training feature map.
- FIG. 20 shows a schematic block diagram of an electronic device 10 provided by an embodiment of the present application.
- the electronic device 10 may be a mobile terminal that executes the above image processing method, or may be any electronic device having an image processing function that executes the above model training method.
- The electronic device 10 includes a processor 11, a memory 12 and a bus 13, and the processor 11 is connected to the memory 12 through the bus 13.
- The memory 12 is used to store programs, such as the image processing apparatus 100 shown in FIG. 18 or the model training apparatus 200 shown in FIG. 19.
- Taking the image processing apparatus 100 as an example, the image processing apparatus 100 includes at least one software function module that can be stored in the memory 12 in the form of software or firmware. After receiving an execution instruction, the processor 11 executes the program to implement the image processing method disclosed in the above embodiments.
- the memory 12 may include a high-speed random access memory (Random Access Memory, RAM), and may also include a non-volatile memory (non-volatile memory, NVM).
- the processor 11 may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above-mentioned method can be completed by an integrated logic circuit of hardware in the processor 11 or an instruction in the form of software.
- The above processor 11 may be a general-purpose processor, including chips such as a Central Processing Unit (CPU), a Microcontroller Unit (MCU), a Complex Programmable Logic Device (CPLD), a Field Programmable Gate Array (FPGA), and an embedded ARM.
- Embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by the processor 11, the image processing method or the model training method disclosed in the foregoing embodiments is implemented.
- To sum up, in the image processing and model training methods, apparatuses, electronic device and storage medium provided by the embodiments of the present application, an image to be processed is acquired and input into an image reconstruction model that includes a feature extraction network and a sub-pixel convolution layer; the feature extraction network first performs multi-scale feature extraction on the image to be processed and expands the image channels to obtain a reconstructed feature map, and the sub-pixel convolution layer then amplifies the reconstructed feature map to obtain the reconstructed image. The processing speed can thus be improved while the reconstruction effect is ensured.
- FIG. 21 is a schematic diagram of an exemplary component of an electronic device provided in an embodiment of the present application.
- The electronic device may include a storage medium 2110, a processor 2120, machine-executable instructions 2130 (the machine-executable instructions 2130 may be the portrait super-resolution reconstruction apparatus 131 or the portrait super-resolution reconstruction model training apparatus 132 according to the present application) and a communication interface 140.
- the storage medium 2110 and the processor 2120 are both located in the electronic device and are provided separately.
- the storage medium 2110 may also be independent of the electronic device, and may be accessed by the processor 2120 through a bus interface.
- the storage medium 2110 may also be integrated into the processor 2120, for example, may be a cache and/or a general purpose register.
- The machine-executable instructions 2130 can be understood as a software function module that resides in the electronic device described in FIG. 21 or in the processor 2120 of the electronic device, or as a software function module that, independently of the electronic device or the processor 2120 described in FIG. 21, implements the above portrait super-resolution reconstruction method or portrait super-resolution reconstruction model training method under the control of the electronic device.
- The above-mentioned portrait super-resolution reconstruction apparatus 131 may include a detection module 1311, a processing module 1312 and a restoration module 1313.
- the functions of each functional module of the portrait super-resolution reconstruction apparatus 131 will be described in detail below.
- the detection module 1311 can be configured to use a pre-built reconstruction model to perform key point detection on the image to be processed to obtain face key points;
- the detection module 1311 may be configured to perform the above step S110, and for the detailed implementation of the detection module 1311, please refer to the above-mentioned content related to the step S110.
- the processing module 1312 can be configured to perform super-resolution reconstruction processing according to the face key points and the image features obtained based on the to-be-processed image to obtain high-frequency image information;
- The processing module 1312 may be configured to execute the above step S120, and for the detailed implementation of the processing module 1312, please refer to the above-mentioned content related to step S120.
- the restoration module 1313 may be configured to perform restoration processing on the to-be-processed image by using the high-frequency information of the image to obtain a super-resolution image corresponding to the to-be-processed image.
- the restoration module 1313 may be configured to perform the above step S130, and for the detailed implementation of the restoration module 1313, please refer to the above-mentioned content related to the step S130.
- In an alternative embodiment, the portrait super-resolution reconstruction apparatus may further include the image processing apparatus according to FIG. 18, the image processing apparatus being configured to perform the super-resolution reconstruction processing.
- In an alternative embodiment, the key point detection, super-resolution reconstruction processing and restoration processing comprise multiple rounds of iterative processing, and the to-be-processed image is either an unprocessed to-be-processed image, or the super-resolution image obtained in a previous round of iteration after the key point detection, super-resolution reconstruction processing and restoration processing.
- In an alternative embodiment, there are a plurality of face key points, and the above-mentioned restoration module 1313 can be configured to obtain the super-resolution image in the following manner: processing the to-be-processed image with the pre-built portrait cognitive model to output the position information of each face key point; and performing restoration processing on the to-be-processed image based on the position information of each face key point and the high-frequency image information, to obtain the super-resolution image corresponding to the to-be-processed image.
- In an alternative embodiment, the restoration module 1313 may be configured to obtain the super-resolution image based on the position information of each face key point and the high-frequency image information in the following manner: acquiring the restoration attribute corresponding to each face key point; and performing restoration processing on the corresponding face key points in the to-be-processed image according to each face key point and its corresponding position information, the high-frequency image information, and the restoration attribute.
- the reconstructed model includes a discriminator and a generation network, and the generation network is obtained after training with training samples under the supervision of the trained discriminator.
- the face key points include left eye, right eye, nose, mouth and chin contours.
- The above-mentioned portrait super-resolution reconstruction model training apparatus 132 may include an acquisition module 1321, a key point obtaining module 1322, an output image obtaining module 1323 and a training module 1324.
- the functions of each functional module of the portrait super-resolution reconstruction model training device 132 will be described in detail below.
- The acquisition module 1321 can be configured to acquire training samples and the target samples corresponding to the training samples;
- the acquisition module 1321 may be configured to perform the above step S2100, and for the detailed implementation of the acquisition module 1321, please refer to the above-mentioned content related to the step S2100.
- the key point obtaining module 1322 can be configured to perform key point detection on the training sample by using the constructed generating network to obtain training key points;
- the key point obtaining module 1322 may be configured to perform the above step S2200, and for the detailed implementation of the key point obtaining module 1322, reference may be made to the above-mentioned content related to step S2200.
- the output image obtaining module 1323 can be configured to perform super-resolution reconstruction processing and restoration processing based on the training key points and the training samples to obtain an output image;
- the output image obtaining module 1323 may be configured to perform the above-mentioned step S2300, and for the detailed implementation of the output image obtaining module 1323, reference may be made to the above-mentioned content related to the step S2300.
- The training module 1324 can be configured to compare the output image with the target sample, adjust the network parameters of the generation network based on the comparison result and continue training, until the reconstructed model is obtained when the first preset condition is met.
- The training module 1324 may be configured to perform the above step S2400, and for the detailed implementation of the training module 1324, reference may be made to the above-mentioned content related to step S2400.
- In an alternative embodiment, the training module 1324 may be configured to obtain the reconstructed model based on the comparison result between the output image and the target sample in the following manner: constructing a first loss function based on the difference between the pixel information of the output image and the pixel information of the target sample; constructing a second loss function based on the difference between each face key point in the output image and the corresponding face key point in the target sample; and comparing the output image with the target sample, adjusting the network parameters of the generation network based on the comparison result and continuing training, until the function value obtained by weighting the first loss function and the second loss function satisfies the first preset condition, at which point the reconstructed model is obtained.
- In an alternative embodiment, the reconstruction model further includes a discriminator, the discriminator being used to supervise the training of the generation network; the portrait super-resolution reconstruction model training apparatus 132 further includes a building module, and the building module is configured to:
- construct a discriminator and use the discriminator to perform discrimination processing on the output image and the target sample corresponding to the output image; and adjust the parameters of the discriminator according to the obtained discrimination results until the trained discriminator is obtained when the second preset condition is satisfied.
- In an alternative embodiment, the training module 1324 can obtain the reconstructed model in the following manner: inputting the output image into the trained discriminator to obtain discrimination information; comparing the output image with the target sample to obtain a comparison result; and adjusting the network parameters of the generation network according to the discrimination information and the comparison result and then continuing training, until the reconstructed model is obtained when the first preset condition is satisfied.
- In an alternative embodiment, the training module 1324 may be configured to obtain the reconstructed model based on the discrimination information and the comparison result in the following manner: constructing a first loss function based on the difference between the pixel information of the output image and that of the target sample; constructing a second loss function based on the difference between each face key point in the output image and the corresponding face key point in the target sample; constructing a third loss function based on the discriminator's discrimination information on the output image, and constructing a fourth loss function based on the image difference between the output image and the target sample obtained by the pre-built portrait cognitive model; and adjusting the network parameters of the generation network according to the discrimination information and the comparison result and continuing training, until the function value obtained by weighting the first, second, third and fourth loss functions satisfies the first preset condition, at which point the reconstructed model is obtained.
- Embodiments of the present application further provide a computer-readable storage medium storing the machine-executable instructions 2130; when the machine-executable instructions 2130 are executed, the portrait super-resolution reconstruction method or the portrait super-resolution reconstruction model training method provided by the above embodiments is implemented.
- The computer-readable storage medium can be a general-purpose storage medium, such as a removable disk or a hard disk; when the computer program on the computer-readable storage medium is run, the above portrait super-resolution reconstruction method or portrait super-resolution reconstruction model training method can be executed.
- For the processes involved when the computer-readable storage medium and its executable instructions are executed, reference may be made to the relevant descriptions in the foregoing method embodiments, which will not be detailed here.
- To sum up, in the portrait super-resolution reconstruction method, the portrait super-resolution reconstruction model training method, the apparatuses, the electronic device, and the readable storage medium provided by the embodiments of the present application, key point detection is performed on the image to be processed using the pre-built reconstruction model to obtain face key points; super-resolution reconstruction processing is then performed according to the face key points and the image features obtained based on the image to be processed, to obtain high-frequency image information; and the high-frequency image information is used to perform restoration processing on the image to be processed, to obtain the super-resolution image corresponding to the image to be processed.
- By combining face key point detection with face restoration, super-resolution reconstruction of the image is realized and the cognitive quality of the obtained super-resolution image is improved, which meets the needs of users in practical applications.
- The present application thus provides an image processing method, a portrait super-resolution reconstruction method, an image reconstruction model training method, a portrait super-resolution reconstruction model training method, and related apparatuses, electronic devices and storage media.
- The image reconstruction model includes a feature extraction network and a sub-pixel convolution layer. The feature extraction network is used to extract multi-scale features of the image to be processed and expand the image channels to obtain a reconstructed feature map, and the sub-pixel convolution layer is then used to amplify the reconstructed feature map to obtain the reconstructed image. Since the feature extraction network can extract multi-scale features and expand image channels, a good reconstruction effect can be obtained without increasing the depth of the network; meanwhile, because the sub-pixel convolution layer at the end of the model performs the image amplification, the feature extraction network processes small-size images, which greatly reduces the amount of calculation and the number of parameters, thereby improving the processing speed while ensuring the reconstruction effect.
- In addition, the image processing method, the portrait super-resolution reconstruction method, the image reconstruction model training method, the portrait super-resolution reconstruction model training method, and the related apparatuses, electronic devices and storage media of the present application are reproducible and can be used in a variety of industrial applications. For example, they can be used in any apparatus that performs super-resolution reconstruction on a low-resolution image or a sequence of low-resolution images.
Abstract
The embodiments of the present application relate to the technical field of computer vision, and provide an image processing and model training method, apparatus, electronic device and storage medium. An image to be processed is acquired and input into an image reconstruction model comprising a feature extraction network and a sub-pixel convolution layer; the feature extraction network first performs multi-scale feature extraction on the image to be processed and expands the image channels to obtain a reconstructed feature map, and the sub-pixel convolution layer then amplifies the reconstructed feature map to obtain a reconstructed image. Since the feature extraction network can extract multi-scale features and expand image channels, a good reconstruction effect can be obtained without increasing the network depth; meanwhile, the sub-pixel convolution layer at the end of the model performs the image amplification, so the feature extraction network processes small-size images, which greatly reduces the amount of calculation and the number of parameters, thereby improving the processing speed while ensuring the reconstruction effect. In addition, the embodiments of the present application further provide a portrait super-resolution reconstruction method, a model training method, an apparatus, an electronic device and a readable storage medium.
Description
Cross-reference to related applications
This disclosure claims priority to the Chinese patent application No. 202010977254.4, entitled "Image processing and model training method, apparatus, electronic device and storage medium", filed with the China National Intellectual Property Administration on September 16, 2020, and to the Chinese patent application No. 202011000670.5, entitled "Portrait super-resolution reconstruction method, model training method, apparatus, electronic device and readable storage medium", filed with the China National Intellectual Property Administration on September 22, 2020, the entire contents of which are incorporated herein by reference.
在一个实施例中,可以先对低分辨率图像做预处理,预处理之后再输入图像重建模型,预处理可以是将图片自减平均值。因此,在步骤S202之前,该模型训练方法还可以包括:
对低分辨率图像进行自减平均值处理,以突出低分辨率图像的纹理细节。
自减平均值处理可以是对图像中的前景不做处理,而对背景中的每个像素减去背景图像的像素平均值,以此来增强背景部分和前景部分的对比度,突出纹理细节。
在另一个实施例中,为了使特征提取网络提取到更多的特征,预处理还可以是对图片进行翻转对称操作后输入模型,再对模型输出的结果进行反翻转对称并求平均值,从而减少各向异性带来的某些特征层或者参数的偏差。因此,在步骤S202之前,该模型训练方法还可以包括:
对低分辨率图像进行翻转对称处理,得到至少一个处理后的低分辨率图像。
之后将至少一个处理后的低分辨率图像输入图像重建模型,利用特征提取网络对至少一个处理后的低分辨率图像进行多尺度特征提取,得到至少一个辅助特征图;再对至少一个辅助特征图进行反翻转对称处理,并在反翻转对称处理后求平均值,得到训练特征图。
例如,对于1张n×n的图像,按照顺时针方向翻转3次,每次翻转90°,这样就能得到4张n×n的图像;之后将4张n×n的图像输入图像重建模型,特征提取网络输出4张辅助特征图;再按照逆时针方向将对应的3张辅助特征图分别翻转90°、180°和270°;再对处理后的4张辅助特征图进行像素平均,得到最终的训练特征图。
需要指出的是,可以先对低分辨率图像进行自减平均值处理,再对低分辨率图像进行翻转对称处理;也可以先对低分辨率图像进行翻转对称处理,再对低分辨率图像进行自减平均值处理。可以根据实际需要灵活设置,在此不作限制。
另外,在实际应用中,为了提高模型的处理速度,可以在完成训练的模型基础上,进行新的模型的训练。例如,在训练3倍、4倍的放大模型时,假设2倍的放大模型是完成训练的,则可以将2倍放大的模型的参数作为3倍、4倍的放大模型的初始参数,在此基础上进行训练。
根据本申请的示例性实施方式,还提供一种人像超分辨率重建模型训练方法,用于训练得到用于前述示例性实施方式中的人像超分辨率重建方法的重建模型,图14示出了本申请实施例提供的人像超分辨率重建模型训练方法的流程示意图。
如图所示,根据本申请的人像超分辨率重建模型训练方法包括:
步骤S2100,获取训练样本以及所述训练样本对应的目标样本;
步骤S2200,利用构建的生成网络对所述训练样本进行关键点检测,得到训练关键点;
步骤S2300,基于所述训练关键点和所述训练样本进行超分辨率重建处理和复原处理,得到输出图像;
步骤S2400,比对所述输出图像和所述目标样本,并基于比对结果对所述生成网络进行网络参数调整后继续训练,直至满足第一预设条件时得到重建模型。
本申请实施例提供的人像超分辨率重建模型训练方法中,通过对训练样本进行关键点检测,并基于训练关键点和训练样本的图像特征结合进行模型的训练,可提高得到的重建模型的重建准确度。
在一些实施例中,预先采集多个训练样本,各个训练样本可以是包含清晰度较低的人脸图像的样本图像。而训练样本所对应的目标样本,即为满足要求的,即在对训练样本进行处理后所希望得到的高清晰度的样本。
在一些实施例中,预先构建的生成网络可以是循环递归网络,利用生成网络对训练样本进行关键点检测、超分辨率重建处理和复原处理的过程可参见上述描述。在处理之后,生成网络可输出训练样本所对应的输出图像。
目标样本作为生成网络处理质量的比对标准,可以通过比对输出图像和目标样本之间差异,以根据比对结果来不断训练生成网络,以使得到的输出图像与目标样本的差异降低至满足一定要求的情况时,得到重建模型。
在一些实施例中,针对输入至生成网络的样本,可对样本进行预处理,例如自减平均值的方式,从而提出图片纹理的细节,以提高后续处理、识别的效果。
在此基础上,还可对预处理后的样本进行翻转对称操作再输入至生成网络,对于生成网络的各网络层的输出结果,可对输出结果进行反翻转对称求平均值处理,如此,可以减少各向异性带来的某些网络层或者参数的偏差。
在对生成网络进行训练以及测试的过程中,可以根据需求以及测试的结果对网络进行剪枝处理,以保留前面对结果影响较大的若干次循环,在此基础上,再以此为基础继续训练,从而提高得到的生成网络的重建精度,后续处理后的图像的峰值信噪比和结构相似性也可得到较大的提升。
在本申请的一些实施例中,可构建损失函数以检测生成网络的训练。
请参阅图15,根据本申请的人像超分辨率重建模型训练方法的上述步骤S2400可通过以下方式实现:
步骤S2410,基于所述输出图像的像素信息和所述目标样本的像素信息之间的差异,构建第一损失函数;
步骤S2420,基于所述输出图像中各个人脸关键点和所述目标样本中对应人脸关键点之间的差异,构建第二损失函数;
步骤S2430,比对所述输出图像和所述目标样本,并基于比对结果对所述生成网络进行网络参数调整后继续训练,直至所述第一损失函数和所述第二损失函数加权后的函数值满足第一预设条件时得到重建模型。
在根据本申请的实施例中,可构建第一损失函数和第二损失函数以综合评价生成网络的训练。其中,第一损失函数从图像之间的像素差异的角度进行评价。此外,考虑到本实施例中图像经过了人脸关键点的检测,而人脸关键点对于人像重建尤为重要,因此,加入了以人脸关键点之间的差异信息所构建的第二损失函数。
其中,第一损失函数表征生成网络的输出图像与目标样本(即想要的输出效果)之间的整体的像素级的欧式距离,第二损失函数表征生成网络的进行人脸关键点检测之后各人脸关键点与目标样本(想要的输出效果)中对应人脸关键点之间的欧式距离。
将上述第一损失函数和第二损失函数进行加权组合,以共同作为生成函数的损失函数。在生成网络的训练过程中,通过比对输出图像和目标样本,也即进行包含第一损失函数和第二损失函数的综合损失函数的函数值的计算。以在得到的函数值满足第一预设条件时,得到重建模型。其中,第一预设条件可以是损失函数值不再下降以达到收敛,或者是损失函数值低于某个预设值。或者也可以在迭代次数达到预设最大次数时,停止训练得到重建模型。
根据本申请的实施例,采用基于像素信息差异所构建的第一损失函数和基于人脸关键点之间的差异的第二损失函数,进行重建模型的训练监督判断,可提高后续应用重建模型进行重建处理时得到的超分辨率图像的认知度。
根据本申请的实施例,通过上述预先所构建的由生成网络所得到的重建模型,以应用于上述待处理图像的重建,可提高得到的超分辨率图像的认知度。
由上述可知,根据本申请实施例的人像超分辨率重建模型训练方法中的重建模型包含生成网络,该生成网络为预先训练而构建得到的,可以对低清晰度的图像进行处理,以输出对应的超分辨率图像的模型。
在根据本申请的一种可能的实施方式中,为了进一步提高得到的重建模型的重建效果,重建模型还可包括判别器,该判别器可以用于监督生成网络的训练。由此,在本实施方式中,生成网络为在训练好的判别器的监督下利用训练样本进行训练后所获得的生成网络。
在一些实施方式中,根据本申请的人像超分辨率重建模型训练方法还包括以下步骤:
构建判别器,利用所述判别器对所述输出图像进行判别处理;
根据得到的判别结果对所述判别器进行参数调整,直至满足第二预设条件时得到训练好的判别器。
在根据本申请的实施例中,判别器的主要实现原理是,尽可能地将真的图像(即满足要求的高分辨率图像)判别为真(例如输出判别结果为1),而将生成网络的输出图像尽可能地判别为假(例如输出判别结果为0),如此,可以监督生成网络不断进行训练,最终达到使判别器将生成网络的输出图像判别为真的效果。也即判别器作为生成网络的监督器,以不断优化生成网络的训练。
在利用判别器作为监督器以优化生成网络时,首先需要进行判别器的训练优化,使得判别器能够进行准确地判定。本实施例中,可预先构建判别器的损失函数,该损失函数可由生成网络的输出图像的判别信息以及判别器对目标样本的判别信息所构成。
对判别器的训练过程,即为对上述损失函数进行最小化的过程,在上述损失函数值不再下降以达到收敛时,可判定对判别器的训练满足第二预设条件,可得到训练好的判别器,即可将判别器固定下来。
在根据本申请的实施例中,在重建模型中加入判别器,以构成包含判别器和生成网络的对抗网络,可以进一步提高得到的重建模型的重建效果。
在一种可能的实施方式下,在重建模型中加入判别器以构成对抗网络的情形下,对生成网络的训练及调整,可加入判别器的相关判别性。
请参阅图16,根据本申请的人像超分辨率重建模型训练方法中的上述步骤S2400可以包括以下子步骤:
步骤S2410’,将所述输出图像输入至训练好的所述判别器得到判别信息;
步骤S2420’,比对所述输出图像和所述目标样本,得到比对结果;
步骤S2430’,根据所述判别信息和所述比对结果对所述生成网络进行网络参数调整后继续训练,直至满足第一预设条件时得到重建模型。
根据上述实施例,在加入判别器的情况下,可结合输出图像与目标样本之间的差异,以及判别器对输出图像的判别信息,以对生成网络进行训练调整。
在根据本申请的一些实施例中,可通过以下方式进行损失函数的构建,并利用构建的损失函数进行重建模型训练:
基于所述输出图像的像素信息和所述目标样本的像素信息之间的差异,构建第一损失函数;
基于所述输出图像中各个人脸关键点和所述目标样本中对应人脸关键点之间的差异,构建第二损失函数;
基于所述判别器对所述输出图像的判别信息构建第三损失函数,并基于预先构建的人像认知模型得到的所述输出图像和所述目标样本之间的图像差异构建第四损失函数;
根据所述判别信息和所述比对结果对所述生成网络进行网络参数调整后继续训练,直至所述第一损失函数、第二损失函数、第三损失函数和第四损失函数加权后得到的函数值满足第一预设条件时得到重建模型。
The influence of the difference between the output image and the target sample on the adjustment of the generator network can be reflected by the first loss function and the second loss function. The influence of the discriminator's discrimination information on the output image on the training adjustment of the generator network can be reflected by the third loss function. In addition, to strengthen the human perceptual recognizability of the resulting super-resolution image, a fourth loss function constructed from the image difference between the output image and the target sample obtained by the portrait cognition model can also be added.

In this embodiment, the first loss function is constructed based on the difference between the pixel information of the output image and that of the corresponding target sample, and the second loss function is constructed from the differences between each face key point in the output image and the corresponding face key point in the target sample. Since the purpose of constructing the discriminator to supervise the training of the generator network is to make the output images of the generator network eventually be judged as real by the discriminator, the third loss function is constructed from the discrimination information of the discriminator on the output image. The fourth loss function is constructed from the face-feature difference between the output image and the target sample obtained by the portrait cognition model.

The final loss function of the generator network can be obtained by a weighted combination of the above first, second, third and fourth loss functions.
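An illustrative sketch of this four-term weighted combination in PyTorch; the adversarial form of the third loss, the feature-distance form of the fourth loss, and the weights are all assumptions made for illustration:

```python
import torch
import torch.nn.functional as F

def total_generator_loss(output, target, pred_kpts, target_kpts,
                         discriminator, cognition_model,
                         weights=(1.0, 0.1, 1e-3, 1e-2)):
    """Weighted combination of the four losses; forms and weights are
    illustrative assumptions, not values from the application."""
    loss1 = F.mse_loss(output, target)                     # pixel difference
    loss2 = (pred_kpts - target_kpts).norm(dim=-1).mean()  # key point difference
    # Third loss: push the (sigmoid-output) discriminator toward judging
    # the generator output as real.
    loss3 = -torch.log(discriminator(output) + 1e-8).mean()
    # Fourth loss: face-feature distance under the portrait cognition model.
    loss4 = F.mse_loss(cognition_model(output), cognition_model(target))
    w1, w2, w3, w4 = weights
    return w1 * loss1 + w2 * loss2 + w3 * loss3 + w4 * loss4
```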
Therefore, in the portrait super-resolution reconstruction model training method according to the embodiments of the present application, the network parameters of the generator network can be adjusted according to the discrimination information of the discriminator and the comparison result between the output image and the target sample, and training continues; in essence, this is the process of training, adjusting and computing the value of the combined loss function described above, until the weighted value of the first, second, third and fourth loss functions meets the first preset condition, at which point the trained reconstruction model is obtained.

In this embodiment, by adding a discriminator to supervise the training of the generator network, the human perceptual recognizability of the resulting output images can be improved, and the resulting images are sharper. Referring to Figs. 17(a) to 17(c): Fig. 17(a) is an image obtained by conventional interpolation, Fig. 17(b) is an image obtained by the implementation of the present application without a discriminator, and Fig. 17(c) is an image obtained by the implementation of the present application with a discriminator. As can be seen, the images obtained by the scheme of the present application are noticeably sharper and better than those of conventional interpolation; among them, the image obtained with the discriminator is perceptually sharper than that obtained without it.
Referring now to Fig. 18, which shows a block diagram of the image processing apparatus 100 provided by an embodiment of the present application. The image processing apparatus 100 is applied to a mobile terminal and includes an image acquisition module 110, a first execution module 120 and a second execution module 130.

The image acquisition module 110 may be configured to acquire an image to be processed.

The first execution module 120 may be configured to input the image to be processed into the image reconstruction model, and to perform multi-scale feature extraction and image-channel expansion on the image to be processed by using the feature extraction network of the image reconstruction model to obtain a reconstructed feature map.
In an optional implementation, the feature extraction network includes a convolutional layer, multiple cascading blocks and multiple first convolutional layers; the multiple cascading blocks and multiple first convolutional layers are arranged alternately, and the feature extraction network adopts a global cascade structure;

The first execution module 120 may specifically be configured to: input the image to be processed into the convolutional layer for convolution to obtain an initial feature map; use the initial feature map as the input of the first cascading block and the output of the (N-1)-th first convolutional layer as the input of the N-th cascading block, and perform multi-scale feature extraction with the cascading blocks to output intermediate feature maps; channel-stack the initial feature map with the intermediate feature map output by each cascading block preceding the N-th first convolutional layer, and after stacking, input the result into the N-th first convolutional layer for convolution; and take the output of the last first convolutional layer as the reconstructed feature map.

In an optional implementation, a cascading block includes multiple residual blocks and multiple second convolutional layers; the multiple residual blocks and multiple second convolutional layers are arranged alternately, and the cascading block adopts a local cascade structure;

The first execution module 120 may perform multi-scale feature extraction with a cascading block and output an intermediate feature map as follows: use the input of the cascading block as the input of the first residual block and the output of the (N-1)-th second convolutional layer as the input of the N-th residual block, and learn residual features with the residual blocks to obtain residual feature maps; channel-stack the input of the cascading block with the output of each residual block preceding the N-th second convolutional layer, and after stacking, input the result into the N-th second convolutional layer for convolution; and take the output of the last second convolutional layer as the intermediate feature map.
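The global cascade over cascading blocks and the local cascade inside each block follow the same pattern; the following sketch shows the local version, assuming PyTorch and 1x1 convolutions for the second convolutional layers (the kernel sizes are assumptions, and `residual_block` defaults to a plain convolution placeholder):

```python
import torch
import torch.nn as nn

class CascadingBlock(nn.Module):
    """Sketch of the local cascade: residual blocks alternate with 1x1
    convolutions, and the N-th 1x1 convolution sees the block input plus
    the outputs of all residual blocks before it, channel-concatenated."""

    def __init__(self, channels, num_residual=3, residual_block=None):
        super().__init__()
        make_res = residual_block or (
            lambda: nn.Conv2d(channels, channels, 3, padding=1)  # placeholder
        )
        self.res_blocks = nn.ModuleList(make_res() for _ in range(num_residual))
        self.convs = nn.ModuleList(
            nn.Conv2d(channels * (i + 2), channels, 1) for i in range(num_residual)
        )

    def forward(self, x):
        feats, out = [x], x
        for res, conv in zip(self.res_blocks, self.convs):
            feats.append(res(out))               # residual feature map
            out = conv(torch.cat(feats, dim=1))  # channel stacking + convolution
        return out                               # intermediate feature map
```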
In an optional implementation, a residual block includes a grouped convolutional layer, a third convolutional layer and a fourth convolutional layer; the grouped convolutional layer uses the ReLU activation function; the grouped convolutional layer and the third convolutional layer are connected to form a residual path, and the residual block adopts a local skip connection structure;

The first execution module 120 may learn residual features with a residual block and obtain a residual feature map as follows: use the input of the residual block as the input of the grouped convolutional layer and extract features through the residual path; and fuse the input of the residual block with the output of the third convolutional layer, and after fusion, input the result into the fourth convolutional layer for convolution to output the residual feature map.
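A minimal sketch of such a residual block, assuming PyTorch, additive fusion for the local skip connection, and illustrative kernel sizes and group count:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Sketch of the residual block: a grouped convolution with ReLU and a
    third convolution form the residual path; the block input is fused with
    the path output (additive skip here) and passed through a fourth
    convolution to give the residual feature map."""

    def __init__(self, channels, groups=4):
        super().__init__()
        self.grouped = nn.Conv2d(channels, channels, 3, padding=1, groups=groups)
        self.relu = nn.ReLU(inplace=True)
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv4 = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        residual = self.conv3(self.relu(self.grouped(x)))  # residual path
        return self.conv4(x + residual)                    # local skip + fusion
```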
The second execution module 130 may be configured to upscale the reconstructed feature map by using the sub-pixel convolutional layer of the image reconstruction model to obtain a reconstructed image.

In an optional implementation, the second execution module 130 may specifically be configured to: adjust the pixel positions in the reconstructed feature map by using the sub-pixel convolutional layer to obtain the reconstructed image.
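Sub-pixel convolution is a standard operation (PixelShuffle in PyTorch): a convolution expands the channel count by the square of the scale factor, and the shuffle rearranges those channels into a scale-times larger spatial grid, i.e. it only moves pixel positions. A brief sketch:

```python
import torch
import torch.nn as nn

class SubPixelUpscale(nn.Module):
    """Convolution expands channels by scale**2; PixelShuffle rearranges
    them into a scale-times larger spatial grid."""

    def __init__(self, in_channels, out_channels=3, scale=2):
        super().__init__()
        self.expand = nn.Conv2d(in_channels, out_channels * scale ** 2, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, feature_map):
        return self.shuffle(self.expand(feature_map))

x = torch.randn(1, 64, 48, 48)                 # reconstructed feature map
print(SubPixelUpscale(64, scale=2)(x).shape)   # torch.Size([1, 3, 96, 96])
```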
Referring to Fig. 19, which shows a block diagram of the image reconstruction model training apparatus 200 provided by an embodiment of the present application. The model training apparatus 200 is applied to any electronic device with image processing capability and may include a sample acquisition module 210, a first processing module 220, a second processing module 230, a third processing module 240 and a fourth processing module 250.

The sample acquisition module 210 may be configured to acquire training samples, a training sample including a low-resolution image and a high-resolution image, the low-resolution image being obtained by down-sampling the high-resolution image.

The first processing module 220 may be configured to input the low-resolution image into the pre-constructed image reconstruction model, the image reconstruction model including a feature extraction network and a sub-pixel convolutional layer.

The second processing module 230 may be configured to perform multi-scale feature extraction and image-channel expansion on the low-resolution image by using the feature extraction network to obtain a training feature map.

The third processing module 240 may be configured to upscale the training feature map by using the sub-pixel convolutional layer to obtain a training reconstructed image.

The fourth processing module 250 may be configured to perform back-propagation training on the image reconstruction model based on the training reconstructed image, the high-resolution image and a preset objective function to obtain the trained image reconstruction model.

In an optional implementation, the objective function is the L2 loss function;

The fourth processing module 250 may specifically be configured to: perform back-propagation training on the image reconstruction model based on the training reconstructed image, the high-resolution image and the L2 loss function, so as to adjust the parameters of the image reconstruction model until a preset training completion condition is met, thereby obtaining the trained image reconstruction model.
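A minimal sketch of one such back-propagation step with the L2 objective, assuming a PyTorch model and optimizer:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, low_res, high_res):
    """One back-propagation step with the L2 (mean squared error) objective,
    comparing the training reconstruction against the high-resolution image."""
    optimizer.zero_grad()
    reconstructed = model(low_res)
    loss = F.mse_loss(reconstructed, high_res)  # L2 loss
    loss.backward()                             # back-propagate gradients
    optimizer.step()                            # adjust model parameters
    return loss.item()
```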
In an optional implementation, the first processing module 220 may further be configured to: prune the trained image reconstruction model so as to retain long cascades and remove short cascades.

In an optional implementation, the first processing module 220 may further be configured to: perform flip-symmetry processing on the low-resolution image to obtain at least one processed low-resolution image.

The second processing module 230 may specifically be configured to: input the at least one processed low-resolution image into the image reconstruction model.

The third processing module 240 may specifically be configured to: perform multi-scale feature extraction on the at least one processed low-resolution image by using the feature extraction network to obtain at least one auxiliary feature map; and perform reverse flip-symmetry processing on the at least one auxiliary feature map, and average the results after the reverse flip-symmetry processing to obtain the training feature map.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the image processing apparatus 100 and the model training apparatus 200 described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. Referring to Fig. 20, which shows a block diagram of the electronic device 10 provided by an embodiment of the present application. The electronic device 10 may be a mobile terminal that performs the above image processing method, or any electronic device with image processing capability that performs the above model training method. The electronic device 10 includes a processor 11, a memory 12 and a bus 13, the processor 11 being connected with the memory 12 via the bus 13.

The memory 12 is used to store a program, for example the image processing apparatus 100 shown in Fig. 18 or the model training apparatus 200 shown in Fig. 19. Taking the image processing apparatus 100 as an example, it includes at least one software functional module that can be stored in the memory 12 in the form of software or firmware; after receiving an execution instruction, the processor 11 executes the program to implement the image processing method disclosed in the above embodiments.

The memory 12 may include high-speed random access memory (RAM), and may also include non-volatile memory (NVM).

The processor 11 may be an integrated circuit chip with signal processing capability. In implementation, each step of the above methods may be completed by an integrated logic circuit of hardware in the processor 11 or by instructions in the form of software. The processor 11 may be a general-purpose processor, including a central processing unit (CPU), a microcontroller unit (MCU), a complex programmable logic device (CPLD), a field programmable gate array (FPGA), an embedded ARM, or other chips.

An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by the processor 11, it implements the image processing method or the model training method disclosed in the above embodiments.

In summary, in the image processing and model training method, apparatus, electronic device and storage medium provided by the embodiments of the present application, an image to be processed is acquired and input into an image reconstruction model comprising a feature extraction network and a sub-pixel convolutional layer; the feature extraction network first performs multi-scale feature extraction and image-channel expansion on the image to be processed to obtain a reconstructed feature map, and the sub-pixel convolutional layer then upscales the reconstructed feature map to obtain a reconstructed image. This improves the processing speed while preserving the reconstruction effect.
Referring to Fig. 21, which is a schematic diagram of exemplary components of the electronic device provided by an embodiment of the present application. The electronic device may include a storage medium 2110, a processor 2120, machine-executable instructions 2130 (which may be the portrait super-resolution reconstruction apparatus 131 or the portrait super-resolution reconstruction model training apparatus 132 according to the present application) and a communication interface 140. In this embodiment, the storage medium 2110 and the processor 2120 are both located in the electronic device and arranged separately. However, it should be understood that the storage medium 2110 may also be external to the electronic device and accessible by the processor 2120 through a bus interface. Alternatively, the storage medium 2110 may be integrated into the processor 2120, for example as a cache and/or general-purpose registers.

The machine-executable instructions 2130 can be understood as the electronic device described in Fig. 21 or its processor 2120, or as software functional modules, independent of the electronic device or the processor 2120 of Fig. 21, that implement the above portrait super-resolution reconstruction method or portrait super-resolution reconstruction model training method under the control of the electronic device.
As shown in Fig. 22, the portrait super-resolution reconstruction apparatus 131 may include a detection module 1311, a processing module 1312 and a restoration module 1313. The functions of the functional modules of the portrait super-resolution reconstruction apparatus 131 are described in detail below.

The detection module 1311 may be configured to perform key point detection on the image to be processed by using the pre-constructed reconstruction model to obtain face key points;

It can be understood that the detection module 1311 may be configured to perform the above step S110; for its detailed implementation, reference may be made to the content relating to step S110 above.

The processing module 1312 may be configured to perform super-resolution reconstruction processing according to the face key points and the image features obtained based on the image to be processed to obtain image high-frequency information;

It can be understood that the processing module 1312 may be configured to perform the above step S120; for its detailed implementation, reference may be made to the content relating to step S120 above.

The restoration module 1313 may be configured to perform restoration processing on the image to be processed by using the image high-frequency information to obtain the super-resolution image corresponding to the image to be processed.

It can be understood that the restoration module 1313 may be configured to perform the above step S130; for its detailed implementation, reference may be made to the content relating to step S130 above.

The portrait super-resolution reconstruction apparatus may further include the image processing apparatus described with reference to Fig. 18, the image processing apparatus being configured to perform the super-resolution reconstruction processing.

In a possible implementation, the key point detection, the super-resolution reconstruction processing and the restoration processing include multiple rounds of iterative processing, and the image to be processed is an unprocessed image to be processed, or the super-resolution image obtained in the previous iteration after the key point detection, the super-resolution reconstruction processing and the restoration processing.

In a possible implementation, there are a plurality of face key points, and the above restoration module 1313 may obtain the super-resolution image as follows:

processing the image to be processed by using the pre-constructed portrait cognition model, and outputting the position information of each face key point;

performing restoration processing on the image to be processed based on the position information of each face key point and the image high-frequency information to obtain the super-resolution image corresponding to the image to be processed.

In a possible implementation, the restoration module 1313 may obtain the super-resolution image based on the position information of each face key point and the image high-frequency information as follows:

acquiring the restoration attribute corresponding to each face key point;

performing restoration processing on the corresponding face key points in the image to be processed according to each face key point and its corresponding position information, the image high-frequency information and the restoration attribute.

In a possible implementation, the reconstruction model includes a discriminator and a generator network, the generator network being obtained by training with training samples under the supervision of the trained discriminator.

In a possible implementation, the face key points include the left eye, the right eye, the nose, the mouth and the chin contour.
For the processing flow of each module in the apparatus and the interaction flow between the modules, reference may be made to the relevant descriptions in the above method embodiments, which are not detailed here. As shown in Fig. 23, the portrait super-resolution reconstruction model training apparatus 132 may include an acquisition module 1321, a key point obtaining module 1322, an output image obtaining module 1323 and a training module 1324. The functions of the functional modules of the portrait super-resolution reconstruction model training apparatus 132 are described in detail below.

The acquisition module 1321 may be configured to acquire a training sample and a target sample corresponding to the training sample;

It can be understood that the acquisition module 1321 may be configured to perform the above step S2100; for its detailed implementation, reference may be made to the content relating to step S2100 above.

The key point obtaining module 1322 may be configured to perform key point detection on the training sample by using the constructed generator network to obtain training key points;

It can be understood that the key point obtaining module 1322 may be configured to perform the above step S2200; for its detailed implementation, reference may be made to the content relating to step S2200 above.

The output image obtaining module 1323 may be configured to perform super-resolution reconstruction processing and restoration processing based on the training key points and the training sample to obtain an output image;

It can be understood that the output image obtaining module 1323 may be configured to perform the above step S2300; for its detailed implementation, reference may be made to the content relating to step S2300 above.

The training module 1324 may be configured to compare the output image with the target sample, adjust the network parameters of the generator network based on the comparison result, and continue training until the first preset condition is met, thereby obtaining the reconstruction model.

It can be understood that the training module 1324 may be configured to perform the above step S2400; for its detailed implementation, reference may be made to the content relating to step S2400 above.

In a possible implementation, the training module 1324 may be configured to obtain the reconstruction model based on the comparison result between the output image and the target sample as follows:

constructing a first loss function based on the difference between the pixel information of the output image and the pixel information of the target sample;

constructing a second loss function based on the differences between each face key point in the output image and the corresponding face key point in the target sample;

comparing the output image with the target sample, adjusting the network parameters of the generator network based on the comparison result, and continuing training until the weighted value of the first loss function and the second loss function meets the first preset condition, thereby obtaining the reconstruction model.

In a possible implementation, the reconstruction model further includes a discriminator for supervising the training of the generator network, and the portrait super-resolution reconstruction model training apparatus 132 further includes a construction module configured to:

construct the discriminator, and perform discrimination processing on the output image and the target sample corresponding to the output image by using the discriminator;

adjust the parameters of the discriminator according to the obtained discrimination results, until a second preset condition is met and a trained discriminator is obtained.

In a possible implementation, the training module 1324 may obtain the reconstruction model as follows:

inputting the output image into the trained discriminator to obtain discrimination information;

comparing the output image with the target sample to obtain a comparison result;

adjusting the network parameters of the generator network according to the discrimination information and the comparison result, and continuing training until the first preset condition is met, thereby obtaining the reconstruction model.

In a possible implementation, the training module 1324 may be configured to construct the reconstruction model based on the discrimination information and the comparison result as follows:

constructing a first loss function based on the difference between the pixel information of the output image and the pixel information of the target sample;

constructing a second loss function based on the differences between each face key point in the output image and the corresponding face key point in the target sample;

constructing a third loss function based on the discrimination information of the discriminator on the output image, and constructing a fourth loss function based on the image difference between the output image and the target sample obtained by the pre-constructed portrait cognition model;

adjusting the network parameters of the generator network according to the discrimination information and the comparison result, and continuing training until the weighted value of the first loss function, the second loss function, the third loss function and the fourth loss function meets the first preset condition, thereby obtaining the reconstruction model.

For the processing flow of each module in the apparatus and the interaction flow between the modules, reference may be made to the relevant descriptions in the above method embodiments, which are not detailed here.
Further, an embodiment of the present application provides a computer-readable storage medium storing machine-executable instructions 2130 which, when executed, implement the portrait super-resolution reconstruction method or portrait super-resolution reconstruction model training method provided by the above embodiments.

Specifically, the computer-readable storage medium can be a general-purpose storage medium, such as a removable disk or a hard disk; when the computer program on the computer-readable storage medium is run, the above portrait super-resolution reconstruction method or portrait super-resolution reconstruction model training method can be executed. For the processes involved when the machine-executable instructions in the computer-readable storage medium are run, reference may be made to the relevant descriptions in the above method embodiments, which are not detailed here.

In summary, in the portrait super-resolution reconstruction method, portrait super-resolution reconstruction model training method, apparatus, electronic device and readable storage medium provided by the embodiments of the present application, key point detection is performed on the image to be processed by using a pre-constructed reconstruction model to obtain face key points; super-resolution reconstruction processing is then performed according to the face key points and the image features obtained based on the image to be processed to obtain image high-frequency information; and restoration processing is performed on the image to be processed by using the image high-frequency information to obtain the super-resolution image corresponding to the image to be processed. In the present application, face key point detection is combined with face restoration to achieve super-resolution reconstruction of images, improving the recognizability of the resulting super-resolution images and meeting the needs of users in practical applications.

The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto; any changes or substitutions that can readily occur to those skilled in the art within the technical scope disclosed by the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

The present application provides an image processing method and a portrait super-resolution reconstruction method, an image reconstruction model training method and a portrait super-resolution reconstruction model training method, as well as related apparatuses, electronic devices and storage media. An image to be processed is acquired and input into an image reconstruction model comprising a feature extraction network and a sub-pixel convolutional layer; the feature extraction network first performs multi-scale feature extraction and image-channel expansion on the image to be processed to obtain a reconstructed feature map, and the sub-pixel convolutional layer then upscales the reconstructed feature map to obtain a reconstructed image. Because the feature extraction network can extract multi-scale features and expand image channels, a good reconstruction effect can be achieved without increasing the network depth; meanwhile, with the sub-pixel convolutional layer performing image upscaling at the end of the model, the feature extraction network operates on small-sized images, which greatly reduces the amount of computation and the number of parameters, thereby improving the processing speed while preserving the reconstruction effect.

Furthermore, it can be understood that the image processing method and portrait super-resolution reconstruction method, the image reconstruction model training method and portrait super-resolution reconstruction model training method, as well as the related apparatuses, electronic devices and storage media according to the present application are reproducible and can be used in a variety of industrial applications. For example, they can be used in any apparatus that needs to perform image super-resolution reconstruction on low-resolution images or image sequences.
Claims (31)
- An image processing method, characterized in that the image processing method comprises: acquiring an image to be processed; inputting the image to be processed into an image reconstruction model, and performing multi-scale feature extraction and image-channel expansion on the image to be processed by using a feature extraction network of the image reconstruction model to obtain a reconstructed feature map; and upscaling the reconstructed feature map by using a sub-pixel convolutional layer of the image reconstruction model to obtain a reconstructed image.
- The image processing method according to claim 1, characterized in that the feature extraction network comprises a convolutional layer, a plurality of cascading blocks and a plurality of first convolutional layers, the plurality of cascading blocks and the plurality of first convolutional layers being arranged alternately, and the feature extraction network adopting a global cascade structure; the step of performing multi-scale feature extraction on the image to be processed by using the feature extraction network of the image reconstruction model to obtain a reconstructed feature map comprises: inputting the image to be processed into the convolutional layer for convolution to obtain an initial feature map; using the initial feature map as the input of the first cascading block and the output of the (N-1)-th first convolutional layer as the input of the N-th cascading block, and performing multi-scale feature extraction with the cascading blocks to output intermediate feature maps; channel-stacking the initial feature map with the intermediate feature map output by each cascading block preceding the N-th first convolutional layer, and after stacking, inputting the result into the N-th first convolutional layer for convolution; and taking the output of the last first convolutional layer as the reconstructed feature map.
- The image processing method according to claim 2, characterized in that the number of the cascading blocks is 3 to 5, and the number of the first convolutional layers is 3 to 5.
- The image processing method according to claim 2 or 3, characterized in that each cascading block comprises a plurality of residual blocks and a plurality of second convolutional layers, the plurality of residual blocks and the plurality of second convolutional layers being arranged alternately, and the cascading block adopting a local cascade structure; the step of performing multi-scale feature extraction with the cascading block to output an intermediate feature map comprises: using the input of the cascading block as the input of the first residual block and the output of the (N-1)-th second convolutional layer as the input of the N-th residual block, and learning residual features with the residual blocks to obtain residual feature maps; channel-stacking the input of the cascading block with the output of each residual block preceding the N-th second convolutional layer, and after stacking, inputting the result into the N-th second convolutional layer for convolution; and taking the output of the last second convolutional layer as the intermediate feature map.
- The image processing method according to claim 4, characterized in that the number of the residual blocks is 3 to 5, and the number of the second convolutional layers is 3 to 5.
- The image processing method according to claim 4 or 5, characterized in that each residual block comprises a grouped convolutional layer, a third convolutional layer and a fourth convolutional layer, the grouped convolutional layer adopting the ReLU activation function, the grouped convolutional layer and the third convolutional layer being connected to form a residual path, and the residual block adopting a local skip connection structure; the step of learning residual features with the residual block to obtain a residual feature map comprises: using the input of the residual block as the input of the grouped convolutional layer, and extracting features through the residual path; and fusing the input of the residual block with the output of the third convolutional layer, and after fusion, inputting the result into the fourth convolutional layer for convolution to output the residual feature map.
- The image processing method according to any one of claims 1 to 6, characterized in that the step of upscaling the reconstructed feature map by using the sub-pixel convolutional layer of the image reconstruction model to obtain a reconstructed image comprises: adjusting the pixel positions in the reconstructed feature map by using the sub-pixel convolutional layer to obtain the reconstructed image.
- An image reconstruction model training method, characterized in that the image reconstruction model training method comprises: acquiring training samples, the training samples comprising a low-resolution image and a high-resolution image, the low-resolution image being obtained by down-sampling the high-resolution image; inputting the low-resolution image into a pre-constructed image reconstruction model, the image reconstruction model comprising a feature extraction network and a sub-pixel convolutional layer; performing multi-scale feature extraction and image-channel expansion on the low-resolution image by using the feature extraction network to obtain a training feature map; upscaling the training feature map by using the sub-pixel convolutional layer to obtain a training reconstructed image; and performing back-propagation training on the image reconstruction model based on the training reconstructed image, the high-resolution image and a preset objective function to obtain a trained image reconstruction model.
- The image reconstruction model training method according to claim 8, characterized in that the objective function is an L2 loss function; the step of performing back-propagation training on the image reconstruction model based on the training reconstructed image, the high-resolution image and the preset objective function to obtain a trained image reconstruction model comprises: performing back-propagation training on the image reconstruction model based on the training reconstructed image, the high-resolution image and the L2 loss function, so as to adjust the parameters of the image reconstruction model until a preset training completion condition is met, thereby obtaining the trained image reconstruction model.
- The image reconstruction model training method according to claim 8 or 9, characterized in that the image reconstruction model training method further comprises: pruning the trained image reconstruction model so as to retain long cascades and remove short cascades.
- The image reconstruction model training method according to any one of claims 8 to 10, characterized in that, before the step of inputting the low-resolution image into the pre-constructed image reconstruction model, the image reconstruction model training method further comprises: performing self-mean-subtraction processing on the low-resolution image to emphasize the texture details of the low-resolution image.
- The image reconstruction model training method according to any one of claims 8 to 11, characterized in that, before the step of inputting the low-resolution image into the pre-constructed image reconstruction model, the image reconstruction model training method further comprises: performing flip-symmetry processing on the low-resolution image to obtain at least one processed low-resolution image; the step of inputting the low-resolution image into the pre-constructed image reconstruction model comprises: inputting the at least one processed low-resolution image into the image reconstruction model; the step of performing multi-scale feature extraction on the low-resolution image by using the feature extraction network to obtain a training feature map comprises: performing multi-scale feature extraction on the at least one processed low-resolution image by using the feature extraction network to obtain at least one auxiliary feature map; and performing reverse flip-symmetry processing on the at least one auxiliary feature map, and averaging the results after the reverse flip-symmetry processing to obtain the training feature map.
- An image processing apparatus, characterized in that the image processing apparatus comprises: an image acquisition module configured to acquire an image to be processed; a first execution module configured to input the image to be processed into an image reconstruction model, and to perform multi-scale feature extraction and image-channel expansion on the image to be processed by using a feature extraction network of the image reconstruction model to obtain a reconstructed feature map; and a second execution module configured to upscale the reconstructed feature map by using a sub-pixel convolutional layer of the image reconstruction model to obtain a reconstructed image.
- An image reconstruction model training apparatus, characterized in that the image reconstruction model training apparatus comprises: a sample acquisition module configured to acquire training samples, the training samples comprising a low-resolution image and a high-resolution image, the low-resolution image being obtained by down-sampling the high-resolution image; a first processing module configured to input the low-resolution image into a pre-constructed image reconstruction model, the image reconstruction model comprising a feature extraction network and a sub-pixel convolutional layer; a second processing module configured to perform multi-scale feature extraction and image-channel expansion on the low-resolution image by using the feature extraction network to obtain a training feature map; a third processing module configured to upscale the training feature map by using the sub-pixel convolutional layer to obtain a training reconstructed image; and a fourth processing module configured to perform back-propagation training on the image reconstruction model based on the training reconstructed image, the high-resolution image and a preset objective function to obtain a trained image reconstruction model.
- A portrait super-resolution reconstruction method, characterized in that the portrait super-resolution reconstruction method comprises: performing key point detection on an image to be processed by using an image reconstruction model to obtain face key points; performing super-resolution reconstruction processing according to the face key points and image features obtained based on the image to be processed to obtain image high-frequency information; and performing restoration processing on the image to be processed by using the image high-frequency information to obtain a super-resolution image corresponding to the image to be processed.
- The portrait super-resolution reconstruction method according to claim 15, characterized in that the super-resolution reconstruction processing is performed by using the image processing method according to any one of claims 1 to 7.
- The portrait super-resolution reconstruction method according to claim 15 or 16, characterized in that the key point detection, the super-resolution reconstruction processing and the restoration processing comprise multiple rounds of iterative processing, and the image to be processed is an unprocessed image to be processed, or a super-resolution image obtained in the previous iteration after the key point detection, the super-resolution reconstruction processing and the restoration processing.
- The portrait super-resolution reconstruction method according to any one of claims 15 to 17, characterized in that there are a plurality of face key points, and the step of performing restoration processing on the image to be processed by using the image high-frequency information to obtain a super-resolution image corresponding to the image to be processed comprises: processing the image to be processed by using a pre-constructed portrait cognition model, and outputting position information of each face key point; and performing restoration processing on the image to be processed based on the position information of each face key point and the image high-frequency information to obtain the super-resolution image corresponding to the image to be processed.
- The portrait super-resolution reconstruction method according to claim 18, characterized in that the step of performing restoration processing on the image to be processed based on the position information of each face key point and the image high-frequency information to obtain the super-resolution image corresponding to the image to be processed comprises: acquiring a restoration attribute corresponding to each face key point; and performing restoration processing on the corresponding face key points in the image to be processed according to each face key point and its corresponding position information, the image high-frequency information and the restoration attribute.
- The portrait super-resolution reconstruction method according to any one of claims 15 to 19, characterized in that the reconstruction model comprises a discriminator and a generator network, the generator network being obtained by training with training samples under the supervision of a trained discriminator.
- The portrait super-resolution reconstruction method according to any one of claims 15 to 20, characterized in that the face key points comprise the left eye, the right eye, the nose, the mouth and the chin contour.
- A portrait super-resolution reconstruction model training method, characterized in that the portrait super-resolution reconstruction model training method comprises: acquiring a training sample and a target sample corresponding to the training sample; performing key point detection on the training sample by using a constructed generator network to obtain training key points; performing super-resolution reconstruction processing and restoration processing based on the training key points and the training sample to obtain an output image; and comparing the output image with the target sample, adjusting network parameters of the generator network based on the comparison result, and continuing training until a first preset condition is met, thereby obtaining a reconstruction model.
- The portrait super-resolution reconstruction model training method according to claim 22, characterized in that the step of comparing the output image with the target sample, adjusting the network parameters of the generator network based on the comparison result, and continuing training until the first preset condition is met, thereby obtaining a reconstruction model, comprises: constructing a first loss function based on the difference between pixel information of the output image and pixel information of the target sample; constructing a second loss function based on the differences between each face key point in the output image and the corresponding face key point in the target sample; and comparing the output image with the target sample, adjusting the network parameters of the generator network based on the comparison result, and continuing training until the weighted value of the first loss function and the second loss function meets the first preset condition, thereby obtaining the reconstruction model.
- The portrait super-resolution reconstruction model training method according to claim 22 or 23, characterized in that the reconstruction model further comprises a discriminator for supervising the training of the generator network, and the portrait super-resolution reconstruction model training method comprises: constructing the discriminator, and performing discrimination processing on the output image and the target sample corresponding to the output image by using the discriminator; and adjusting parameters of the discriminator according to the obtained discrimination results until a second preset condition is met, thereby obtaining a trained discriminator.
- The portrait super-resolution reconstruction model training method according to any one of claims 22 to 24, characterized in that the step of comparing the output image with the target sample, adjusting the network parameters of the generator network based on the comparison result, and continuing training until the first preset condition is met, thereby obtaining a reconstruction model, comprises: inputting the output image into the trained discriminator to obtain discrimination information; comparing the output image with the target sample to obtain a comparison result; and adjusting the network parameters of the generator network according to the discrimination information and the comparison result, and continuing training until the first preset condition is met, thereby obtaining the reconstruction model.
- The portrait super-resolution reconstruction model training method according to claim 25, characterized in that the step of adjusting the network parameters of the generator network according to the discrimination information and the comparison result, and continuing training until the first preset condition is met, thereby obtaining the reconstruction model, comprises: constructing a first loss function based on the difference between pixel information of the output image and pixel information of the target sample; constructing a second loss function based on the differences between each face key point in the output image and the corresponding face key point in the target sample; constructing a third loss function based on the discrimination information of the discriminator on the output image, and constructing a fourth loss function based on the image difference between the output image and the target sample obtained by a pre-constructed portrait cognition model; and adjusting the network parameters of the generator network according to the discrimination information and the comparison result, and continuing training until the weighted value of the first loss function, the second loss function, the third loss function and the fourth loss function meets the first preset condition, thereby obtaining the reconstruction model.
- A portrait super-resolution reconstruction apparatus, characterized in that the portrait super-resolution reconstruction apparatus comprises: a detection module configured to perform key point detection on an image to be processed by using a pre-constructed reconstruction model to obtain face key points; a processing module configured to perform super-resolution reconstruction processing according to the face key points and image features obtained based on the image to be processed to obtain image high-frequency information; and a restoration module configured to perform restoration processing on the image to be processed by using the image high-frequency information to obtain a super-resolution image corresponding to the image to be processed.
- The portrait super-resolution reconstruction apparatus according to claim 27, characterized in that the processing module comprises the image processing apparatus according to claim 13 for performing the super-resolution reconstruction processing.
- A portrait super-resolution reconstruction model training apparatus, characterized in that the portrait super-resolution reconstruction model training apparatus comprises: an acquisition module configured to acquire a training sample and a target sample corresponding to the training sample; a key point obtaining module configured to perform key point detection on the training sample by using a constructed generator network to obtain training key points; an output image obtaining module configured to perform super-resolution reconstruction processing and restoration processing based on the training key points and the training sample to obtain an output image; and a training module configured to compare the output image with the target sample, adjust network parameters of the generator network based on the comparison result, and continue training until a first preset condition is met, thereby obtaining a reconstruction model.
- An electronic device, characterized in that the electronic device comprises: one or more processors; and one or more storage media for storing one or more machine-executable instructions which, when executed by the one or more processors, cause the one or more processors to implement the image processing method according to any one of claims 1-7, or the image reconstruction model training method according to any one of claims 8-12, or the portrait super-resolution reconstruction method according to any one of claims 15-21, or the portrait super-resolution reconstruction model training method according to any one of claims 22-26.
- A computer-readable storage medium, characterized in that the computer-readable storage medium stores machine-executable instructions which, when executed, implement the image processing method according to any one of claims 1-7, or the image reconstruction model training method according to any one of claims 8-12, or the portrait super-resolution reconstruction method according to any one of claims 15-21, or the portrait super-resolution reconstruction model training method according to any one of claims 22-26.
Applications Claiming Priority (4)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010977254.4 | 2020-09-16 | | |
| CN202010977254.4A (CN114266697A) | 2020-09-16 | 2020-09-16 | Image processing and model training method and apparatus, electronic device, and storage medium |
| CN202011000670.5A (CN114298901A) | 2020-09-22 | 2020-09-22 | Portrait super-resolution reconstruction method, model training method, apparatus, electronic device, and readable storage medium |
| CN202011000670.5 | 2020-09-22 | | |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| WO2022057837A1 | 2022-03-24 |

Family

ID=80776497

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2021/118591 | Image processing and portrait super-resolution reconstruction and model training method, apparatus, electronic device, and storage medium (WO2022057837A1, zh) | 2020-09-16 | 2021-09-15 |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21868659; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 21868659; Country of ref document: EP; Kind code of ref document: A1 |