
CN113034353B - Intrinsic image decomposition method and system based on cross convolution neural network - Google Patents


Info

Publication number
CN113034353B
Authority
CN
China
Prior art keywords: network, layer, neural network, convolutional neural, image
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110385353.8A
Other languages
Chinese (zh)
Other versions
CN113034353A (en)
Inventor
权炜
孙燕平
于军琪
董芳楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an University of Architecture and Technology
Original Assignee
Xi'an University of Architecture and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an University of Architecture and Technology
Priority to CN202110385353.8A
Publication of CN113034353A
Application granted
Publication of CN113034353B


Classifications

    • G: Physics
    • G06: Computing; calculating or counting
    • G06T: Image data processing or generation, in general
    • G06T 3/04: Context-preserving transformations, e.g. by using an importance map (under G06T 3/00, Geometric image transformations in the plane of the image)
    • G06T 5/70: Denoising; smoothing (under G06T 5/00, Image enhancement or restoration)
    • G06T 5/73: Deblurring; sharpening (under G06T 5/00, Image enhancement or restoration)
    • G06T 2207/20081: Training; learning (under G06T 2207/20, Special algorithmic details)
    • G06T 2207/20084: Artificial neural networks [ANN] (under G06T 2207/20, Special algorithmic details)
    • G06T 2207/20192: Edge enhancement; edge preservation (under G06T 2207/20172, Image enhancement details)

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an intrinsic image decomposition method and system based on a cross convolutional neural network. The method comprises: inputting an original image to be decomposed into a trained GoogLeNet-VGG19 cross convolutional neural network model to obtain the illumination map and reflection map into which the original image is decomposed. The GoogLeNet-VGG19 cross convolutional neural network model is formed by cross fusion of an illumination map generation network and a reflection map generation network; the illumination map generation network is built on a GoogLeNet convolutional neural network, and the reflection map generation network on a VGG19 convolutional neural network; both generation networks are trained with the Adam optimization method. With the invention, the reflectance in the intrinsic image decomposition result remains consistent across a given object, edge information is better preserved and noise better removed, and the result is closer to the ground-truth image in both detail and sharpness.

Description

Intrinsic image decomposition method and system based on cross convolution neural network
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an intrinsic image decomposition method and system based on a cross convolution neural network.
Background
Intrinsic image decomposition was first proposed by Barrow and Tenenbaum in 1978. The intrinsic image problem is to recover, for every pixel, the shading and reflectance information of the corresponding scene point, forming an illumination map and a reflection map respectively. Intrinsic image decomposition algorithms fall into two main classes: the first is based on Retinex theory, the second on deep learning. The traditional Retinex method assumes that large gradients in an image are caused by changes in object reflectance, while small gradients belong to illumination variation. Because the Retinex method is entirely gradient-based, it can establish only local constraints.
Another commonly used constraint is global color sparsity: natural images contain only a small number of colors, distributed in a structured form, so the recovered reflectance layer is required to contain only a few colors. Because gradient-based methods establish only local constraints, the recovered reflectance layer may be globally inconsistent, i.e. two distant pixels of the same material may be assigned different reflectances; and requiring several images of the same scene as input places strict demands on the intrinsic image method. After the gradients of the reflectance and illumination images are estimated, the gradient fields are integrated, following Weiss, to solve for the reflection map and the illumination map. However, this approach requires a large number of samples to train a classifier and is time-consuming; the obtained intrinsic images have large errors at edges and are blurred there; and the classifier may overfit its training samples.
Deep-learning-based intrinsic image decomposition methods alleviate these problems to some extent, but still have shortcomings. For example, owing to flaws in network design, Narihira et al. downsample the image to too small a scale, so that a large amount of information is lost after recovery and the output is blurry; Fan et al. integrate a filter into the network to flatten the reflectance layer, removing residual noise and geometry information, but neglect the protection of image detail, which produces jagged edges.
Disclosure of Invention
The invention aims to provide an intrinsic image decomposition method and system based on a cross convolutional neural network that solve one or more of the above technical problems. With the invention, the reflectance in the intrinsic image decomposition result remains consistent across a given object, edge information is better preserved and noise better removed, and the result is closer to the ground-truth image in both detail and sharpness.
In order to achieve the above purpose, the invention adopts the following technical scheme:
The invention discloses an intrinsic image decomposition method based on a cross convolution neural network, which comprises the following steps:
inputting an original image to be decomposed into a trained GoogLeNet-VGG19 cross convolution neural network model to obtain an illumination map and a reflection map which are obtained by decomposing the original image;
wherein the GoogLeNet-VGG19 cross convolution neural network model is formed by cross fusion of an illumination map generation network and a reflection map generation network; the illumination map generation network is constructed based on GoogLeNet convolutional neural networks, and the reflection map generation network is constructed based on VGG19 convolutional neural networks;
the step of obtaining the trained GoogLeNet-VGG19 cross convolution neural network model comprises the step of training an illumination map generation network and a reflection map generation network by adopting an Adam optimization method.
In a further improvement of the invention, constructing the illumination map generation network from the GoogLeNet convolutional neural network specifically comprises:
adding 1 ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception a, the 4 ReLU outputs feeding jointly into the DepthConcat layer of inception a;
adding 1 ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception b, the 4 ReLU outputs feeding jointly into the DepthConcat layer of inception b;
in GoogLeNet inception a, connecting the 2 convolution operations of the first layer with the 2 convolution operations of the second layer to form 2 connection channels, and adding on each channel 1 combination of a ReLU activation function followed by a MaxPool operation;
in GoogLeNet inception b, connecting the 2 convolution operations of the first layer with the 2 convolution operations of the second layer to form 2 connection channels, and adding on each channel 1 combination of a ReLU activation function followed by a MaxPool operation;
skip-connecting the DepthConcat layer output of GoogLeNet inception b to the DepthConcat layer of inception d; connecting the convolution output that follows the AveragePool operation in the first layer of GoogLeNet inception e directly to the DepthConcat layer of inception e; skip-connecting the DepthConcat layer output of GoogLeNet inception e to the DepthConcat layer of inception b;
adding 1 ReLU activation function after each of the 4 convolution operations in the third layer of GoogLeNet inception a, the 4 ReLU outputs feeding jointly into the DepthConcat layer of inception a;
adding 1 ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception b, the 4 ReLU outputs feeding jointly into the DepthConcat layer of inception b;
adding an FC layer after the FC layer of the GoogLeNet convolutional neural network.
In a further improvement of the invention, constructing the reflection map generation network from the VGG19 convolutional neural network specifically comprises:
applying a Concat operation to the first and second MaxPool output results of the VGG19 convolutional neural network and feeding the result to the fifth layer of the network;
applying a Concat operation to the third and fourth MaxPool output results of the VGG19 convolutional neural network and feeding the result to the tenth layer of the network;
deleting the seventeenth and eighteenth layers of the VGG19 convolutional neural network, and adding after the sixteenth layer two layers whose structure is identical to the sixteenth layer's, which form the seventeenth and eighteenth layers of the modified VGG19 convolutional neural network.
In a further improvement of the invention, cross-fusing the illumination map generation network and the reflection map generation network specifically comprises:
connecting the DepthConcat layer output of GoogLeNet inception e to the tenth layer of the VGG19 convolutional neural network;
connecting the fourth MaxPool output of the VGG19 convolutional neural network to a convolution operation in the second layer of GoogLeNet inception a.
In a further improvement of the invention, the Loss function Loss1 of the illumination map generation network is

    Loss1 = \sum_i \mu_i \left\| X^{(i)} - \hat{X}^{(i)} \right\|_1 , \quad \left\| X - \hat{X} \right\|_1 = \frac{1}{HWC} \sum_{x,y,c} \left| X(x,y,c) - \hat{X}(x,y,c) \right|

where X is the input image, \hat{X} is the predicted image, H, W and C are the height, width and channel number of the input image, x and y are pixel coordinates, c indexes the channel, \mu_i is the weight at the i-th scale, X^{(i)} is the image at the i-th scale, and \hat{X}^{(i)} is the predicted image at the i-th scale generated by the improved GoogLeNet convolutional neural network.
In a further improvement of the invention, the Loss function Loss2 of the reflection map generation network is

    Loss2 = \sum_j \frac{1}{C_j H_j W_j} \left\| V_j(Y) - V_j(\hat{Y}) \right\|_2^2

where Y is the input image, \hat{Y} is the estimate of the input image produced by the improved VGG19 network, C_j, H_j and W_j are the channel number, height and width of the feature map output by the j-th layer, V_j(\cdot) is the activation output of the j-th layer when it processes an image, and j is the layer index.
In a further improvement of the invention, training the illumination map generation network and the reflection map generation network with the Adam optimization method specifically comprises:
taking the images in a pre-constructed training image sample library as samples, and training the illumination map generation network and the reflection map generation network simultaneously with the Adam optimization method;
during training, inputting the illumination map output by the illumination map generation network into an identification network, which estimates the probability that this output is consistent with the training-sample label image, and updating the network parameters of the illumination map generation network by back-propagation; likewise, inputting the reflection map output by the reflection map generation network into an identification network, which estimates the probability that this output is consistent with the training-sample label image, and updating the network parameters of the reflection map generation network by back-propagation;
when the Loss function Loss1 reaches its minimum, stopping training of the illumination map generation network to obtain the final illumination map generation network; when the Loss function Loss2 reaches its minimum, stopping training of the reflection map generation network to obtain the final reflection map generation network.
the identification network is a multi-layer convolutional neural network and comprises six identical layers; each layer is in turn a convolution operation, a Sigmoid activation function, and MaxPool.
In a further improvement of the invention, the Adam optimization parameters beta are set to (0.9, 0.999), the learning rate to 0.005, the weight decay to 0.0001, epoch = 100 and batch size = 20.
The invention discloses an intrinsic image decomposition system based on a cross convolution neural network, which comprises the following components:
The decomposition module is used for inputting an original image to be decomposed into a trained GoogLeNet-VGG19 cross convolution neural network model to obtain an illumination map and a reflection map which are obtained by decomposing the original image;
wherein the GoogLeNet-VGG19 cross convolution neural network model is formed by cross fusion of an illumination map generation network and a reflection map generation network; the illumination map generation network is constructed based on GoogLeNet convolutional neural networks, and the reflection map generation network is constructed based on VGG19 convolutional neural networks;
the step of obtaining the trained GoogLeNet-VGG19 cross convolution neural network model comprises the step of training an illumination map generation network and a reflection map generation network by adopting an Adam optimization method.
A further improvement of the present invention is that,
The illumination map generation network is constructed from the GoogLeNet convolutional neural network as follows:
adding 1 ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception a, the 4 ReLU outputs feeding jointly into the DepthConcat layer of inception a;
adding 1 ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception b, the 4 ReLU outputs feeding jointly into the DepthConcat layer of inception b;
in GoogLeNet inception a, connecting the 2 convolution operations of the first layer with the 2 convolution operations of the second layer to form 2 connection channels, and adding on each channel 1 combination of a ReLU activation function followed by a MaxPool operation;
in GoogLeNet inception b, connecting the 2 convolution operations of the first layer with the 2 convolution operations of the second layer to form 2 connection channels, and adding on each channel 1 combination of a ReLU activation function followed by a MaxPool operation;
skip-connecting the DepthConcat layer output of GoogLeNet inception b to the DepthConcat layer of inception d; connecting the convolution output that follows the AveragePool operation in the first layer of GoogLeNet inception e directly to the DepthConcat layer of inception e; skip-connecting the DepthConcat layer output of GoogLeNet inception e to the DepthConcat layer of inception b;
adding 1 ReLU activation function after each of the 4 convolution operations in the third layer of GoogLeNet inception a, the 4 ReLU outputs feeding jointly into the DepthConcat layer of inception a;
adding 1 ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception b, the 4 ReLU outputs feeding jointly into the DepthConcat layer of inception b;
adding an FC layer after the FC layer of the GoogLeNet convolutional neural network;
the reflection map generation network is constructed from the VGG19 convolutional neural network as follows:
applying a Concat operation to the first and second MaxPool output results of the VGG19 convolutional neural network and feeding the result to the fifth layer of the network;
applying a Concat operation to the third and fourth MaxPool output results of the VGG19 convolutional neural network and feeding the result to the tenth layer of the network;
deleting the seventeenth and eighteenth layers of the VGG19 convolutional neural network, and adding after the sixteenth layer two layers whose structure is identical to the sixteenth layer's, which form the seventeenth and eighteenth layers of the modified VGG19 convolutional neural network;
the illumination map generation network and the reflection map generation network are cross-fused as follows:
connecting the DepthConcat layer output of GoogLeNet inception e to the tenth layer of the VGG19 convolutional neural network;
connecting the fourth MaxPool output of the VGG19 convolutional neural network to a convolution operation in the second layer of GoogLeNet inception a.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides an intrinsic image decomposition method based on an improved GoogLeNet-VGG19 cross convolutional neural network. A training image sample library is first constructed; an illumination map generation network is then built as an improvement of the traditional GoogLeNet convolutional neural network, a reflection map generation network is built as an improvement of the traditional VGG19 convolutional neural network, and the two generation networks are cross-fused; next, an identification network is constructed; finally, the illumination map generation network and the reflection map generation network are trained with the Adam optimization method to obtain the final illumination map generation network and reflection map generation network. In the decomposition results, the reflectance of a given object remains consistent, edge information is better preserved and noise better removed, image quality is higher, and the results are closer to the ground-truth image in both detail and sharpness.
When the system is used to decompose intrinsic images, the heavy noise and blurred edges that afflict images decomposed by existing methods are overcome: the reflectance of the output image remains consistent on a given object, edge information is better preserved and noise better removed, and image quality is higher; the generated results are closer to the ground-truth image in both detail and sharpness.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below; it will be apparent to those of ordinary skill in the art that the drawings in the following description show some embodiments of the invention, and that other drawings may be derived from them without undue effort.
FIG. 1 is a flow chart of an essential image decomposition method based on a modified GoogLeNet-VGG19 cross-convolution neural network according to an embodiment of the invention;
FIG. 2 is a schematic diagram of the results of the decomposition of an intrinsic image in an embodiment of the present invention; fig. 2 (a) is an original image schematic diagram, fig. 2 (b) is an illumination schematic diagram obtained by decomposition, and fig. 2 (c) is a reflection schematic diagram obtained by decomposition.
Detailed Description
In order to make the purposes, technical effects and technical solutions of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention are clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention; it will be apparent that the described embodiments are some of the embodiments of the present invention. Other embodiments, which may be made by those of ordinary skill in the art based on the disclosed embodiments without undue burden, are within the scope of the present invention.
Referring to fig. 1, an intrinsic image decomposition method based on a modified GoogLeNet-VGG19 cross convolution neural network according to an embodiment of the invention includes the following steps:
Step 1: constructing a training image sample library;
A public intrinsic image database is used: P images, together with their corresponding illumination maps and reflection maps, are taken from the database; the P images are randomly cropped into a number of image blocks of a specified size; the image blocks are then randomly flipped horizontally, flipped vertically, rotated and mirrored to expand the database; the processed image blocks and their corresponding illumination maps and reflection maps form the training image sample library;
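For concreteness, the following is a minimal sketch (in Python, assuming Pillow is available) of how matched image, illumination and reflection patches could be cropped and jointly augmented in Step 1; the file handling and the exact set of augmentation operations are illustrative assumptions.

    import random
    from PIL import Image

    PATCH_SIZE = 224  # the specified block size used in the embodiment

    def random_crop_triple(image, illum, refl, size=PATCH_SIZE):
        """Crop the same window from an image and its illumination/reflection labels."""
        w, h = image.size
        x, y = random.randint(0, w - size), random.randint(0, h - size)
        box = (x, y, x + size, y + size)
        return image.crop(box), illum.crop(box), refl.crop(box)

    def random_augment(triple):
        """Apply one random flip/rotation/mirror jointly to the whole triple."""
        T = Image.Transpose
        ops = [T.FLIP_LEFT_RIGHT,   # horizontal flip (also serves as mirroring)
               T.FLIP_TOP_BOTTOM,   # vertical flip
               T.ROTATE_90, T.ROTATE_180, T.ROTATE_270]
        op = random.choice(ops)
        return tuple(p.transpose(op) for p in triple)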
Step 2: construct the illumination map generation network from the improved GoogLeNet convolutional neural network, as follows:
Step 2-1: add 1 ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception a, 4 ReLU activation functions in total, their outputs feeding jointly into the DepthConcat layer of inception a;
Step 2-2: add 1 ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception b, 4 ReLU activation functions in total, their outputs feeding jointly into the DepthConcat layer of inception b;
Step 2-3: in GoogLeNet inception a, connect the 2 convolution operations of the first layer with the 2 convolution operations of the second layer to form 2 connection channels; add 1 combination of a ReLU activation function followed by a MaxPool operation on each of the 2 channels, 2 combinations in total;
Step 2-4: in GoogLeNet inception b, connect the 2 convolution operations of the first layer with the 2 convolution operations of the second layer to form 2 connection channels; add 1 combination of a ReLU activation function followed by a MaxPool operation on each of the 2 channels, 2 combinations in total;
Step 2-5: skip-connect the DepthConcat layer output of GoogLeNet inception b to the DepthConcat layer of inception d;
Step 2-6: connect the convolution output that follows the AveragePool operation in the first layer of GoogLeNet inception e directly to the DepthConcat layer of inception e;
Step 2-7: skip-connect the DepthConcat layer output of GoogLeNet inception e to the DepthConcat layer of inception b;
Step 2-8: add 1 ReLU activation function after each of the 4 convolution operations in the third layer of GoogLeNet inception a, 4 ReLU activation functions in total, their outputs feeding jointly into the DepthConcat layer of inception a;
Step 2-9: add 1 ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception b, 4 ReLU activation functions in total, their outputs feeding jointly into the DepthConcat layer of inception b;
Step 2-10: add a new FC layer after the FC layer of the GoogLeNet convolutional neural network;
Step 2-11: the operations of steps 2-1 to 2-10 form the improved GoogLeNet convolutional neural network;
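As a concrete illustration of the repeated "convolution + ReLU + DepthConcat" pattern of steps 2-1 to 2-9, the following PyTorch sketch builds one inception-style module in which each of the 4 parallel convolutions is followed by a ReLU before the outputs are depth-concatenated. The branch channel widths are illustrative assumptions, and the skip connections of steps 2-5 to 2-7 would be wired between such modules at the network level.

    import torch
    import torch.nn as nn

    class InceptionWithReLU(nn.Module):
        """Inception-style module with a ReLU after every convolution (steps 2-1/2-2)."""
        def __init__(self, in_ch, c1, c3, c5, cp):
            super().__init__()
            # Four parallel branches; each convolution is followed by a ReLU
            self.b1 = nn.Sequential(nn.Conv2d(in_ch, c1, 1), nn.ReLU())
            self.b3 = nn.Sequential(nn.Conv2d(in_ch, c3, 3, padding=1), nn.ReLU())
            self.b5 = nn.Sequential(nn.Conv2d(in_ch, c5, 5, padding=2), nn.ReLU())
            self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                    nn.Conv2d(in_ch, cp, 1), nn.ReLU())

        def forward(self, x):
            # DepthConcat: concatenate the branch outputs along the channel axis
            return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

For example, InceptionWithReLU(192, 64, 128, 32, 32) maps a 192-channel feature map to a (64+128+32+32)-channel output of the same spatial size.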
Step 3: construct the reflection map generation network from the improved VGG19 convolutional neural network, as follows:
Step 3-1: apply a Concat operation to the first and second MaxPool output results of the VGG19 convolutional neural network and feed the result to the fifth layer of the network;
Step 3-2: apply a Concat operation to the third and fourth MaxPool output results of the VGG19 convolutional neural network and feed the result to the tenth layer of the network;
Step 3-3: delete the seventeenth and eighteenth layers of the VGG19 convolutional neural network;
Step 3-4: add two identical layers after the sixteenth layer to form a new seventeenth and eighteenth layer; the structure of the new layers is identical to that of the sixteenth layer;
Step 3-5: the operations of steps 3-1 to 3-4 form the improved VGG19 convolutional neural network;
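To show how the Concat skip of step 3-1 can be realized, here is a PyTorch sketch using standard VGG19 channel widths. Because the first and second MaxPool outputs have different spatial resolutions, the sketch resizes the earlier one before concatenation; that resizing step is an assumption, as the patent does not state how the sizes are reconciled. Step 3-2 would be wired analogously.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class VGGSkipFragment(nn.Module):
        """First two VGG19 blocks plus the step 3-1 Concat skip into the fifth layer."""
        def __init__(self):
            super().__init__()
            self.block1 = nn.Sequential(
                nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2))                       # first MaxPool output
            self.block2 = nn.Sequential(
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
                nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2))                       # second MaxPool output
            # The "fifth layer" consumes the channel-wise Concat of both pool outputs
            self.layer5 = nn.Sequential(nn.Conv2d(64 + 128, 256, 3, padding=1), nn.ReLU())

        def forward(self, x):
            p1 = self.block1(x)                        # H/2 x W/2, 64 channels
            p2 = self.block2(p1)                       # H/4 x W/4, 128 channels
            p1 = F.interpolate(p1, size=p2.shape[-2:]) # align resolutions before Concat
            return self.layer5(torch.cat([p1, p2], dim=1))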
Step 4: cross-fuse the illumination map generation network and the reflection map generation network;
Step 4-1: connect the DepthConcat layer output of GoogLeNet inception e to the tenth layer of the VGG19 convolutional neural network;
Step 4-2: connect the fourth MaxPool output of the VGG19 convolutional neural network to a convolution operation in the second layer of GoogLeNet inception a;
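Step 4 exchanges intermediate features between the two branches. In the sketch below, each branch is split at its fusion point, and 1x1 adapter convolutions plus interpolation make the exchanged features dimensionally compatible; the adapters and the additive fusion are assumptions, since the patent only states which layers are connected.

    import torch.nn as nn
    import torch.nn.functional as F

    class CrossFusion(nn.Module):
        """g_front/g_back split the modified GoogLeNet at inception e's DepthConcat;
        v_front/v_back split the modified VGG19 at the fourth MaxPool."""
        def __init__(self, g_front, g_back, v_front, v_back, g_ch, v_ch):
            super().__init__()
            self.g_front, self.g_back = g_front, g_back   # illumination branch halves
            self.v_front, self.v_back = v_front, v_back   # reflection branch halves
            self.g2v = nn.Conv2d(g_ch, v_ch, 1)  # inception e output -> VGG19 tenth layer
            self.v2g = nn.Conv2d(v_ch, g_ch, 1)  # fourth MaxPool -> inception a convolution

        def forward(self, x):
            g_mid = self.g_front(x)   # features at inception e's DepthConcat
            v_mid = self.v_front(x)   # features at the fourth MaxPool
            # Each branch consumes the other's intermediate features (additive fusion)
            g_in = g_mid + F.interpolate(self.v2g(v_mid), size=g_mid.shape[-2:])
            v_in = v_mid + F.interpolate(self.g2v(g_mid), size=v_mid.shape[-2:])
            return self.g_back(g_in), self.v_back(v_in)   # illumination map, reflection map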
step 5: constructing an identification network;
The identification network is a multi-layer convolutional neural network and comprises six identical layers; each layer is sequentially provided with a convolution operation, a Sigmoid activation function and MaxPool;
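The identification network translates almost directly into code. The sketch below builds the six identical convolution, Sigmoid, MaxPool stages; the channel width and the final probability head are assumptions, since the patent does not state how the six stages are reduced to a single probability.

    import torch.nn as nn

    def make_identification_network(in_ch=3, width=64):
        """Six identical stages of convolution -> Sigmoid -> MaxPool (step 5)."""
        layers, ch = [], in_ch
        for _ in range(6):
            layers += [nn.Conv2d(ch, width, 3, padding=1),
                       nn.Sigmoid(),
                       nn.MaxPool2d(2)]
            ch = width
        # Assumed head: collapse the feature map to one probability that the
        # input is consistent with the training-sample label image
        layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                   nn.Linear(width, 1), nn.Sigmoid()]
        return nn.Sequential(*layers)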
step 6: defining a loss function;
Step 6-1: define the Loss function Loss1 of the illumination map generation network:

    Loss1 = \sum_i \mu_i \left\| X^{(i)} - \hat{X}^{(i)} \right\|_1 , \quad \left\| X - \hat{X} \right\|_1 = \frac{1}{HWC} \sum_{x,y,c} \left| X(x,y,c) - \hat{X}(x,y,c) \right|

where X is the input image, \hat{X} is the predicted image, H, W and C are the height, width and channel number of the input image, x and y are pixel coordinates, c indexes the channel, \mu_i is the weight at the i-th scale, X^{(i)} is the image at the i-th scale, and \hat{X}^{(i)} is the predicted image at the i-th scale generated by the improved GoogLeNet convolutional neural network;
Step 6-2: define the Loss function Loss2 of the reflection map generation network:

    Loss2 = \sum_j \frac{1}{C_j H_j W_j} \left\| V_j(Y) - V_j(\hat{Y}) \right\|_2^2

where Y is the input image, \hat{Y} is the estimate of the input image produced by the improved VGG19 network, C_j, H_j and W_j are the channel number, height and width of the feature map output by the j-th layer, V_j(\cdot) is the activation output of the j-th layer when it processes an image, and j is the layer index;
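Under the reading of Loss1 and Loss2 given above (a weighted multi-scale mean absolute error, and a feature loss over the activations V_j of the improved VGG19; both forms are inferred from the variable definitions, since the equation images are not reproduced in the text), a PyTorch sketch:

    import torch.nn.functional as F

    def loss1(pred, target, mu=(1.0, 0.5, 0.25)):
        """Multi-scale L1 between the predicted and ground-truth illumination maps.
        The scale weights mu are illustrative; the patent does not list them."""
        total = 0.0
        for i, w in enumerate(mu):
            if i > 0:  # build scale i by downsampling both images
                pred = F.avg_pool2d(pred, 2)
                target = F.avg_pool2d(target, 2)
            total = total + w * F.l1_loss(pred, target)  # mean over H, W, C
        return total

    def loss2(feats_pred, feats_true):
        """Feature loss over per-layer activations V_j (two lists of tensors)."""
        total = 0.0
        for vp, vt in zip(feats_pred, feats_true):
            # mse_loss averages over all elements, matching the
            # 1 / (C_j * H_j * W_j) normalization
            total = total + F.mse_loss(vp, vt)
        return total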
Step 7: training a network;
Taking images in the training image sample library constructed in the step 1 as samples, and training an illumination map generation network and a reflection map generation network simultaneously by adopting an Adam optimization method;
During training, the illumination map output by the illumination map generation network is input into the identification network, which estimates the probability that this output is consistent with the training-sample label image, and the network parameters of the illumination map generation network are updated by back-propagation; likewise, the reflection map output by the reflection map generation network is input into the identification network, which estimates the probability that this output is consistent with the training-sample label image, and the network parameters of the reflection map generation network are updated by back-propagation;
when the Loss function Loss1 reaches its minimum, training of the illumination map generation network stops, yielding the final illumination map generation network; when the Loss function Loss2 reaches its minimum, training of the reflection map generation network stops, yielding the final reflection map generation network;
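Putting steps 6 and 7 together, the following condensed sketch wires the generation networks, the identification network and the two losses into a training loop with the embodiment's Adam settings; it reuses the loss1 and loss2 sketches above. The data loader, the feature extractor used for Loss2, and the exact form of the adversarial term are assumptions about how the pieces fit together.

    import torch

    def train(gen_illum, gen_refl, disc, loader, feats, epochs=100):
        """feats: callable returning the list of V_j activations for an image batch."""
        adam = dict(lr=0.005, betas=(0.9, 0.999), weight_decay=0.0001)
        opt_i = torch.optim.Adam(gen_illum.parameters(), **adam)
        opt_r = torch.optim.Adam(gen_refl.parameters(), **adam)
        bce = torch.nn.BCELoss()
        for _ in range(epochs):
            for img, illum_gt, refl_gt in loader:  # batches of 20 in the embodiment
                illum, refl = gen_illum(img), gen_refl(img)
                # Adversarial terms: probability, from the identification network,
                # that each output is consistent with its label image
                p_i, p_r = disc(illum), disc(refl)
                li = loss1(illum, illum_gt) + bce(p_i, torch.ones_like(p_i))
                lr_ = loss2(feats(refl), feats(refl_gt)) + bce(p_r, torch.ones_like(p_r))
                opt_i.zero_grad(); li.backward(); opt_i.step()   # update illumination net
                opt_r.zero_grad(); lr_.backward(); opt_r.step()  # update reflection net
                # The identification network would be updated here as well; the
                # embodiment trains it with TTUR at a 3:1 step ratio.
        return gen_illum, gen_refl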
Step 8: input the original image to be decomposed into the final illumination map generation network and reflection map generation network obtained in step 7 respectively; the output images are the illumination map and the reflection map into which the original image is decomposed.
In the embodiment of the present invention, the specified size of the image blocks in step 1 is 224x224.
In the embodiment of the present invention, the parameters set when training the network in step 7 are as follows: the Adam optimization parameters beta are set to (0.9, 0.999), the learning rate to 0.005, the weight decay to 0.0001, epoch = 100 and batch size = 20.
Images decomposed by existing methods contain considerable noise and have blurred edges. By contrast, the reflectance of the image output by the method of the embodiment of the invention remains consistent on a given object, edge information is better preserved and noise better removed, and the image quality is higher; the generated results are closer to the ground-truth image in both detail and sharpness.
The embodiment of the invention discloses an intrinsic image decomposition system based on a cross convolution neural network, which comprises the following components:
The decomposition module is used for inputting an original image to be decomposed into a trained GoogLeNet-VGG19 cross convolution neural network model to obtain an illumination map and a reflection map which are obtained by decomposing the original image;
wherein the GoogLeNet-VGG19 cross convolution neural network model is formed by cross fusion of an illumination map generation network and a reflection map generation network; the illumination map generation network is constructed based on GoogLeNet convolutional neural networks, and the reflection map generation network is constructed based on VGG19 convolutional neural networks;
the step of obtaining the trained GoogLeNet-VGG19 cross convolution neural network model comprises the step of training an illumination map generation network and a reflection map generation network by adopting an Adam optimization method.
Referring to fig. 1 and 2, an essential image decomposition method based on a modified GoogLeNet-VGG19 cross convolution neural network according to an embodiment of the invention includes the following steps:
(1) Building a training image sample library
Using the MPCal intrinsic image dataset, 1000 images were taken, and 50 image blocks of 224x224 were randomly cropped from each image; the image blocks were then randomly flipped horizontally, flipped vertically, rotated and mirrored, after which the 50 image blocks per image became 200 image blocks, for a total of 200,000 image blocks. Meanwhile, the illumination blocks and reflection blocks corresponding to the 200,000 image blocks were extracted from the illumination maps and reflection maps of the 1000 images. The image blocks and their corresponding illumination blocks and reflection blocks form the training image sample library.
(2) The illumination map generation network and the reflection map generation network constructed above are trained simultaneously on the training image sample library with the Adam optimization method; the Adam optimization parameters beta are set to (0.9, 0.999), the learning rate to 0.005, the weight decay to 0.0001, epoch = 100 and batch size = 20. Training stops when the loss functions of the two generation networks reach their minimum, giving the final illumination map generation network and reflection map generation network. During training, the illumination map output by the illumination map generation network is input into the identification network, which estimates the probability that this output is consistent with the training-sample label image, and the parameters of the illumination map generation network are updated by back-propagation; likewise, the reflection map output by the reflection map generation network is input into the identification network, which estimates the probability that this output is consistent with the training-sample label image, and the parameters of the reflection map generation network are updated by back-propagation. The generation networks and the identification network are trained with the TTUR method, with a 3:1 ratio of identification-network to generation-network training steps.
(3) As shown in fig. 2, the original image to be processed (fig. 2 (a)) is input into the final illumination map generation network and reflection map generation network respectively, and the output images are the illumination map and reflection map into which the original image is decomposed (fig. 2 (b) and (c)). The decomposition results contain little noise, the image edges are sharp, and the overall sharpness and quality of the images reach a high level, which fully demonstrates the effectiveness and practicality of the method.
In summary, the embodiment of the invention provides an intrinsic image decomposition method based on an improved GoogLeNet-VGG19 cross convolutional neural network: a training image sample library is first constructed; an illumination map generation network is built as an improvement of the traditional GoogLeNet convolutional neural network, a reflection map generation network is built as an improvement of the traditional VGG19 convolutional neural network, and the two are cross-fused; next, an identification network is constructed; finally, the two generation networks are trained with the Adam optimization method to obtain the final illumination map generation network and reflection map generation network. In the decomposition results, the reflectance of a given object remains consistent, edge information is better preserved and noise better removed, image quality is higher, and the results are closer to the ground-truth image in both detail and sharpness.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, one skilled in the art may make modifications and equivalents to the specific embodiments of the present invention, and any modifications and equivalents not departing from the spirit and scope of the present invention are within the scope of the claims of the present invention.

Claims (3)

1. The intrinsic image decomposition method based on the cross convolution neural network is characterized by comprising the following steps of:
inputting an original image to be decomposed into a trained GoogLeNet-VGG19 cross convolution neural network model to obtain an illumination map and a reflection map which are obtained by decomposing the original image;
wherein the GoogLeNet-VGG19 cross convolution neural network model is formed by cross fusion of an illumination map generation network and a reflection map generation network; the illumination map generation network is constructed based on GoogLeNet convolutional neural networks, and the reflection map generation network is constructed based on VGG19 convolutional neural networks;
The obtaining step of the trained GoogLeNet-VGG19 cross convolution neural network model comprises the steps of training an illumination map generation network and a reflection map generation network by adopting an Adam optimization method;
Wherein,
The steps of the illumination map generation network based on GoogLeNet convolutional neural network construction specifically comprise:
adding 1 ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception a, the 4 ReLU outputs feeding jointly into the DepthConcat layer of inception a;
adding 1 ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception b, the 4 ReLU outputs feeding jointly into the DepthConcat layer of inception b;
in GoogLeNet inception a, connecting the 2 convolution operations of the first layer with the 2 convolution operations of the second layer to form 2 connection channels, and adding on each channel 1 combination of a ReLU activation function followed by a MaxPool operation;
in GoogLeNet inception b, connecting the 2 convolution operations of the first layer with the 2 convolution operations of the second layer to form 2 connection channels, and adding on each channel 1 combination of a ReLU activation function followed by a MaxPool operation;
skip-connecting the DepthConcat layer output of GoogLeNet inception b to the DepthConcat layer of inception d; connecting the convolution output that follows the AveragePool operation in the first layer of GoogLeNet inception e directly to the DepthConcat layer of inception e; skip-connecting the DepthConcat layer output of GoogLeNet inception e to the DepthConcat layer of inception b;
adding 1 ReLU activation function after each of the 4 convolution operations in the third layer of GoogLeNet inception a, the 4 ReLU outputs feeding jointly into the DepthConcat layer of inception a;
adding 1 ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception b, the 4 ReLU outputs feeding jointly into the DepthConcat layer of inception b;
adding an FC layer after the FC layer of the GoogLeNet convolutional neural network;
constructing the reflection map generation network from the VGG19 convolutional neural network specifically comprises:
applying a Concat operation to the first and second MaxPool output results of the VGG19 convolutional neural network and feeding the result to the fifth layer of the network;
applying a Concat operation to the third and fourth MaxPool output results of the VGG19 convolutional neural network and feeding the result to the tenth layer of the network;
deleting the seventeenth and eighteenth layers of the VGG19 convolutional neural network, and adding after the sixteenth layer two layers whose structure is identical to the sixteenth layer's, which form the seventeenth and eighteenth layers of the modified VGG19 convolutional neural network;
cross-fusing the illumination map generation network and the reflection map generation network specifically comprises:
connecting the DepthConcat layer output of GoogLeNet inception e to the tenth layer of the VGG19 convolutional neural network;
connecting the fourth MaxPool output of the VGG19 convolutional neural network to a convolution operation in the second layer of GoogLeNet inception a;
the Loss function Loss1 of the illumination map generation network is

    Loss1 = \sum_i \mu_i \left\| X^{(i)} - \hat{X}^{(i)} \right\|_1 , \quad \left\| X - \hat{X} \right\|_1 = \frac{1}{HWC} \sum_{x,y,c} \left| X(x,y,c) - \hat{X}(x,y,c) \right|

where X is the input image, \hat{X} is the predicted image, H, W and C are the height, width and channel number of the input image, x and y are pixel coordinates, c indexes the channel, \mu_i is the weight at the i-th scale, X^{(i)} is the image at the i-th scale, and \hat{X}^{(i)} is the predicted image at the i-th scale generated by the improved GoogLeNet convolutional neural network;
the Loss function Loss2 of the reflection map generation network is

    Loss2 = \sum_j \frac{1}{C_j H_j W_j} \left\| V_j(Y) - V_j(\hat{Y}) \right\|_2^2

where Y is the input image, \hat{Y} is the estimate of the input image produced by the improved VGG19 network, C_j, H_j and W_j are the channel number, height and width of the feature map output by the j-th layer, V_j(\cdot) is the activation output of the j-th layer when it processes an image, and j is the layer index;
The training of the illumination map generation network and the reflection map generation network by adopting the Adam optimization method specifically comprises the following steps:
taking images in a pre-constructed training image sample library as samples, and training an illumination map generation network and a reflection map generation network simultaneously by adopting an Adam optimization method;
during training, the illumination map output by the illumination map generation network is input into an identification network, which estimates the probability that this output is consistent with the training-sample label image, and the network parameters of the illumination map generation network are updated by back-propagation; the reflection map output by the reflection map generation network is input into an identification network, which estimates the probability that this output is consistent with the training-sample label image, and the network parameters of the reflection map generation network are updated by back-propagation;
when the Loss function Loss1 reaches its minimum, training of the illumination map generation network stops, yielding the final illumination map generation network; when the Loss function Loss2 reaches its minimum, training of the reflection map generation network stops, yielding the final reflection map generation network;
the identification network is a multi-layer convolutional neural network and comprises six identical layers; each layer is in turn a convolution operation, a Sigmoid activation function, and MaxPool.
2. The intrinsic image decomposition method based on a cross convolution neural network according to claim 1, wherein the Adam optimization parameters beta are set to (0.9, 0.999), the learning rate to 0.005, the weight decay to 0.0001, epoch = 100 and batch size = 20.
3. An intrinsic image decomposing system based on a cross convolution neural network for implementing the intrinsic image decomposing method of claim 1, characterized in that the intrinsic image decomposing system comprises:
The decomposition module is used for inputting an original image to be decomposed into a trained GoogLeNet-VGG19 cross convolution neural network model to obtain an illumination map and a reflection map which are obtained by decomposing the original image;
wherein the GoogLeNet-VGG19 cross convolution neural network model is formed by cross fusion of an illumination map generation network and a reflection map generation network; the illumination map generation network is constructed based on GoogLeNet convolutional neural networks, and the reflection map generation network is constructed based on VGG19 convolutional neural networks;
the step of obtaining the trained GoogLeNet-VGG19 cross convolution neural network model comprises the step of training an illumination map generation network and a reflection map generation network by adopting an Adam optimization method.
CN202110385353.8A 2021-04-09 2021-04-09 Intrinsic image decomposition method and system based on cross convolution neural network Active CN113034353B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110385353.8A CN113034353B (en) 2021-04-09 2021-04-09 Intrinsic image decomposition method and system based on cross convolution neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110385353.8A CN113034353B (en) 2021-04-09 2021-04-09 Intrinsic image decomposition method and system based on cross convolution neural network

Publications (2)

Publication Number Publication Date
CN113034353A CN113034353A (en) 2021-06-25
CN113034353B (en) 2024-07-12

Family

ID=76456400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110385353.8A Active CN113034353B (en) 2021-04-09 2021-04-09 Intrinsic image decomposition method and system based on cross convolution neural network

Country Status (1)

Country Link
CN (1) CN113034353B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657521B (en) * 2021-08-23 2023-09-19 天津大学 Method for separating two mutually exclusive components in image

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416805A (en) * 2018-03-12 2018-08-17 中山大学 A kind of intrinsic image decomposition method and device based on deep learning
CN110232661A (en) * 2019-05-03 2019-09-13 天津大学 Low illumination colour-image reinforcing method based on Retinex and convolutional neural networks

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10586310B2 (en) * 2017-04-06 2020-03-10 Pixar Denoising Monte Carlo renderings using generative adversarial neural networks
US10706508B2 (en) * 2018-03-29 2020-07-07 Disney Enterprises, Inc. Adaptive sampling in Monte Carlo renderings using error-predicting neural networks
CN108764250B (en) * 2018-05-02 2021-09-17 西北工业大学 Method for extracting essential image by using convolutional neural network
WO2020068158A1 (en) * 2018-09-24 2020-04-02 Google Llc Photo relighting using deep neural networks and confidence learning
CN110675336A (en) * 2019-08-29 2020-01-10 苏州千视通视觉科技股份有限公司 Low-illumination image enhancement method and device
CN111242868B (en) * 2020-01-16 2023-05-02 重庆邮电大学 Image enhancement method based on convolutional neural network in scotopic vision environment
CN111563577B (en) * 2020-04-21 2022-03-11 西北工业大学 Unet-based intrinsic image decomposition method for skip layer frequency division and multi-scale identification
CN111681223B (en) * 2020-06-09 2023-04-18 安徽理工大学 Method for detecting mine well wall under low illumination condition based on convolutional neural network
GB2598711B (en) * 2020-08-11 2023-10-18 Toshiba Kk A Computer Vision Method and System
CN112131975B (en) * 2020-09-08 2022-11-15 东南大学 Face illumination processing method based on Retinex decomposition and generation of confrontation network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416805A (en) * 2018-03-12 2018-08-17 中山大学 A kind of intrinsic image decomposition method and device based on deep learning
CN110232661A (en) * 2019-05-03 2019-09-13 天津大学 Low illumination colour-image reinforcing method based on Retinex and convolutional neural networks

Also Published As

Publication number Publication date
CN113034353A (en) 2021-06-25

Similar Documents

Publication Publication Date Title
CN111080620B (en) Road disease detection method based on deep learning
CN109978807B (en) Shadow removing method based on generating type countermeasure network
CN110084817B (en) Digital elevation model production method based on deep learning
CN112001407A (en) Model iterative training method and system based on automatic labeling
CN114022586B (en) Defect image generation method based on countermeasure generation network
Jiao et al. Guided-Pix2Pix: End-to-end inference and refinement network for image dehazing
CN113034353B (en) Intrinsic image decomposition method and system based on cross convolution neural network
CN112101364A (en) Semantic segmentation method based on parameter importance incremental learning
CN116645369A (en) Anomaly detection method based on twin self-encoder and two-way information depth supervision
CN113762265A (en) Pneumonia classification and segmentation method and system
CN113989290A (en) Wrinkle segmentation method based on U-Net
CN116563250A (en) Recovery type self-supervision defect detection method, device and storage medium
Zhao et al. Layer-wise multi-defect detection for laser powder bed fusion using deep learning algorithm with visual explanation
CN115222750A (en) Remote sensing image segmentation method and system based on multi-scale fusion attention
CN108537266A (en) A kind of cloth textured fault sorting technique of depth convolutional network
CN114943655B (en) Image restoration system for generating countermeasure network structure based on cyclic depth convolution
CN113780547A (en) Computer implementation method, computer system and computer program product
CN117576453A (en) Cross-domain armored target detection method, system, electronic equipment and storage medium
CN116091784A (en) Target tracking method, device and storage medium
CN115376022A (en) Application of small target detection algorithm based on neural network in unmanned aerial vehicle aerial photography
CN116309545A (en) Single-stage cell nucleus instance segmentation method for medical microscopic image
CN115331052A (en) Garbage data labeling system and method based on deep learning
CN117934337B (en) Method for mask repair of blocked chromosome based on unsupervised learning
CN118314360B (en) Image self-adaptive quick recognition method based on deep learning
CN117593648B (en) Remote sensing target building extraction method based on weak supervision learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant