CN113034353B - Intrinsic image decomposition method and system based on cross convolution neural network - Google Patents
Intrinsic image decomposition method and system based on cross convolution neural network
- Publication number
- CN113034353B (application CN202110385353.8A; also published as CN113034353A)
- Authority
- CN
- China
- Prior art keywords
- network
- layer
- neural network
- convolutional neural
- image
- Prior art date
- Legal status: Active
Classifications
- G06T3/04: Geometric image transformations in the plane of the image; context-preserving transformations, e.g. by using an importance map
- G06T5/70: Image enhancement or restoration; denoising, smoothing
- G06T5/73: Image enhancement or restoration; deblurring, sharpening
- G06T2207/20081: Indexing scheme for image analysis or image enhancement; training, learning
- G06T2207/20084: Indexing scheme for image analysis or image enhancement; artificial neural networks [ANN]
- G06T2207/20192: Indexing scheme for image analysis or image enhancement; edge enhancement, edge preservation
Abstract
The invention discloses an intrinsic image decomposition method and system based on a cross convolutional neural network. The method comprises: inputting an original image to be decomposed into a trained GoogLeNet-VGG19 cross convolutional neural network model to obtain the illumination map and the reflection map into which the original image is decomposed. The GoogLeNet-VGG19 cross convolutional neural network model is formed by cross fusion of an illumination map generation network and a reflection map generation network; the illumination map generation network is constructed on the basis of the GoogLeNet convolutional neural network, and the reflection map generation network on the basis of the VGG19 convolutional neural network; both generation networks are trained with the Adam optimization method. With the invention, the reflectance in the intrinsic decomposition result remains consistent across the same object, edge information is better preserved and noise better removed, so the image quality is higher, and the result is closer to the ground-truth image in both detail and sharpness.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an intrinsic image decomposition method and system based on a cross convolution neural network.
Background
Intrinsic image decomposition was first proposed by Barrow and Tenenbaum in 1978. The intrinsic image problem is to recover, for every pixel of an image, the shading and reflectance information of the corresponding scene point, forming an illumination map and a reflection map respectively. Intrinsic image decomposition methods fall into two main categories: those based on Retinex theory and those based on deep learning. The classical Retinex method assumes that the larger gradients in an image are caused by changes in object reflectance, while the smaller gradients belong to illumination variation. Because it works entirely on gradients, the Retinex method can only establish local constraints.
Another commonly used constraint is that natural images contain only a small number of colors, distributed in a structured form (so-called global color sparsity), i.e., the reflectance layer of an image is required to contain only a few colors. Because gradient-based methods establish only local constraints, the recovered reflectance layer may be globally inconsistent: two distant pixels of the same material can be assigned different reflectances. Moreover, requiring several images of the same scene places strict demands on the input of an intrinsic image method. Once the gradients of the reflectance and shading images have been estimated, Weiss's approach integrates the gradient fields to solve for the reflection map and the illumination map. However, such methods need a large number of samples to train a classifier and are time-consuming; the resulting intrinsic images show large errors at edges and are blurred there, and the classifier may overfit the training samples.
Deep-learning-based intrinsic image decomposition improves on these problems to some extent, but still has shortcomings. Narihira et al., owing to flaws in their network design, downsample the image to too small a scale, so that a large amount of information is lost on upsampling and the output is blurred; Fan et al. integrate a filter into the network to flatten the reflectance layer, removing residual noise and geometry information, but neglect the protection of image detail, which produces jagged edges.
Disclosure of Invention
The invention aims to provide an intrinsic image decomposition method and system based on a cross convolutional neural network, so as to solve one or more of the above technical problems. With the invention, the reflectance in the intrinsic decomposition result remains consistent across the same object, edge information is better preserved and noise better removed, so the image quality is higher, and the result is closer to the ground-truth image in both detail and sharpness.
In order to achieve the above purpose, the invention adopts the following technical scheme:
The invention discloses an intrinsic image decomposition method based on a cross convolutional neural network, comprising the following steps:
inputting an original image to be decomposed into a trained GoogLeNet-VGG19 cross convolutional neural network model to obtain the illumination map and the reflection map into which the original image is decomposed;
wherein the GoogLeNet-VGG19 cross convolutional neural network model is formed by cross fusion of an illumination map generation network and a reflection map generation network; the illumination map generation network is constructed on the basis of the GoogLeNet convolutional neural network, and the reflection map generation network on the basis of the VGG19 convolutional neural network;
obtaining the trained GoogLeNet-VGG19 cross convolutional neural network model comprises training the illumination map generation network and the reflection map generation network with the Adam optimization method.
In a further improvement of the invention, the construction of the illumination map generation network on the basis of the GoogLeNet convolutional neural network specifically comprises:
adding one ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception module a, the 4 ReLU outputs being fed jointly to the DepthConcat layer of inception a;
adding one ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception module b, the 4 ReLU outputs being fed jointly to the DepthConcat layer of inception b;
in GoogLeNet inception module a, connecting the 2 first-layer convolution operations with the 2 second-layer convolution operations to form 2 connection channels, and adding on each of the 2 channels one combination of a ReLU activation function followed by a MaxPool operation;
in GoogLeNet inception module b, connecting the 2 first-layer convolution operations with the 2 second-layer convolution operations to form 2 connection channels, and adding on each of the 2 channels one combination of a ReLU activation function followed by a MaxPool operation;
skip-connecting the DepthConcat layer output of inception module b to the DepthConcat layer of inception module d; connecting the convolution output that follows the first-layer AveragePool operation of inception module e directly to the DepthConcat layer of inception e; skip-connecting the DepthConcat layer output of inception module e to the DepthConcat layer of inception module b;
adding one ReLU activation function after each of the 4 convolution operations in the third layer of inception module a, the 4 ReLU outputs being fed jointly to the DepthConcat layer of inception a;
adding one ReLU activation function after each of the 4 convolution operations in the second layer of inception module b, the 4 ReLU outputs being fed jointly to the DepthConcat layer of inception b;
adding an FC layer after the FC layer of the GoogLeNet convolutional neural network.
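As an illustration of the branch-level ReLU modification above, the following PyTorch sketch appends a ReLU to each of the four parallel convolutions of an Inception-style module before the DepthConcat concatenation. The channel sizes are assumptions for readability, not the patent's exact configuration.

```python
import torch
import torch.nn as nn

class InceptionWithBranchReLU(nn.Module):
    """One Inception-style module with a ReLU appended to every branch
    convolution; the four activated outputs are concatenated on the
    channel axis (the DepthConcat layer). Channel sizes are assumed."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Sequential(nn.Conv2d(in_ch, 64, 1), nn.ReLU())
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, 96, 1), nn.ReLU(),
                                nn.Conv2d(96, 128, 3, padding=1), nn.ReLU())
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 16, 1), nn.ReLU(),
                                nn.Conv2d(16, 32, 5, padding=2), nn.ReLU())
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 32, 1), nn.ReLU())

    def forward(self, x):
        # DepthConcat: channel-wise concatenation of the four ReLU outputs
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)
```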
In a further improvement of the invention, the construction of the reflection map generation network on the basis of the VGG19 convolutional neural network specifically comprises:
performing a Concat operation on the outputs of the first and second MaxPool layers of the VGG19 convolutional neural network and feeding the result to the fifth layer of the network;
performing a Concat operation on the outputs of the third and fourth MaxPool layers of the VGG19 convolutional neural network and feeding the result to the tenth layer of the network;
deleting the seventeenth and eighteenth layers of the VGG19 convolutional neural network, and appending after the sixteenth layer two layers of structure identical to the sixteenth layer, which form the seventeenth and eighteenth layers of the modified VGG19 convolutional neural network.
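A minimal sketch of the Concat step above, assuming PyTorch tensors: in VGG19 the two MaxPool outputs differ in spatial size, so one is resized before channel-wise concatenation (how the sizes are matched is an assumption; the patent does not state it).

```python
import torch
import torch.nn.functional as F

def concat_pool_outputs(pool_a, pool_b):
    """Concatenate two MaxPool feature maps on the channel axis; the
    result would then be fed to the fifth (or tenth) VGG19 layer."""
    if pool_a.shape[-2:] != pool_b.shape[-2:]:
        # resize the earlier (larger) map to match the later one (assumption)
        pool_a = F.interpolate(pool_a, size=pool_b.shape[-2:],
                               mode='bilinear', align_corners=False)
    return torch.cat([pool_a, pool_b], dim=1)
```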
In a further improvement of the invention, the cross fusion of the illumination map generation network and the reflection map generation network specifically comprises:
connecting the DepthConcat layer output of GoogLeNet inception module e to the tenth layer of the VGG19 convolutional neural network;
connecting the fourth MaxPool output of the VGG19 convolutional neural network to a convolution operation in the second layer of GoogLeNet inception module a.
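For illustration, one cross-fusion connection could be realised as below: a 1x1 convolution adapts a feature map taken from one generator so that it can be concatenated into the other generator's stream. The adapter convolution and the resizing are assumptions about how the two streams are matched; the patent only states which layers are connected.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLink(nn.Module):
    """Adapt a source feature map (e.g. a DepthConcat output) and fuse it
    into a destination stream (e.g. a VGG19 layer input)."""
    def __init__(self, src_ch, dst_ch):
        super().__init__()
        self.adapt = nn.Conv2d(src_ch, dst_ch, kernel_size=1)

    def forward(self, src_feat, dst_feat):
        src = self.adapt(src_feat)
        if src.shape[-2:] != dst_feat.shape[-2:]:
            src = F.interpolate(src, size=dst_feat.shape[-2:],
                                mode='bilinear', align_corners=False)
        return torch.cat([dst_feat, src], dim=1)
```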
In a further improvement of the invention, the expression of the Loss function Loss1 of the illumination map generation network is:

$$\mathrm{Loss1}=\sum_{i}\mu_{i}\cdot\frac{1}{HWC}\sum_{x,y,c}\left|X^{(i)}(x,y,c)-\hat{X}^{(i)}(x,y,c)\right|$$

where $X$ is the input image, $\hat{X}$ is the predicted image, $H$, $W$ and $C$ are the height, width and number of channels of the input image, $x, y$ are the pixel coordinates of the image, $c$ indexes the channel, $\mu_i$ is the weight at the $i$-th scale, $X^{(i)}$ is the image at the $i$-th scale, and $\hat{X}^{(i)}$ is the predicted image at the $i$-th scale generated by the improved GoogLeNet convolutional neural network.
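A sketch of Loss1 as reconstructed above, in PyTorch; the number of scales and the weights mu_i are assumptions.

```python
import torch
import torch.nn.functional as F

def loss1(pred, target, mu=(1.0, 0.5, 0.25)):
    """Multi-scale mean absolute error: scale i is built by average-pooling
    scale i-1, and torch.mean performs the 1/(H*W*C) normalisation."""
    total = pred.new_zeros(())
    for i, mu_i in enumerate(mu):
        if i > 0:
            pred = F.avg_pool2d(pred, kernel_size=2)
            target = F.avg_pool2d(target, kernel_size=2)
        total = total + mu_i * torch.mean(torch.abs(pred - target))
    return total
```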
In a further improvement of the invention, the expression of the Loss function Loss2 of the reflection map generation network is:

$$\mathrm{Loss2}=\sum_{j}\frac{1}{C_{j}H_{j}W_{j}}\left\|V_{j}(Y)-V_{j}(\hat{Y})\right\|_{2}^{2}$$

where $Y$ is the input image, $\hat{Y}$ is the estimate of the input image after processing by the improved VGG19 network, $C_j$, $H_j$ and $W_j$ are respectively the number of channels, height and width of the feature map output by the $j$-th layer, $V_j(\cdot)$ is the output of the activation function when the $j$-th layer processes the image, and $j$ is the layer index.
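A sketch of Loss2 as reconstructed above, using torchvision's stock VGG19 as a stand-in for the patent's modified VGG19 (an assumption, as are the layer indices); mse_loss with mean reduction provides the per-layer 1/(C_j H_j W_j) normalisation.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19, VGG19_Weights

class PerceptualLoss2(nn.Module):
    """Feature-space MSE between activations V_j at selected layers j."""
    def __init__(self, layer_ids=(3, 8, 17, 26)):
        super().__init__()
        self.features = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features.eval()
        for p in self.features.parameters():
            p.requires_grad = False
        self.layer_ids = set(layer_ids)

    def forward(self, y, y_hat):
        loss = y.new_zeros(())
        a, b = y, y_hat
        for j, layer in enumerate(self.features):
            a, b = layer(a), layer(b)
            if j in self.layer_ids:
                loss = loss + nn.functional.mse_loss(a, b)
        return loss
```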
In a further improvement of the invention, the training of the illumination map generation network and the reflection map generation network with the Adam optimization method specifically comprises the following steps:
taking the images in a pre-constructed training image sample library as samples and training the illumination map generation network and the reflection map generation network simultaneously with the Adam optimization method;
during training, feeding the illumination map output by the illumination map generation network into an identification network (a discriminator) that estimates the probability that this output is consistent with the training-sample label image, and updating the parameters of the illumination map generation network by back-propagation; feeding the reflection map output by the reflection map generation network into an identification network that estimates the probability that this output is consistent with the training-sample label image, and updating the parameters of the reflection map generation network by back-propagation;
stopping training of the illumination map generation network when the Loss function Loss1 reaches its minimum, giving the final illumination map generation network; stopping training of the reflection map generation network when the Loss function Loss2 reaches its minimum, giving the final reflection map generation network;
the identification network is a multi-layer convolutional neural network comprising six identical layers; each layer consists, in order, of a convolution operation, a Sigmoid activation function and a MaxPool operation. A minimal sketch of such a network is given below.
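The sketch follows the stated six-layer structure of convolution, Sigmoid and MaxPool; the channel width and the final probability head are assumptions.

```python
import torch.nn as nn

def build_identification_network(in_ch=3, width=64):
    """Six identical stages of Conv -> Sigmoid -> MaxPool, ending in a
    scalar probability that the input matches the label image."""
    layers, ch = [], in_ch
    for _ in range(6):
        layers += [nn.Conv2d(ch, width, kernel_size=3, padding=1),
                   nn.Sigmoid(),
                   nn.MaxPool2d(kernel_size=2)]
        ch = width
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(),
               nn.Linear(width, 1), nn.Sigmoid()]  # assumed probability head
    return nn.Sequential(*layers)
```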
A further improvement of the invention is that the Adam optimization parameters beta are set to (0.9, 0.999), the learning rate to 0.005, the weight decay to 0.0001, epoch = 100 and batch size = 20.
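The stated hyperparameters translate directly into an optimizer configuration, sketched below; `generator` stands for either generation network (an assumed name).

```python
import torch

optimizer = torch.optim.Adam(generator.parameters(),
                             lr=0.005,             # learning rate
                             betas=(0.9, 0.999),   # Adam parameter beta
                             weight_decay=0.0001)  # weight decay
EPOCHS, BATCH_SIZE = 100, 20
```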
The invention also discloses an intrinsic image decomposition system based on a cross convolutional neural network, comprising:
a decomposition module for inputting an original image to be decomposed into a trained GoogLeNet-VGG19 cross convolutional neural network model to obtain the illumination map and the reflection map into which the original image is decomposed;
wherein the GoogLeNet-VGG19 cross convolutional neural network model is formed by cross fusion of an illumination map generation network and a reflection map generation network; the illumination map generation network is constructed on the basis of the GoogLeNet convolutional neural network, and the reflection map generation network on the basis of the VGG19 convolutional neural network;
obtaining the trained GoogLeNet-VGG19 cross convolutional neural network model comprises training the illumination map generation network and the reflection map generation network with the Adam optimization method.
In a further improvement of the present invention,
the construction of the illumination map generation network on the basis of the GoogLeNet convolutional neural network specifically comprises:
adding one ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception module a, the 4 ReLU outputs being fed jointly to the DepthConcat layer of inception a;
adding one ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception module b, the 4 ReLU outputs being fed jointly to the DepthConcat layer of inception b;
in GoogLeNet inception module a, connecting the 2 first-layer convolution operations with the 2 second-layer convolution operations to form 2 connection channels, and adding on each of the 2 channels one combination of a ReLU activation function followed by a MaxPool operation;
in GoogLeNet inception module b, connecting the 2 first-layer convolution operations with the 2 second-layer convolution operations to form 2 connection channels, and adding on each of the 2 channels one combination of a ReLU activation function followed by a MaxPool operation;
skip-connecting the DepthConcat layer output of inception module b to the DepthConcat layer of inception module d; connecting the convolution output that follows the first-layer AveragePool operation of inception module e directly to the DepthConcat layer of inception e; skip-connecting the DepthConcat layer output of inception module e to the DepthConcat layer of inception module b;
adding one ReLU activation function after each of the 4 convolution operations in the third layer of inception module a, the 4 ReLU outputs being fed jointly to the DepthConcat layer of inception a;
adding one ReLU activation function after each of the 4 convolution operations in the second layer of inception module b, the 4 ReLU outputs being fed jointly to the DepthConcat layer of inception b;
adding an FC layer after the FC layer of the GoogLeNet convolutional neural network;
the construction of the reflection map generation network on the basis of the VGG19 convolutional neural network specifically comprises:
performing a Concat operation on the outputs of the first and second MaxPool layers of the VGG19 convolutional neural network and feeding the result to the fifth layer of the network;
performing a Concat operation on the outputs of the third and fourth MaxPool layers of the VGG19 convolutional neural network and feeding the result to the tenth layer of the network;
deleting the seventeenth and eighteenth layers of the VGG19 convolutional neural network, and appending after the sixteenth layer two layers of structure identical to the sixteenth layer, which form the seventeenth and eighteenth layers of the modified VGG19 convolutional neural network;
the cross fusion of the illumination map generation network and the reflection map generation network specifically comprises:
connecting the DepthConcat layer output of GoogLeNet inception module e to the tenth layer of the VGG19 convolutional neural network;
connecting the fourth MaxPool output of the VGG19 convolutional neural network to a convolution operation in the second layer of GoogLeNet inception module a.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides an intrinsic image decomposition method based on an improved GoogLeNet-VGG19 cross convolutional neural network: a training image sample library is first constructed; an illumination map generation network is then built by improving the conventional GoogLeNet convolutional neural network, a reflection map generation network is built by improving the conventional VGG19 convolutional neural network, and the two generation networks are cross fused; next, an identification network is constructed; finally, the illumination map generation network and the reflection map generation network are trained with the Adam optimization method to obtain the final generation networks. In the resulting intrinsic decompositions, the reflectance of the same object remains consistent, edge information is better preserved and noise better removed, the image quality is higher, and the results are closer to the ground-truth image in both detail and sharpness.
The system of the invention is used for intrinsic image decomposition. Whereas images decomposed by existing methods contain considerable noise and blurred edges, the output of the invention keeps the reflectance consistent on the same object, protects edge information and removes noise better, and attains higher image quality; the generated results are closer to the ground-truth image in both detail and sharpness.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below; it will be apparent to those of ordinary skill in the art that the following drawings show only some embodiments of the invention, and that other drawings may be derived from them without inventive effort.
FIG. 1 is a flow chart of the intrinsic image decomposition method based on the improved GoogLeNet-VGG19 cross convolutional neural network according to an embodiment of the invention;
FIG. 2 shows the results of an intrinsic image decomposition in an embodiment of the present invention; fig. 2 (a) is the original image, fig. 2 (b) the illumination map obtained by decomposition, and fig. 2 (c) the reflection map obtained by decomposition.
Detailed Description
In order to make the purposes, technical effects and technical solutions of the embodiments of the present invention clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings; it will be apparent that the described embodiments are only some of the embodiments of the present invention. Other embodiments obtained by those of ordinary skill in the art from the disclosed embodiments without inventive effort fall within the scope of the present invention.
Referring to fig. 1, an intrinsic image decomposition method based on a modified GoogLeNet-VGG19 cross convolution neural network according to an embodiment of the invention includes the following steps:
Step 1: construct a training image sample library.
Using a public intrinsic image database, take P images together with their corresponding illumination maps and reflection maps; randomly crop the P images into a number of image blocks of a specified size; then expand the database by randomly applying horizontal flipping, vertical flipping, rotation and mirroring to the image blocks; the processed image blocks, together with their corresponding illumination maps and reflection maps, form the training image sample library.
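A sketch of this augmentation, assuming PIL image blocks and torchvision; mirroring is modelled as a flip and the rotation angle is assumed.

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomCrop(224),              # crop to the specified size
    transforms.RandomHorizontalFlip(p=0.5),  # horizontal flip
    transforms.RandomVerticalFlip(p=0.5),    # vertical flip / mirror
    transforms.RandomApply(
        [transforms.RandomRotation(degrees=(90, 90))], p=0.5),  # rotation
])
```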
Step 2: construct the illumination map generation network from the improved GoogLeNet convolutional neural network, specifically as follows:
Step 2-1: add one ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception module a (4 ReLU activation functions in total), their outputs being fed jointly to the DepthConcat layer of inception a;
Step 2-2: add one ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception module b (4 in total), their outputs being fed jointly to the DepthConcat layer of inception b;
Step 2-3: in GoogLeNet inception module a, connect the 2 first-layer convolution operations with the 2 second-layer convolution operations to form 2 connection channels; on each channel add one combination of a ReLU activation function followed by a MaxPool operation (2 combinations in total);
Step 2-4: in GoogLeNet inception module b, connect the 2 first-layer convolution operations with the 2 second-layer convolution operations to form 2 connection channels; on each channel add one combination of a ReLU activation function followed by a MaxPool operation (2 combinations in total);
Step 2-5: skip-connect the DepthConcat layer output of inception module b to the DepthConcat layer of inception module d;
Step 2-6: connect the convolution output following the first-layer AveragePool operation of inception module e directly to the DepthConcat layer of inception e;
Step 2-7: skip-connect the DepthConcat layer output of inception module e to the DepthConcat layer of inception module b;
Step 2-8: add one ReLU activation function after each of the 4 convolution operations in the third layer of inception module a (4 in total), their outputs being fed jointly to the DepthConcat layer of inception a;
Step 2-9: add one ReLU activation function after each of the 4 convolution operations in the second layer of inception module b (4 in total), their outputs being fed jointly to the DepthConcat layer of inception b;
Step 2-10: add a new FC layer behind the FC layer of the GoogLeNet convolutional neural network;
Step 2-11: the operations of steps 2-1 to 2-10 yield the improved GoogLeNet convolutional neural network.
Step 3: construct the reflection map generation network from the improved VGG19 convolutional neural network, specifically as follows:
Step 3-1: perform a Concat operation on the outputs of the first and second MaxPool layers of the VGG19 convolutional neural network and feed the result to the fifth layer of the network;
Step 3-2: perform a Concat operation on the outputs of the third and fourth MaxPool layers of the VGG19 convolutional neural network and feed the result to the tenth layer of the network;
Step 3-3: delete the seventeenth and eighteenth layers of the VGG19 convolutional neural network;
Step 3-4: append two identical layers after the sixteenth layer to form the new seventeenth and eighteenth layers, whose structure is identical to that of the sixteenth layer;
Step 3-5: the operations of steps 3-1 to 3-4 yield the improved VGG19 convolutional neural network.
Step 4: cross-fuse the illumination map generation network and the reflection map generation network:
Step 4-1: connect the DepthConcat layer output of GoogLeNet inception module e to the tenth layer of the VGG19 convolutional neural network;
Step 4-2: connect the fourth MaxPool output of the VGG19 convolutional neural network to a convolution operation in the second layer of GoogLeNet inception module a.
Step 5: construct the identification network.
The identification network (the discriminator) is a multi-layer convolutional neural network comprising six identical layers; each layer consists, in order, of a convolution operation, a Sigmoid activation function and a MaxPool operation.
Step 6: define the loss functions.
Step 6-1: define the Loss function Loss1 of the illumination map generation network:

$$\mathrm{Loss1}=\sum_{i}\mu_{i}\cdot\frac{1}{HWC}\sum_{x,y,c}\left|X^{(i)}(x,y,c)-\hat{X}^{(i)}(x,y,c)\right|$$

where $X$ is the input image, $\hat{X}$ is the predicted image, $H$, $W$ and $C$ are the height, width and number of channels of the input image, $x, y$ are the pixel coordinates of the image, $c$ indexes the channel, $\mu_i$ is the weight at the $i$-th scale, $X^{(i)}$ is the image at the $i$-th scale, and $\hat{X}^{(i)}$ is the predicted image at the $i$-th scale generated by the improved GoogLeNet convolutional neural network;
Step 6-2: define the Loss function Loss2 of the reflection map generation network:

$$\mathrm{Loss2}=\sum_{j}\frac{1}{C_{j}H_{j}W_{j}}\left\|V_{j}(Y)-V_{j}(\hat{Y})\right\|_{2}^{2}$$

where $Y$ is the input image, $\hat{Y}$ is the estimate of the input image after processing by the improved VGG19 network, $C_j$, $H_j$ and $W_j$ are respectively the number of channels, height and width of the feature map output by the $j$-th layer, $V_j(\cdot)$ is the output of the activation function when the $j$-th layer processes the image, and $j$ is the layer index;
Step 7: train the networks.
Taking the images in the training image sample library constructed in step 1 as samples, train the illumination map generation network and the reflection map generation network simultaneously with the Adam optimization method.
During training, the illumination map output by the illumination map generation network is fed into the identification network, which estimates the probability that this output is consistent with the training-sample label image, and the parameters of the illumination map generation network are updated by back-propagation; the reflection map output by the reflection map generation network is likewise fed into the identification network, which estimates the probability that this output is consistent with the training-sample label image, and the parameters of the reflection map generation network are updated by back-propagation.
When the Loss function Loss1 reaches its minimum, training of the illumination map generation network stops, giving the final illumination map generation network; when the Loss function Loss2 reaches its minimum, training of the reflection map generation network stops, giving the final reflection map generation network.
Step 8: input the original image to be decomposed into the final illumination map generation network and reflection map generation network obtained in step 7 respectively; the output images are the illumination map and the reflection map into which the original image is decomposed, as sketched below.
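A sketch of this decomposition step; `illum_net` and `refl_net` denote the final generation networks from step 7 and `x` the original image as a (1, 3, H, W) tensor (assumed names).

```python
import torch

illum_net.eval()
refl_net.eval()
with torch.no_grad():
    illumination_map = illum_net(x)  # decomposed illumination map
    reflection_map = refl_net(x)     # decomposed reflection map
```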
In the embodiment of the present invention, the specified image-block size in step 1 is 224×224.
In the embodiment of the present invention, the parameters set when training the networks in step 7 are as follows: the Adam optimization parameters beta are set to (0.9, 0.999), the learning rate to 0.005, the weight decay to 0.0001, epoch = 100 and batch size = 20.
Whereas images decomposed by existing methods contain considerable noise and blurred edges, the image output by the method of the embodiment of the invention keeps the reflectance consistent, protects edge information and removes noise better, and attains higher image quality; the generated results are closer to the ground-truth image in both detail and sharpness.
The embodiment of the invention also discloses an intrinsic image decomposition system based on a cross convolutional neural network, comprising:
a decomposition module for inputting an original image to be decomposed into a trained GoogLeNet-VGG19 cross convolutional neural network model to obtain the illumination map and the reflection map into which the original image is decomposed;
wherein the GoogLeNet-VGG19 cross convolutional neural network model is formed by cross fusion of an illumination map generation network and a reflection map generation network; the illumination map generation network is constructed on the basis of the GoogLeNet convolutional neural network, and the reflection map generation network on the basis of the VGG19 convolutional neural network;
obtaining the trained GoogLeNet-VGG19 cross convolutional neural network model comprises training the illumination map generation network and the reflection map generation network with the Adam optimization method.
Referring to fig. 1 and 2, an intrinsic image decomposition method based on an improved GoogLeNet-VGG19 cross convolutional neural network according to an embodiment of the invention includes the following steps:
(1) Building a training image sample library
Using the MPCal intrinsic image dataset, 1000 images are taken and 50 image blocks of 224×224 are randomly cropped from each image; the image blocks are then randomly flipped horizontally, flipped vertically, rotated and mirrored, which expands the 50 blocks per image to 200. The total number of image blocks is thus 200,000. In the illumination maps and reflection maps corresponding to the 1000 images, the illumination blocks and reflection blocks corresponding to the 200,000 image blocks are located. These image blocks and their corresponding illumination and reflection blocks form the training image sample library.
(2) The illumination map generation network and the reflection map generation network constructed as above are trained simultaneously on the training image sample library with the Adam optimization method; the Adam optimization parameters beta are set to (0.9, 0.999), the learning rate to 0.005, the weight decay to 0.0001, epoch = 100 and batch size = 20. Training stops when the loss functions of the two generation networks reach their minimum, giving the final illumination map generation network and reflection map generation network. During training, the illumination map output by the illumination map generation network is fed into the identification network, which estimates the probability that this output is consistent with the training-sample label image, and the parameters of the illumination map generation network are updated by back-propagation; the reflection map output by the reflection map generation network is likewise fed into the identification network, which estimates the probability that this output is consistent with the training-sample label image, and the parameters of the reflection map generation network are updated by back-propagation. The generation networks and the identification network are trained with the TTUR method, the identification network being trained 3 times for every 1 training step of the generation networks, as sketched below.
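A sketch of the 3-to-1 TTUR schedule; `d_step` and `g_step` are assumed helpers that compute the respective losses and apply one optimizer step each.

```python
for epoch in range(100):
    for step, (img, illum_gt, refl_gt) in enumerate(loader):
        d_step(img, illum_gt, refl_gt)      # update the identification network
        if step % 3 == 2:                   # every third batch: 3-to-1 ratio
            g_step(img, illum_gt, refl_gt)  # update the generation networks
```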
(3) As shown in fig. 2, the original image to be processed (fig. 2 (a)) is input into the final illumination map generation network and reflection map generation network respectively, and the output images are the illumination map and the reflection map into which the original image is decomposed (fig. 2 (b) and (c)). The decomposition results contain little noise, the image edges are clear, and the overall sharpness and quality of the images reach a high level, which fully illustrates the effectiveness and practicality of the method.
In summary, the embodiment of the invention provides an intrinsic image decomposition method based on an improved GoogLeNet-VGG19 cross convolutional neural network: a training image sample library is first constructed; an illumination map generation network is then built by improving the conventional GoogLeNet convolutional neural network, a reflection map generation network is built by improving the conventional VGG19 convolutional neural network, and the two generation networks are cross fused; next, an identification network is constructed; finally, the illumination map generation network and the reflection map generation network are trained with the Adam optimization method to obtain the final generation networks. In the resulting intrinsic decompositions, the reflectance of the same object remains consistent, edge information is better preserved and noise better removed, the image quality is higher, and the results are closer to the ground-truth image in both detail and sharpness.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, one skilled in the art may make modifications and equivalents to the specific embodiments of the present invention, and any modifications and equivalents not departing from the spirit and scope of the present invention are within the scope of the claims of the present invention.
Claims (3)
1. An intrinsic image decomposition method based on a cross convolutional neural network, characterized by comprising the following steps:
inputting an original image to be decomposed into a trained GoogLeNet-VGG19 cross convolutional neural network model to obtain the illumination map and the reflection map into which the original image is decomposed;
wherein the GoogLeNet-VGG19 cross convolutional neural network model is formed by cross fusion of an illumination map generation network and a reflection map generation network; the illumination map generation network is constructed on the basis of the GoogLeNet convolutional neural network, and the reflection map generation network on the basis of the VGG19 convolutional neural network;
obtaining the trained GoogLeNet-VGG19 cross convolutional neural network model comprises training the illumination map generation network and the reflection map generation network with the Adam optimization method;
wherein,
the construction of the illumination map generation network on the basis of the GoogLeNet convolutional neural network specifically comprises:
adding one ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception module a, the 4 ReLU outputs being fed jointly to the DepthConcat layer of inception a;
adding one ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception module b, the 4 ReLU outputs being fed jointly to the DepthConcat layer of inception b;
in GoogLeNet inception module a, connecting the 2 first-layer convolution operations with the 2 second-layer convolution operations to form 2 connection channels, and adding on each of the 2 channels one combination of a ReLU activation function followed by a MaxPool operation;
in GoogLeNet inception module b, connecting the 2 first-layer convolution operations with the 2 second-layer convolution operations to form 2 connection channels, and adding on each of the 2 channels one combination of a ReLU activation function followed by a MaxPool operation;
skip-connecting the DepthConcat layer output of inception module b to the DepthConcat layer of inception module d; connecting the convolution output that follows the first-layer AveragePool operation of inception module e directly to the DepthConcat layer of inception e; skip-connecting the DepthConcat layer output of inception module e to the DepthConcat layer of inception module b;
adding one ReLU activation function after each of the 4 convolution operations in the third layer of inception module a, the 4 ReLU outputs being fed jointly to the DepthConcat layer of inception a;
adding one ReLU activation function after each of the 4 convolution operations in the second layer of inception module b, the 4 ReLU outputs being fed jointly to the DepthConcat layer of inception b;
adding an FC layer after the FC layer of the GoogLeNet convolutional neural network;
the construction of the reflection map generation network on the basis of the VGG19 convolutional neural network specifically comprises:
performing a Concat operation on the outputs of the first and second MaxPool layers of the VGG19 convolutional neural network and feeding the result to the fifth layer of the network;
performing a Concat operation on the outputs of the third and fourth MaxPool layers of the VGG19 convolutional neural network and feeding the result to the tenth layer of the network;
deleting the seventeenth and eighteenth layers of the VGG19 convolutional neural network, and appending after the sixteenth layer two layers of structure identical to the sixteenth layer, which form the seventeenth and eighteenth layers of the modified VGG19 convolutional neural network;
the cross fusion of the illumination map generation network and the reflection map generation network specifically comprises:
connecting the DepthConcat layer output of GoogLeNet inception module e to the tenth layer of the VGG19 convolutional neural network;
connecting the fourth MaxPool output of the VGG19 convolutional neural network to a convolution operation in the second layer of GoogLeNet inception module a;
the expression of the Loss function Loss1 of the illumination map generation network is:

$$\mathrm{Loss1}=\sum_{i}\mu_{i}\cdot\frac{1}{HWC}\sum_{x,y,c}\left|X^{(i)}(x,y,c)-\hat{X}^{(i)}(x,y,c)\right|$$

where $X$ is the input image, $\hat{X}$ is the predicted image, $H$, $W$ and $C$ are the height, width and number of channels of the input image, $x, y$ are the pixel coordinates of the image, $c$ indexes the channel, $\mu_i$ is the weight at the $i$-th scale, $X^{(i)}$ is the image at the $i$-th scale, and $\hat{X}^{(i)}$ is the predicted image at the $i$-th scale generated by the improved GoogLeNet convolutional neural network;
the expression of the Loss function Loss2 of the reflection map generation network is:

$$\mathrm{Loss2}=\sum_{j}\frac{1}{C_{j}H_{j}W_{j}}\left\|V_{j}(Y)-V_{j}(\hat{Y})\right\|_{2}^{2}$$

where $Y$ is the input image, $\hat{Y}$ is the estimate of the input image after processing by the improved VGG19 network, $C_j$, $H_j$ and $W_j$ are respectively the number of channels, height and width of the feature map output by the $j$-th layer, $V_j(\cdot)$ is the output of the activation function when the $j$-th layer processes the image, and $j$ is the layer index;
the training of the illumination map generation network and the reflection map generation network with the Adam optimization method specifically comprises:
taking the images in a pre-constructed training image sample library as samples and training the illumination map generation network and the reflection map generation network simultaneously with the Adam optimization method;
during training, feeding the illumination map output by the illumination map generation network into an identification network that estimates the probability that this output is consistent with the training-sample label image, and updating the parameters of the illumination map generation network by back-propagation; feeding the reflection map output by the reflection map generation network into an identification network that estimates the probability that this output is consistent with the training-sample label image, and updating the parameters of the reflection map generation network by back-propagation;
stopping training of the illumination map generation network when the Loss function Loss1 reaches its minimum, giving the final illumination map generation network; stopping training of the reflection map generation network when the Loss function Loss2 reaches its minimum, giving the final reflection map generation network;
the identification network being a multi-layer convolutional neural network comprising six identical layers, each layer consisting, in order, of a convolution operation, a Sigmoid activation function and a MaxPool operation.
2. The intrinsic image decomposition method based on a cross convolutional neural network according to claim 1, characterized in that the Adam optimization parameters beta are set to (0.9, 0.999), the learning rate to 0.005, the weight decay to 0.0001, epoch = 100 and batch size = 20.
3. An intrinsic image decomposition system based on a cross convolutional neural network for implementing the intrinsic image decomposition method of claim 1, characterized by comprising:
a decomposition module for inputting an original image to be decomposed into a trained GoogLeNet-VGG19 cross convolutional neural network model to obtain the illumination map and the reflection map into which the original image is decomposed;
wherein the GoogLeNet-VGG19 cross convolutional neural network model is formed by cross fusion of an illumination map generation network and a reflection map generation network; the illumination map generation network is constructed on the basis of the GoogLeNet convolutional neural network, and the reflection map generation network on the basis of the VGG19 convolutional neural network;
obtaining the trained GoogLeNet-VGG19 cross convolutional neural network model comprises training the illumination map generation network and the reflection map generation network with the Adam optimization method.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202110385353.8A CN113034353B (en) | 2021-04-09 | 2021-04-09 | Intrinsic image decomposition method and system based on cross convolution neural network
Publications (2)
Publication Number | Publication Date |
---|---|
CN113034353A CN113034353A (en) | 2021-06-25 |
CN113034353B true CN113034353B (en) | 2024-07-12 |
Family
ID=76456400
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110385353.8A Active CN113034353B (en) | 2021-04-09 | 2021-04-09 | Intrinsic image decomposition method and system based on cross convolution neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113034353B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113657521B (en) * | 2021-08-23 | 2023-09-19 | 天津大学 | Method for separating two mutually exclusive components in image |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108416805A (en) * | 2018-03-12 | 2018-08-17 | 中山大学 | A kind of intrinsic image decomposition method and device based on deep learning |
CN110232661A (en) * | 2019-05-03 | 2019-09-13 | 天津大学 | Low illumination colour-image reinforcing method based on Retinex and convolutional neural networks |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10586310B2 (en) * | 2017-04-06 | 2020-03-10 | Pixar | Denoising Monte Carlo renderings using generative adversarial neural networks |
US10706508B2 (en) * | 2018-03-29 | 2020-07-07 | Disney Enterprises, Inc. | Adaptive sampling in Monte Carlo renderings using error-predicting neural networks |
CN108764250B (en) * | 2018-05-02 | 2021-09-17 | 西北工业大学 | Method for extracting essential image by using convolutional neural network |
WO2020068158A1 (en) * | 2018-09-24 | 2020-04-02 | Google Llc | Photo relighting using deep neural networks and confidence learning |
CN110675336A (en) * | 2019-08-29 | 2020-01-10 | 苏州千视通视觉科技股份有限公司 | Low-illumination image enhancement method and device |
CN111242868B (en) * | 2020-01-16 | 2023-05-02 | 重庆邮电大学 | Image enhancement method based on convolutional neural network in scotopic vision environment |
CN111563577B (en) * | 2020-04-21 | 2022-03-11 | 西北工业大学 | Unet-based intrinsic image decomposition method for skip layer frequency division and multi-scale identification |
CN111681223B (en) * | 2020-06-09 | 2023-04-18 | 安徽理工大学 | Method for detecting mine well wall under low illumination condition based on convolutional neural network |
GB2598711B (en) * | 2020-08-11 | 2023-10-18 | Toshiba Kk | A Computer Vision Method and System |
CN112131975B (en) * | 2020-09-08 | 2022-11-15 | 东南大学 | Face illumination processing method based on Retinex decomposition and generation of confrontation network |
- 2021-04-09: Application CN202110385353.8A filed in China; granted as patent CN113034353B (en), status Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108416805A (en) * | 2018-03-12 | 2018-08-17 | 中山大学 | A kind of intrinsic image decomposition method and device based on deep learning |
CN110232661A (en) * | 2019-05-03 | 2019-09-13 | 天津大学 | Low illumination colour-image reinforcing method based on Retinex and convolutional neural networks |
Also Published As
Publication number | Publication date |
---|---|
CN113034353A (en) | 2021-06-25 |
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant