CN113034353B - Intrinsic image decomposition method and system based on cross convolution neural network - Google Patents
Intrinsic image decomposition method and system based on cross convolution neural network
- Publication number
- CN113034353B (application CN202110385353.8A; also published as CN113034353A)
- Authority
- CN
- China
- Prior art keywords
- network
- layer
- neural network
- convolutional neural
- image
- Prior art date
- Legal status: Active
Classifications
- G06T3/04: Geometric image transformations in the plane of the image; context-preserving transformations, e.g. by using an importance map
- G06T5/70: Image enhancement or restoration; denoising, smoothing
- G06T5/73: Image enhancement or restoration; deblurring, sharpening
- G06T2207/20081: Indexing scheme for image analysis or image enhancement; training, learning
- G06T2207/20084: Indexing scheme for image analysis or image enhancement; artificial neural networks [ANN]
- G06T2207/20192: Indexing scheme for image analysis or image enhancement; edge enhancement, edge preservation
Abstract
The invention discloses an intrinsic image decomposition method and system based on a cross convolutional neural network. The method comprises: inputting an original image to be decomposed into a trained GoogLeNet-VGG19 cross convolutional neural network model to obtain the illumination map and the reflection map into which the original image is decomposed. The GoogLeNet-VGG19 cross convolutional neural network model is formed by cross fusion of an illumination map generation network and a reflection map generation network; the illumination map generation network is constructed on the basis of the GoogLeNet convolutional neural network, and the reflection map generation network on the basis of the VGG19 convolutional neural network; both generation networks are trained with the Adam optimization method. With the invention, the reflectance in the intrinsic decomposition result remains consistent across the same object, edge information is better preserved and noise better removed, so the image quality is higher, and the result is closer to the ground-truth image in both detail and sharpness.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an intrinsic image decomposition method and system based on a cross convolution neural network.
Background
Intrinsic image decomposition was first proposed by Barrow and Tenenbaum in 1978. The intrinsic image problem is to recover, for every pixel of an image, the shading and reflectance information of the corresponding scene point, forming an illumination map and a reflection map respectively. Intrinsic image decomposition methods fall into two main categories: those based on Retinex theory and those based on deep learning. The classical Retinex method assumes that the larger gradients in an image are caused by changes in object reflectance, while the smaller gradients belong to illumination variation. Because it works entirely on gradients, the Retinex method can only establish local constraints.
Another commonly used constraint is that natural images contain only a small number of colors, distributed in a structured form (so-called global color sparsity), i.e., the reflectance layer of an image is required to contain only a few colors. Because gradient-based methods establish only local constraints, the recovered reflectance layer may be globally inconsistent: two distant pixels of the same material can be assigned different reflectances. Moreover, requiring several images of the same scene places strict demands on the input of an intrinsic image method. Once the gradients of the reflectance and shading images have been estimated, Weiss's approach integrates the gradient fields to solve for the reflection map and the illumination map. However, such methods need a large number of samples to train a classifier and are time-consuming; the resulting intrinsic images show large errors at edges and are blurred there, and the classifier may overfit the training samples.
Deep-learning-based intrinsic image decomposition improves on these problems to some extent, but still has shortcomings. Narihira et al., owing to flaws in their network design, downsample the image to too small a scale, so that a large amount of information is lost on upsampling and the output is blurred; Fan et al. integrate a filter into the network to flatten the reflectance layer, removing residual noise and geometry information, but neglect the protection of image detail, which produces jagged edges.
Disclosure of Invention
The invention aims to provide an intrinsic image decomposition method and system based on a cross convolutional neural network, so as to solve one or more of the above technical problems. With the invention, the reflectance in the intrinsic decomposition result remains consistent across the same object, edge information is better preserved and noise better removed, so the image quality is higher, and the result is closer to the ground-truth image in both detail and sharpness.
In order to achieve the above purpose, the invention adopts the following technical scheme:
The invention discloses an intrinsic image decomposition method based on a cross convolutional neural network, comprising the following steps:
inputting an original image to be decomposed into a trained GoogLeNet-VGG19 cross convolutional neural network model to obtain the illumination map and the reflection map into which the original image is decomposed;
wherein the GoogLeNet-VGG19 cross convolutional neural network model is formed by cross fusion of an illumination map generation network and a reflection map generation network; the illumination map generation network is constructed on the basis of the GoogLeNet convolutional neural network, and the reflection map generation network on the basis of the VGG19 convolutional neural network;
obtaining the trained GoogLeNet-VGG19 cross convolutional neural network model comprises training the illumination map generation network and the reflection map generation network with the Adam optimization method.
In a further improvement of the invention, the construction of the illumination map generation network on the basis of the GoogLeNet convolutional neural network specifically comprises:
adding one ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception module a, the 4 ReLU outputs being fed jointly to the DepthConcat layer of inception a;
adding one ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception module b, the 4 ReLU outputs being fed jointly to the DepthConcat layer of inception b;
in GoogLeNet inception module a, connecting the 2 first-layer convolution operations with the 2 second-layer convolution operations to form 2 connection channels, and adding on each of the 2 channels one combination of a ReLU activation function followed by a MaxPool operation;
in GoogLeNet inception module b, connecting the 2 first-layer convolution operations with the 2 second-layer convolution operations to form 2 connection channels, and adding on each of the 2 channels one combination of a ReLU activation function followed by a MaxPool operation;
skip-connecting the DepthConcat layer output of inception module b to the DepthConcat layer of inception module d; connecting the convolution output that follows the first-layer AveragePool operation of inception module e directly to the DepthConcat layer of inception e; skip-connecting the DepthConcat layer output of inception module e to the DepthConcat layer of inception module b;
adding one ReLU activation function after each of the 4 convolution operations in the third layer of inception module a, the 4 ReLU outputs being fed jointly to the DepthConcat layer of inception a;
adding one ReLU activation function after each of the 4 convolution operations in the second layer of inception module b, the 4 ReLU outputs being fed jointly to the DepthConcat layer of inception b;
adding an FC layer after the FC layer of the GoogLeNet convolutional neural network.
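As an illustration of the branch-level ReLU modification above, the following PyTorch sketch appends a ReLU to each of the four parallel convolutions of an Inception-style module before the DepthConcat concatenation. The channel sizes are assumptions for readability, not the patent's exact configuration.

```python
import torch
import torch.nn as nn

class InceptionWithBranchReLU(nn.Module):
    """One Inception-style module with a ReLU appended to every branch
    convolution; the four activated outputs are concatenated on the
    channel axis (the DepthConcat layer). Channel sizes are assumed."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Sequential(nn.Conv2d(in_ch, 64, 1), nn.ReLU())
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, 96, 1), nn.ReLU(),
                                nn.Conv2d(96, 128, 3, padding=1), nn.ReLU())
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 16, 1), nn.ReLU(),
                                nn.Conv2d(16, 32, 5, padding=2), nn.ReLU())
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 32, 1), nn.ReLU())

    def forward(self, x):
        # DepthConcat: channel-wise concatenation of the four ReLU outputs
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)
```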
In a further improvement of the invention, the construction of the reflection map generation network on the basis of the VGG19 convolutional neural network specifically comprises:
performing a Concat operation on the outputs of the first and second MaxPool layers of the VGG19 convolutional neural network and feeding the result to the fifth layer of the network;
performing a Concat operation on the outputs of the third and fourth MaxPool layers of the VGG19 convolutional neural network and feeding the result to the tenth layer of the network;
deleting the seventeenth and eighteenth layers of the VGG19 convolutional neural network, and appending after the sixteenth layer two layers of structure identical to the sixteenth layer, which form the seventeenth and eighteenth layers of the modified VGG19 convolutional neural network.
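A minimal sketch of the Concat step above, assuming PyTorch tensors: in VGG19 the two MaxPool outputs differ in spatial size, so one is resized before channel-wise concatenation (how the sizes are matched is an assumption; the patent does not state it).

```python
import torch
import torch.nn.functional as F

def concat_pool_outputs(pool_a, pool_b):
    """Concatenate two MaxPool feature maps on the channel axis; the
    result would then be fed to the fifth (or tenth) VGG19 layer."""
    if pool_a.shape[-2:] != pool_b.shape[-2:]:
        # resize the earlier (larger) map to match the later one (assumption)
        pool_a = F.interpolate(pool_a, size=pool_b.shape[-2:],
                               mode='bilinear', align_corners=False)
    return torch.cat([pool_a, pool_b], dim=1)
```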
In a further improvement of the invention, the cross fusion of the illumination map generation network and the reflection map generation network specifically comprises:
connecting the DepthConcat layer output of GoogLeNet inception module e to the tenth layer of the VGG19 convolutional neural network;
connecting the fourth MaxPool output of the VGG19 convolutional neural network to a convolution operation in the second layer of GoogLeNet inception module a.
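For illustration, one cross-fusion connection could be realised as below: a 1x1 convolution adapts a feature map taken from one generator so that it can be concatenated into the other generator's stream. The adapter convolution and the resizing are assumptions about how the two streams are matched; the patent only states which layers are connected.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLink(nn.Module):
    """Adapt a source feature map (e.g. a DepthConcat output) and fuse it
    into a destination stream (e.g. a VGG19 layer input)."""
    def __init__(self, src_ch, dst_ch):
        super().__init__()
        self.adapt = nn.Conv2d(src_ch, dst_ch, kernel_size=1)

    def forward(self, src_feat, dst_feat):
        src = self.adapt(src_feat)
        if src.shape[-2:] != dst_feat.shape[-2:]:
            src = F.interpolate(src, size=dst_feat.shape[-2:],
                                mode='bilinear', align_corners=False)
        return torch.cat([dst_feat, src], dim=1)
```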
In a further improvement of the invention, the expression of the Loss function Loss1 of the illumination map generation network is:

$$\mathrm{Loss1}=\sum_{i}\mu_{i}\cdot\frac{1}{HWC}\sum_{x,y,c}\left|X^{(i)}(x,y,c)-\hat{X}^{(i)}(x,y,c)\right|$$

where $X$ is the input image, $\hat{X}$ is the predicted image, $H$, $W$ and $C$ are the height, width and number of channels of the input image, $x, y$ are the pixel coordinates of the image, $c$ indexes the channel, $\mu_i$ is the weight at the $i$-th scale, $X^{(i)}$ is the image at the $i$-th scale, and $\hat{X}^{(i)}$ is the predicted image at the $i$-th scale generated by the improved GoogLeNet convolutional neural network.
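A sketch of Loss1 as reconstructed above, in PyTorch; the number of scales and the weights mu_i are assumptions.

```python
import torch
import torch.nn.functional as F

def loss1(pred, target, mu=(1.0, 0.5, 0.25)):
    """Multi-scale mean absolute error: scale i is built by average-pooling
    scale i-1, and torch.mean performs the 1/(H*W*C) normalisation."""
    total = pred.new_zeros(())
    for i, mu_i in enumerate(mu):
        if i > 0:
            pred = F.avg_pool2d(pred, kernel_size=2)
            target = F.avg_pool2d(target, kernel_size=2)
        total = total + mu_i * torch.mean(torch.abs(pred - target))
    return total
```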
In a further improvement of the invention, the expression of the Loss function Loss2 of the reflection map generation network is:

$$\mathrm{Loss2}=\sum_{j}\frac{1}{C_{j}H_{j}W_{j}}\left\|V_{j}(Y)-V_{j}(\hat{Y})\right\|_{2}^{2}$$

where $Y$ is the input image, $\hat{Y}$ is the estimate of the input image after processing by the improved VGG19 network, $C_j$, $H_j$ and $W_j$ are respectively the number of channels, height and width of the feature map output by the $j$-th layer, $V_j(\cdot)$ is the output of the activation function when the $j$-th layer processes the image, and $j$ is the layer index.
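A sketch of Loss2 as reconstructed above, using torchvision's stock VGG19 as a stand-in for the patent's modified VGG19 (an assumption, as are the layer indices); mse_loss with mean reduction provides the per-layer 1/(C_j H_j W_j) normalisation.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19, VGG19_Weights

class PerceptualLoss2(nn.Module):
    """Feature-space MSE between activations V_j at selected layers j."""
    def __init__(self, layer_ids=(3, 8, 17, 26)):
        super().__init__()
        self.features = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features.eval()
        for p in self.features.parameters():
            p.requires_grad = False
        self.layer_ids = set(layer_ids)

    def forward(self, y, y_hat):
        loss = y.new_zeros(())
        a, b = y, y_hat
        for j, layer in enumerate(self.features):
            a, b = layer(a), layer(b)
            if j in self.layer_ids:
                loss = loss + nn.functional.mse_loss(a, b)
        return loss
```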
In a further improvement of the invention, the training of the illumination map generation network and the reflection map generation network with the Adam optimization method specifically comprises the following steps:
taking the images in a pre-constructed training image sample library as samples and training the illumination map generation network and the reflection map generation network simultaneously with the Adam optimization method;
during training, feeding the illumination map output by the illumination map generation network into an identification network (a discriminator) that estimates the probability that this output is consistent with the training-sample label image, and updating the parameters of the illumination map generation network by back-propagation; feeding the reflection map output by the reflection map generation network into an identification network that estimates the probability that this output is consistent with the training-sample label image, and updating the parameters of the reflection map generation network by back-propagation;
stopping training of the illumination map generation network when the Loss function Loss1 reaches its minimum, giving the final illumination map generation network; stopping training of the reflection map generation network when the Loss function Loss2 reaches its minimum, giving the final reflection map generation network;
the identification network is a multi-layer convolutional neural network comprising six identical layers; each layer consists, in order, of a convolution operation, a Sigmoid activation function and a MaxPool operation. A minimal sketch of such a network is given below.
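The sketch follows the stated six-layer structure of convolution, Sigmoid and MaxPool; the channel width and the final probability head are assumptions.

```python
import torch.nn as nn

def build_identification_network(in_ch=3, width=64):
    """Six identical stages of Conv -> Sigmoid -> MaxPool, ending in a
    scalar probability that the input matches the label image."""
    layers, ch = [], in_ch
    for _ in range(6):
        layers += [nn.Conv2d(ch, width, kernel_size=3, padding=1),
                   nn.Sigmoid(),
                   nn.MaxPool2d(kernel_size=2)]
        ch = width
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(),
               nn.Linear(width, 1), nn.Sigmoid()]  # assumed probability head
    return nn.Sequential(*layers)
```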
A further improvement of the invention is that the Adam optimization parameters beta are set to (0.9, 0.999), the learning rate to 0.005, the weight decay to 0.0001, epoch = 100 and batch size = 20.
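The stated hyperparameters translate directly into an optimizer configuration, sketched below; `generator` stands for either generation network (an assumed name).

```python
import torch

optimizer = torch.optim.Adam(generator.parameters(),
                             lr=0.005,             # learning rate
                             betas=(0.9, 0.999),   # Adam parameter beta
                             weight_decay=0.0001)  # weight decay
EPOCHS, BATCH_SIZE = 100, 20
```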
The invention also discloses an intrinsic image decomposition system based on a cross convolutional neural network, comprising:
a decomposition module for inputting an original image to be decomposed into a trained GoogLeNet-VGG19 cross convolutional neural network model to obtain the illumination map and the reflection map into which the original image is decomposed;
wherein the GoogLeNet-VGG19 cross convolutional neural network model is formed by cross fusion of an illumination map generation network and a reflection map generation network; the illumination map generation network is constructed on the basis of the GoogLeNet convolutional neural network, and the reflection map generation network on the basis of the VGG19 convolutional neural network;
obtaining the trained GoogLeNet-VGG19 cross convolutional neural network model comprises training the illumination map generation network and the reflection map generation network with the Adam optimization method.
In a further improvement of the present invention,
the construction of the illumination map generation network on the basis of the GoogLeNet convolutional neural network specifically comprises:
adding one ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception module a, the 4 ReLU outputs being fed jointly to the DepthConcat layer of inception a;
adding one ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception module b, the 4 ReLU outputs being fed jointly to the DepthConcat layer of inception b;
in GoogLeNet inception module a, connecting the 2 first-layer convolution operations with the 2 second-layer convolution operations to form 2 connection channels, and adding on each of the 2 channels one combination of a ReLU activation function followed by a MaxPool operation;
in GoogLeNet inception module b, connecting the 2 first-layer convolution operations with the 2 second-layer convolution operations to form 2 connection channels, and adding on each of the 2 channels one combination of a ReLU activation function followed by a MaxPool operation;
skip-connecting the DepthConcat layer output of inception module b to the DepthConcat layer of inception module d; connecting the convolution output that follows the first-layer AveragePool operation of inception module e directly to the DepthConcat layer of inception e; skip-connecting the DepthConcat layer output of inception module e to the DepthConcat layer of inception module b;
adding one ReLU activation function after each of the 4 convolution operations in the third layer of inception module a, the 4 ReLU outputs being fed jointly to the DepthConcat layer of inception a;
adding one ReLU activation function after each of the 4 convolution operations in the second layer of inception module b, the 4 ReLU outputs being fed jointly to the DepthConcat layer of inception b;
adding an FC layer after the FC layer of the GoogLeNet convolutional neural network;
the construction of the reflection map generation network on the basis of the VGG19 convolutional neural network specifically comprises:
performing a Concat operation on the outputs of the first and second MaxPool layers of the VGG19 convolutional neural network and feeding the result to the fifth layer of the network;
performing a Concat operation on the outputs of the third and fourth MaxPool layers of the VGG19 convolutional neural network and feeding the result to the tenth layer of the network;
deleting the seventeenth and eighteenth layers of the VGG19 convolutional neural network, and appending after the sixteenth layer two layers of structure identical to the sixteenth layer, which form the seventeenth and eighteenth layers of the modified VGG19 convolutional neural network;
the cross fusion of the illumination map generation network and the reflection map generation network specifically comprises:
connecting the DepthConcat layer output of GoogLeNet inception module e to the tenth layer of the VGG19 convolutional neural network;
connecting the fourth MaxPool output of the VGG19 convolutional neural network to a convolution operation in the second layer of GoogLeNet inception module a.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides an intrinsic image decomposition method based on an improved GoogLeNet-VGG19 cross convolutional neural network: a training image sample library is first constructed; an illumination map generation network is then built by improving the conventional GoogLeNet convolutional neural network, a reflection map generation network is built by improving the conventional VGG19 convolutional neural network, and the two generation networks are cross fused; next, an identification network is constructed; finally, the illumination map generation network and the reflection map generation network are trained with the Adam optimization method to obtain the final generation networks. In the resulting intrinsic decompositions, the reflectance of the same object remains consistent, edge information is better preserved and noise better removed, the image quality is higher, and the results are closer to the ground-truth image in both detail and sharpness.
The system of the invention is used for intrinsic image decomposition. Whereas images decomposed by existing methods contain considerable noise and blurred edges, the output of the invention keeps the reflectance consistent on the same object, protects edge information and removes noise better, and attains higher image quality; the generated results are closer to the ground-truth image in both detail and sharpness.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below; it will be apparent to those of ordinary skill in the art that the following drawings show only some embodiments of the invention, and that other drawings may be derived from them without inventive effort.
FIG. 1 is a flow chart of the intrinsic image decomposition method based on the improved GoogLeNet-VGG19 cross convolutional neural network according to an embodiment of the invention;
FIG. 2 shows the results of an intrinsic image decomposition in an embodiment of the present invention; fig. 2 (a) is the original image, fig. 2 (b) the illumination map obtained by decomposition, and fig. 2 (c) the reflection map obtained by decomposition.
Detailed Description
In order to make the purposes, technical effects and technical solutions of the embodiments of the present invention clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings; it will be apparent that the described embodiments are only some of the embodiments of the present invention. Other embodiments obtained by those of ordinary skill in the art from the disclosed embodiments without inventive effort fall within the scope of the present invention.
Referring to fig. 1, an intrinsic image decomposition method based on a modified GoogLeNet-VGG19 cross convolution neural network according to an embodiment of the invention includes the following steps:
Step 1: construct a training image sample library.
Using a public intrinsic image database, take P images together with their corresponding illumination maps and reflection maps; randomly crop the P images into a number of image blocks of a specified size; then expand the database by randomly applying horizontal flipping, vertical flipping, rotation and mirroring to the image blocks; the processed image blocks, together with their corresponding illumination maps and reflection maps, form the training image sample library.
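A sketch of this augmentation, assuming PIL image blocks and torchvision; mirroring is modelled as a flip and the rotation angle is assumed.

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomCrop(224),              # crop to the specified size
    transforms.RandomHorizontalFlip(p=0.5),  # horizontal flip
    transforms.RandomVerticalFlip(p=0.5),    # vertical flip / mirror
    transforms.RandomApply(
        [transforms.RandomRotation(degrees=(90, 90))], p=0.5),  # rotation
])
```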
Step 2: construct the illumination map generation network from the improved GoogLeNet convolutional neural network, specifically as follows:
Step 2-1: add one ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception module a (4 ReLU activation functions in total), their outputs being fed jointly to the DepthConcat layer of inception a;
Step 2-2: add one ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception module b (4 in total), their outputs being fed jointly to the DepthConcat layer of inception b;
Step 2-3: in GoogLeNet inception module a, connect the 2 first-layer convolution operations with the 2 second-layer convolution operations to form 2 connection channels; on each channel add one combination of a ReLU activation function followed by a MaxPool operation (2 combinations in total);
Step 2-4: in GoogLeNet inception module b, connect the 2 first-layer convolution operations with the 2 second-layer convolution operations to form 2 connection channels; on each channel add one combination of a ReLU activation function followed by a MaxPool operation (2 combinations in total);
Step 2-5: skip-connect the DepthConcat layer output of inception module b to the DepthConcat layer of inception module d;
Step 2-6: connect the convolution output following the first-layer AveragePool operation of inception module e directly to the DepthConcat layer of inception e;
Step 2-7: skip-connect the DepthConcat layer output of inception module e to the DepthConcat layer of inception module b;
Step 2-8: add one ReLU activation function after each of the 4 convolution operations in the third layer of inception module a (4 in total), their outputs being fed jointly to the DepthConcat layer of inception a;
Step 2-9: add one ReLU activation function after each of the 4 convolution operations in the second layer of inception module b (4 in total), their outputs being fed jointly to the DepthConcat layer of inception b;
Step 2-10: add a new FC layer behind the FC layer of the GoogLeNet convolutional neural network;
Step 2-11: the operations of steps 2-1 to 2-10 yield the improved GoogLeNet convolutional neural network.
Step 3: construct the reflection map generation network from the improved VGG19 convolutional neural network, specifically as follows:
Step 3-1: perform a Concat operation on the outputs of the first and second MaxPool layers of the VGG19 convolutional neural network and feed the result to the fifth layer of the network;
Step 3-2: perform a Concat operation on the outputs of the third and fourth MaxPool layers of the VGG19 convolutional neural network and feed the result to the tenth layer of the network;
Step 3-3: delete the seventeenth and eighteenth layers of the VGG19 convolutional neural network;
Step 3-4: append two identical layers after the sixteenth layer to form the new seventeenth and eighteenth layers, whose structure is identical to that of the sixteenth layer;
Step 3-5: the operations of steps 3-1 to 3-4 yield the improved VGG19 convolutional neural network.
Step 4: cross-fuse the illumination map generation network and the reflection map generation network:
Step 4-1: connect the DepthConcat layer output of GoogLeNet inception module e to the tenth layer of the VGG19 convolutional neural network;
Step 4-2: connect the fourth MaxPool output of the VGG19 convolutional neural network to a convolution operation in the second layer of GoogLeNet inception module a.
Step 5: construct the identification network.
The identification network (the discriminator) is a multi-layer convolutional neural network comprising six identical layers; each layer consists, in order, of a convolution operation, a Sigmoid activation function and a MaxPool operation.
Step 6: define the loss functions.
Step 6-1: define the Loss function Loss1 of the illumination map generation network:

$$\mathrm{Loss1}=\sum_{i}\mu_{i}\cdot\frac{1}{HWC}\sum_{x,y,c}\left|X^{(i)}(x,y,c)-\hat{X}^{(i)}(x,y,c)\right|$$

where $X$ is the input image, $\hat{X}$ is the predicted image, $H$, $W$ and $C$ are the height, width and number of channels of the input image, $x, y$ are the pixel coordinates of the image, $c$ indexes the channel, $\mu_i$ is the weight at the $i$-th scale, $X^{(i)}$ is the image at the $i$-th scale, and $\hat{X}^{(i)}$ is the predicted image at the $i$-th scale generated by the improved GoogLeNet convolutional neural network;
Step 6-2: define the Loss function Loss2 of the reflection map generation network:

$$\mathrm{Loss2}=\sum_{j}\frac{1}{C_{j}H_{j}W_{j}}\left\|V_{j}(Y)-V_{j}(\hat{Y})\right\|_{2}^{2}$$

where $Y$ is the input image, $\hat{Y}$ is the estimate of the input image after processing by the improved VGG19 network, $C_j$, $H_j$ and $W_j$ are respectively the number of channels, height and width of the feature map output by the $j$-th layer, $V_j(\cdot)$ is the output of the activation function when the $j$-th layer processes the image, and $j$ is the layer index;
Step 7: train the networks.
Taking the images in the training image sample library constructed in step 1 as samples, train the illumination map generation network and the reflection map generation network simultaneously with the Adam optimization method.
During training, the illumination map output by the illumination map generation network is fed into the identification network, which estimates the probability that this output is consistent with the training-sample label image, and the parameters of the illumination map generation network are updated by back-propagation; the reflection map output by the reflection map generation network is likewise fed into the identification network, which estimates the probability that this output is consistent with the training-sample label image, and the parameters of the reflection map generation network are updated by back-propagation.
When the Loss function Loss1 reaches its minimum, training of the illumination map generation network stops, giving the final illumination map generation network; when the Loss function Loss2 reaches its minimum, training of the reflection map generation network stops, giving the final reflection map generation network.
Step 8: input the original image to be decomposed into the final illumination map generation network and reflection map generation network obtained in step 7 respectively; the output images are the illumination map and the reflection map into which the original image is decomposed, as sketched below.
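A sketch of this decomposition step; `illum_net` and `refl_net` denote the final generation networks from step 7 and `x` the original image as a (1, 3, H, W) tensor (assumed names).

```python
import torch

illum_net.eval()
refl_net.eval()
with torch.no_grad():
    illumination_map = illum_net(x)  # decomposed illumination map
    reflection_map = refl_net(x)     # decomposed reflection map
```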
In the embodiment of the present invention, the specified image-block size in step 1 is 224×224.
In the embodiment of the present invention, the parameters set when training the networks in step 7 are as follows: the Adam optimization parameters beta are set to (0.9, 0.999), the learning rate to 0.005, the weight decay to 0.0001, epoch = 100 and batch size = 20.
Whereas images decomposed by existing methods contain considerable noise and blurred edges, the image output by the method of the embodiment of the invention keeps the reflectance consistent, protects edge information and removes noise better, and attains higher image quality; the generated results are closer to the ground-truth image in both detail and sharpness.
The embodiment of the invention also discloses an intrinsic image decomposition system based on a cross convolutional neural network, comprising:
a decomposition module for inputting an original image to be decomposed into a trained GoogLeNet-VGG19 cross convolutional neural network model to obtain the illumination map and the reflection map into which the original image is decomposed;
wherein the GoogLeNet-VGG19 cross convolutional neural network model is formed by cross fusion of an illumination map generation network and a reflection map generation network; the illumination map generation network is constructed on the basis of the GoogLeNet convolutional neural network, and the reflection map generation network on the basis of the VGG19 convolutional neural network;
obtaining the trained GoogLeNet-VGG19 cross convolutional neural network model comprises training the illumination map generation network and the reflection map generation network with the Adam optimization method.
Referring to fig. 1 and 2, an intrinsic image decomposition method based on an improved GoogLeNet-VGG19 cross convolutional neural network according to an embodiment of the invention includes the following steps:
(1) Building a training image sample library
Using the MPCal intrinsic image dataset, 1000 images are taken and 50 image blocks of 224×224 are randomly cropped from each image; the image blocks are then randomly flipped horizontally, flipped vertically, rotated and mirrored, which expands the 50 blocks per image to 200. The total number of image blocks is thus 200,000. In the illumination maps and reflection maps corresponding to the 1000 images, the illumination blocks and reflection blocks corresponding to the 200,000 image blocks are located. These image blocks and their corresponding illumination and reflection blocks form the training image sample library.
(2) The illumination map generation network and the reflection map generation network constructed as above are trained simultaneously on the training image sample library with the Adam optimization method; the Adam optimization parameters beta are set to (0.9, 0.999), the learning rate to 0.005, the weight decay to 0.0001, epoch = 100 and batch size = 20. Training stops when the loss functions of the two generation networks reach their minimum, giving the final illumination map generation network and reflection map generation network. During training, the illumination map output by the illumination map generation network is fed into the identification network, which estimates the probability that this output is consistent with the training-sample label image, and the parameters of the illumination map generation network are updated by back-propagation; the reflection map output by the reflection map generation network is likewise fed into the identification network, which estimates the probability that this output is consistent with the training-sample label image, and the parameters of the reflection map generation network are updated by back-propagation. The generation networks and the identification network are trained with the TTUR method, the identification network being trained 3 times for every 1 training step of the generation networks, as sketched below.
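A sketch of the 3-to-1 TTUR schedule; `d_step` and `g_step` are assumed helpers that compute the respective losses and apply one optimizer step each.

```python
for epoch in range(100):
    for step, (img, illum_gt, refl_gt) in enumerate(loader):
        d_step(img, illum_gt, refl_gt)      # update the identification network
        if step % 3 == 2:                   # every third batch: 3-to-1 ratio
            g_step(img, illum_gt, refl_gt)  # update the generation networks
```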
(3) As shown in fig. 2, the original image to be processed (fig. 2 (a)) is input into the final illumination map generation network and reflection map generation network respectively, and the output images are the illumination map and the reflection map into which the original image is decomposed (fig. 2 (b) and (c)). The decomposition results contain little noise, the image edges are clear, and the overall sharpness and quality of the images reach a high level, which fully illustrates the effectiveness and practicality of the method.
In summary, the embodiment of the invention provides an intrinsic image decomposition method based on an improved GoogLeNet-VGG19 cross convolutional neural network: a training image sample library is first constructed; an illumination map generation network is then built by improving the conventional GoogLeNet convolutional neural network, a reflection map generation network is built by improving the conventional VGG19 convolutional neural network, and the two generation networks are cross fused; next, an identification network is constructed; finally, the illumination map generation network and the reflection map generation network are trained with the Adam optimization method to obtain the final generation networks. In the resulting intrinsic decompositions, the reflectance of the same object remains consistent, edge information is better preserved and noise better removed, the image quality is higher, and the results are closer to the ground-truth image in both detail and sharpness.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, one skilled in the art may make modifications and equivalents to the specific embodiments of the present invention, and any modifications and equivalents not departing from the spirit and scope of the present invention are within the scope of the claims of the present invention.
Claims (3)
1. An intrinsic image decomposition method based on a cross convolutional neural network, characterized by comprising the following steps:
inputting an original image to be decomposed into a trained GoogLeNet-VGG19 cross convolutional neural network model to obtain the illumination map and the reflection map into which the original image is decomposed;
wherein the GoogLeNet-VGG19 cross convolutional neural network model is formed by cross fusion of an illumination map generation network and a reflection map generation network; the illumination map generation network is constructed on the basis of the GoogLeNet convolutional neural network, and the reflection map generation network on the basis of the VGG19 convolutional neural network;
obtaining the trained GoogLeNet-VGG19 cross convolutional neural network model comprises training the illumination map generation network and the reflection map generation network with the Adam optimization method;
wherein,
the construction of the illumination map generation network on the basis of the GoogLeNet convolutional neural network specifically comprises:
adding one ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception module a, the 4 ReLU outputs being fed jointly to the DepthConcat layer of inception a;
adding one ReLU activation function after each of the 4 convolution operations in the second layer of GoogLeNet inception module b, the 4 ReLU outputs being fed jointly to the DepthConcat layer of inception b;
in GoogLeNet inception module a, connecting the 2 first-layer convolution operations with the 2 second-layer convolution operations to form 2 connection channels, and adding on each of the 2 channels one combination of a ReLU activation function followed by a MaxPool operation;
in GoogLeNet inception module b, connecting the 2 first-layer convolution operations with the 2 second-layer convolution operations to form 2 connection channels, and adding on each of the 2 channels one combination of a ReLU activation function followed by a MaxPool operation;
skip-connecting the DepthConcat layer output of inception module b to the DepthConcat layer of inception module d; connecting the convolution output that follows the first-layer AveragePool operation of inception module e directly to the DepthConcat layer of inception e; skip-connecting the DepthConcat layer output of inception module e to the DepthConcat layer of inception module b;
adding one ReLU activation function after each of the 4 convolution operations in the third layer of inception module a, the 4 ReLU outputs being fed jointly to the DepthConcat layer of inception a;
adding one ReLU activation function after each of the 4 convolution operations in the second layer of inception module b, the 4 ReLU outputs being fed jointly to the DepthConcat layer of inception b;
adding an FC layer after the FC layer of the GoogLeNet convolutional neural network;
the construction of the reflection map generation network on the basis of the VGG19 convolutional neural network specifically comprises:
performing a Concat operation on the outputs of the first and second MaxPool layers of the VGG19 convolutional neural network and feeding the result to the fifth layer of the network;
performing a Concat operation on the outputs of the third and fourth MaxPool layers of the VGG19 convolutional neural network and feeding the result to the tenth layer of the network;
deleting the seventeenth and eighteenth layers of the VGG19 convolutional neural network, and appending after the sixteenth layer two layers of structure identical to the sixteenth layer, which form the seventeenth and eighteenth layers of the modified VGG19 convolutional neural network;
the cross fusion of the illumination map generation network and the reflection map generation network specifically comprises:
connecting the DepthConcat layer output of GoogLeNet inception module e to the tenth layer of the VGG19 convolutional neural network;
connecting the fourth MaxPool output of the VGG19 convolutional neural network to a convolution operation in the second layer of GoogLeNet inception module a;
the expression of the Loss function Loss1 of the illumination map generation network is:

$$\mathrm{Loss1}=\sum_{i}\mu_{i}\cdot\frac{1}{HWC}\sum_{x,y,c}\left|X^{(i)}(x,y,c)-\hat{X}^{(i)}(x,y,c)\right|$$

where $X$ is the input image, $\hat{X}$ is the predicted image, $H$, $W$ and $C$ are the height, width and number of channels of the input image, $x, y$ are the pixel coordinates of the image, $c$ indexes the channel, $\mu_i$ is the weight at the $i$-th scale, $X^{(i)}$ is the image at the $i$-th scale, and $\hat{X}^{(i)}$ is the predicted image at the $i$-th scale generated by the improved GoogLeNet convolutional neural network;
the expression of the Loss function Loss2 of the reflection map generation network is:

$$\mathrm{Loss2}=\sum_{j}\frac{1}{C_{j}H_{j}W_{j}}\left\|V_{j}(Y)-V_{j}(\hat{Y})\right\|_{2}^{2}$$

where $Y$ is the input image, $\hat{Y}$ is the estimate of the input image after processing by the improved VGG19 network, $C_j$, $H_j$ and $W_j$ are respectively the number of channels, height and width of the feature map output by the $j$-th layer, $V_j(\cdot)$ is the output of the activation function when the $j$-th layer processes the image, and $j$ is the layer index;
the training of the illumination map generation network and the reflection map generation network with the Adam optimization method specifically comprises:
taking the images in a pre-constructed training image sample library as samples and training the illumination map generation network and the reflection map generation network simultaneously with the Adam optimization method;
during training, feeding the illumination map output by the illumination map generation network into an identification network that estimates the probability that this output is consistent with the training-sample label image, and updating the parameters of the illumination map generation network by back-propagation; feeding the reflection map output by the reflection map generation network into an identification network that estimates the probability that this output is consistent with the training-sample label image, and updating the parameters of the reflection map generation network by back-propagation;
stopping training of the illumination map generation network when the Loss function Loss1 reaches its minimum, giving the final illumination map generation network; stopping training of the reflection map generation network when the Loss function Loss2 reaches its minimum, giving the final reflection map generation network;
the identification network being a multi-layer convolutional neural network comprising six identical layers, each layer consisting, in order, of a convolution operation, a Sigmoid activation function and a MaxPool operation.
2. The intrinsic image decomposition method based on a cross convolutional neural network according to claim 1, characterized in that the Adam optimization parameters beta are set to (0.9, 0.999), the learning rate to 0.005, the weight decay to 0.0001, epoch = 100 and batch size = 20.
3. An intrinsic image decomposition system based on a cross convolutional neural network for implementing the intrinsic image decomposition method of claim 1, characterized by comprising:
a decomposition module for inputting an original image to be decomposed into a trained GoogLeNet-VGG19 cross convolutional neural network model to obtain the illumination map and the reflection map into which the original image is decomposed;
wherein the GoogLeNet-VGG19 cross convolutional neural network model is formed by cross fusion of an illumination map generation network and a reflection map generation network; the illumination map generation network is constructed on the basis of the GoogLeNet convolutional neural network, and the reflection map generation network on the basis of the VGG19 convolutional neural network;
obtaining the trained GoogLeNet-VGG19 cross convolutional neural network model comprises training the illumination map generation network and the reflection map generation network with the Adam optimization method.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202110385353.8A CN113034353B (en) | 2021-04-09 | 2021-04-09 | Intrinsic image decomposition method and system based on cross convolution neural network
Publications (2)
Publication Number | Publication Date |
---|---|
CN113034353A CN113034353A (en) | 2021-06-25 |
CN113034353B true CN113034353B (en) | 2024-07-12 |
Family
ID=76456400
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110385353.8A Active CN113034353B (en) | 2021-04-09 | 2021-04-09 | Intrinsic image decomposition method and system based on cross convolution neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113034353B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113657521B (en) * | 2021-08-23 | 2023-09-19 | 天津大学 | Method for separating two mutually exclusive components in image |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108416805A (en) * | 2018-03-12 | 2018-08-17 | 中山大学 | A kind of intrinsic image decomposition method and device based on deep learning |
CN110232661A (en) * | 2019-05-03 | 2019-09-13 | 天津大学 | Low illumination colour-image reinforcing method based on Retinex and convolutional neural networks |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10586310B2 (en) * | 2017-04-06 | 2020-03-10 | Pixar | Denoising Monte Carlo renderings using generative adversarial neural networks |
US10706508B2 (en) * | 2018-03-29 | 2020-07-07 | Disney Enterprises, Inc. | Adaptive sampling in Monte Carlo renderings using error-predicting neural networks |
CN108764250B (en) * | 2018-05-02 | 2021-09-17 | 西北工业大学 | Method for extracting essential image by using convolutional neural network |
WO2020068158A1 (en) * | 2018-09-24 | 2020-04-02 | Google Llc | Photo relighting using deep neural networks and confidence learning |
CN110675336A (en) * | 2019-08-29 | 2020-01-10 | 苏州千视通视觉科技股份有限公司 | Low-illumination image enhancement method and device |
CN111242868B (en) * | 2020-01-16 | 2023-05-02 | 重庆邮电大学 | Image enhancement method based on convolutional neural network in scotopic vision environment |
CN111563577B (en) * | 2020-04-21 | 2022-03-11 | 西北工业大学 | Unet-based intrinsic image decomposition method for skip layer frequency division and multi-scale identification |
CN111681223B (en) * | 2020-06-09 | 2023-04-18 | 安徽理工大学 | Method for detecting mine well wall under low illumination condition based on convolutional neural network |
GB2598711B (en) * | 2020-08-11 | 2023-10-18 | Toshiba Kk | A Computer Vision Method and System |
CN112131975B (en) * | 2020-09-08 | 2022-11-15 | 东南大学 | Face illumination processing method based on Retinex decomposition and generation of confrontation network |
- 2021-04-09: Application CN202110385353.8A filed in China; granted as patent CN113034353B (en), status Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108416805A (en) * | 2018-03-12 | 2018-08-17 | 中山大学 | A kind of intrinsic image decomposition method and device based on deep learning |
CN110232661A (en) * | 2019-05-03 | 2019-09-13 | 天津大学 | Low illumination colour-image reinforcing method based on Retinex and convolutional neural networks |
Also Published As
Publication number | Publication date |
---|---|
CN113034353A (en) | 2021-06-25 |
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant