CN112434702A - Image processing method, image processing device, computer equipment and storage medium
- Publication number: CN112434702A
- Application number: CN201910792402.2A
- Authority: CN (China)
- Legal status: Pending
Classifications
- G06V10/462 - Salient features, e.g. scale invariant feature transforms [SIFT] (image or video recognition or understanding; extraction of image or video features; descriptors for shape, contour or point-related descriptors)
- G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting (pattern recognition; design or setup of recognition systems or techniques)
- G06N3/045 - Combinations of networks (neural networks; architecture, e.g. interconnection topology)
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- Multimedia (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The embodiment of the application discloses an image processing method and device. The method comprises the following steps: respectively inputting the first image and the second image into corresponding coding networks for feature extraction to obtain feature data of different coding levels; merging the feature data of the same coding level to obtain target feature data of different coding levels; sequentially performing superposition and decoding operations on the target feature data of the different coding levels to obtain decoded data; and determining difference data between the first image and the second image according to the decoded data. In this way, the two input images can be better distinguished during encoding, and the features of multiple coding levels are fused during decoding, so that not only the more abstract high-level features but also the low-level features closer to the image, as well as the features of every intermediate level in between, are utilized. The difference can thus be extracted from the two images efficiently, and the accuracy and efficiency of finding differences between images are improved.
Description
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image processing method, an image processing apparatus, a computer device, and a computer-readable storage medium.
Background
With the development of computer technology, computer-based detection has begun to be applied to the task of comparing two-period (bi-temporal) remote sensing images. Newly added building detection is the most common example, and is mostly used in land administration law enforcement.
Compared with newly added building detection, the scope of ground-disturbance (soil movement) change detection is much wider. It covers not only the detection of newly added buildings, but also the detection of building demolition, building reconstruction, conversion of agricultural land and forest land into construction sites, earth moving, newly added roads, coastline changes, and the like. The application scenarios have likewise expanded from land administration law enforcement alone to ecological environment management, disaster assessment, offshore sea-area management, and other scenarios. Therefore, solving the problem of how to independently extract the target ground-disturbance changes from the many changes present in a complex pair of two-period remote sensing images is of great significance.
The applicant has found that most change areas currently need to be extracted manually, which leads to many missed extractions and low efficiency; general computer detection methods, moreover, have not really been applied in actual scenarios because the number of change areas they find and match is not satisfactory.
Disclosure of Invention
In view of the above, the present application is made to provide an image processing method, a computer apparatus, and a computer-readable storage medium that overcome or at least partially solve the above problems.
According to an aspect of the present application, there is provided an image processing method including:
respectively inputting the first image and the second image into corresponding coding networks for feature extraction to obtain feature data of different coding levels;
merging the feature data of the same coding level to obtain target feature data of different coding levels;
sequentially performing superposition operation and decoding operation on the target characteristic data of different coding levels to obtain decoded data;
determining difference data between the first image and the second image according to the decoded data.
Optionally, before the first image and the second image are respectively input to corresponding coding networks for feature extraction, so as to obtain feature data of different coding levels, the method further includes:
receiving two remote sensing images of the same area at different times as the first image and the second image.
Optionally, the respectively inputting the first image and the second image into corresponding coding networks for feature extraction, and obtaining feature data of different coding levels includes:
extracting features of the first image and the second image in the corresponding coding networks to obtain feature data of the 1st coding level;

and performing feature extraction on the feature data of the (N-1)th coding level in the coding network to obtain the feature data of the Nth coding level.

Optionally, the sequentially performing a superposition operation and a decoding operation on the target feature data of the different coding levels to obtain decoded data includes:

performing a decoding operation on the target feature data of the Nth coding level to obtain Nth decoded data;

performing a superposition operation and a decoding operation on the target feature data of the (N-1)th coding level and the Nth decoded data to obtain (N-1)th decoded data;

and iteratively executing the superposition operation and the decoding operation until the superposition operation and the decoding operation of the target feature data of each coding level are sequentially completed.

Optionally, the performing a superposition operation and a decoding operation on the target feature data of the (N-1)th coding level and the Nth decoded data to obtain the (N-1)th decoded data includes:

performing a superposition operation on the target feature data of the (N-1)th coding level and the Nth decoded data to obtain a superposition result;

and sequentially performing decoding operations on the superposition result through a first convolution layer, a first deconvolution layer and a second convolution layer to obtain the (N-1)th decoded data, wherein the number of output channels of the first convolution layer is one fourth of the number of input channels, the numbers of input and output channels of the first deconvolution layer are the same, and the number of output channels of the second convolution layer is the same as the number of channels of the target feature data of the (N-2)th coding level.

Optionally, after the iterative execution of the superposition operation and the decoding operation until the superposition operation and the decoding operation of the target feature data of each coding level are sequentially completed, the sequentially performing the superposition operation and the decoding operation on the target feature data of the different coding levels to obtain decoded data further includes:

and sequentially performing decoding operations on the 1st decoded data through a second deconvolution layer, a third convolution layer and a fourth convolution layer to obtain the decoded data, wherein the number of output channels of the second deconvolution layer is half of the number of input channels, the numbers of input and output channels of the third convolution layer are the same, and the number of output channels of the fourth convolution layer is 1.
Optionally, the merging the feature data of the same coding level to obtain the target feature data of different coding levels includes:
and merging channels of the characteristic data to obtain the target characteristic data.
Optionally, the determining, from the decoded data, difference data between the first image and the second image comprises:
determining difference degree data at different positions between the first image and the second image according to the decoding data;
and determining difference data between the first image and the second image according to the position information corresponding to the difference degree data meeting the preset requirement.
Optionally, before the first image and the second image are respectively input to corresponding coding networks for feature extraction, so as to obtain feature data of different coding levels, the method further includes:
a neural network for image processing is trained using the first image samples and the second image samples, the neural network including an encoding network and a decoding network.
In accordance with another aspect of the present application, there is provided an image processing method including:
receiving a first remote sensing image and a second remote sensing image of the same area acquired at different times;
respectively inputting the first remote sensing image and the second remote sensing image into corresponding coding networks for feature extraction to obtain feature data of different coding levels;
merging the feature data of the same coding level to obtain target feature data of different coding levels;
sequentially performing superposition operation and decoding operation on the target characteristic data of different coding levels to obtain decoded data;
and determining a target area with difference between the first remote sensing image and the second remote sensing image according to the decoded data.
In accordance with another aspect of the present application, there is provided an image processing apparatus including a neural network including a first encoding network, a second encoding network, a decoding network, and a difference determination module;
the first encoding network is configured to: extracting the characteristics of the first image to obtain characteristic data of different coding levels;
the second encoding network is configured to: extracting the features of the second image to obtain feature data of different coding levels;
the decoding network is configured to: merging the feature data of the same coding level to obtain target feature data of different coding levels; sequentially performing superposition operation and decoding operation on the target characteristic data of different coding levels to obtain decoded data;
the difference determination module is configured to: determining difference data between the first image and the second image according to the decoded data.
According to the embodiment of the application, the first image and the second image are respectively input into corresponding coding networks for feature extraction to obtain feature data of different coding levels; the feature data of the same coding level are then merged to obtain target feature data of different coding levels; superposition and decoding operations are sequentially performed on the target feature data of the different coding levels to obtain decoded data; and difference data between the first image and the second image is determined according to the decoded data. In this way, the two input images can be better distinguished during encoding, and the features of multiple coding levels are fused during decoding, so that not only the more abstract high-level features but also the low-level features closer to the image, as well as the features of every intermediate level in between, are utilized. The difference can thus be extracted from the two images efficiently, and the accuracy and efficiency of finding differences between images are improved.
The foregoing is only an overview of the technical solutions of the present application. In order that the technical means of the present application may be understood more clearly and implemented according to the content of the description, and in order to make the above and other objects, features and advantages of the present application more apparent, the detailed description of the present application is given below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 shows a schematic diagram of an image processing process;
FIG. 2 is a flow chart of an embodiment of an image processing method according to a first embodiment of the present application;
FIG. 3 shows a schematic diagram of an image processing architecture;
FIG. 4 is a flow chart of an embodiment of an image processing method according to the second embodiment of the present application;
FIG. 5 is a flow chart of an embodiment of an image processing method according to the third embodiment of the present application;
FIG. 6 is a block diagram of an embodiment of an image processing apparatus according to a fourth embodiment of the present application;
FIG. 7 is a block diagram of an embodiment of an image processing apparatus according to the fifth embodiment of the present application;
FIG. 8 is a block diagram of an embodiment of an image processing apparatus according to the sixth embodiment of the present application;
FIG. 9 illustrates an exemplary system that can be used to implement various embodiments described in this disclosure.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
To enable those skilled in the art to better understand the present application, the following description is made of the concepts related to the present application:
the first image and the second image are two images for which it is necessary to determine whether there is a difference between the two, or the size of the difference, or a region having a difference, or the like. For example, in an application scene of remote sensing dynamic soil change detection, two satellite remote sensing images in two years of a certain area are used as a first image and a second image to realize comparison of buildings, detection of illegal buildings and the like; or in an application scenario of medical treatment or research, two infrared photographs or X-ray photographs of a patient at different periods are taken as the first image and the second image to implement diagnosis of a disease, or any other suitable first image and second image may be included, which is not limited in this application.
The difference between the two images can be characterized using difference data. The difference data may include whether there is a difference, the degree of the difference, the area with a difference, or any other applicable difference data, which is not limited in the embodiments of the present application. For example, two satellite remote sensing images of a certain area from two different years are acquired at different times; for various reasons, every pixel may differ more or less between the two remote sensing images, but only when the difference in a region is large, or the difference indicates a ground-disturbance change, does the region need to be identified as a region with a difference.
The present application proposes an algorithmic framework for determining difference data between two images. The framework takes two original images as inputs, each corresponding to one of two coding networks. A coding network automatically encodes the original input to generate feature data of a plurality of coding levels; encoding the input can also be regarded as extracting features from the input. For example, a back-propagation algorithm is used to train the network so that the feature data of the multiple coding levels output by the coding network can represent the input.
Feature extraction is first performed on the input first image and second image to obtain feature data that is closer to the image (more concrete), and feature extraction is then performed on that feature data to obtain more abstract feature data; that is, the more abstract feature data is obtained by encoding on the basis of the more concrete feature data. Feature data can therefore be distinguished by coding level: more concrete feature data corresponds to lower coding levels, and more abstract feature data corresponds to higher coding levels. A coding network may generate feature data of at least two coding levels, which is not limited in the embodiments of the present application.
For example, the two coding networks are twin networks, i.e., two networks with identical network structures and identical weights. One network in the twin network processes the remote sensing image (namely, the first image) in the previous period and extracts the characteristic data of the image in the previous period, the other network processes the remote sensing image (namely, the second image) in the later period and extracts the characteristic data of the image in the later period, and the weights of the two networks are completely shared. Each network is divided into several modules, the output of the previous module being directly connected to the input of the next module. And performing feature extraction on the first image to obtain feature data of the 1 st coding level of the first image, and performing feature extraction on the feature data of the (N-1) th coding level of the first image to obtain feature data of the Nth coding level of the first image. And performing feature extraction on the second image to obtain feature data of the 1 st coding level of the second image, and performing feature extraction on the feature data of the (N-1) th coding level of the second image to obtain feature data of the Nth coding level of the second image.
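As an illustrative, non-limiting sketch, the weight sharing and level-by-level feature extraction described above can be expressed as follows. The framework (PyTorch), the block structure, the strides and the channel counts are assumptions made purely for illustration and are not specified by this embodiment:

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One coding module: its output feeds the next module directly."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)

class TwinEncoder(nn.Module):
    """Twin coding networks: the same blocks (shared weights) process both images."""
    def __init__(self, channels=(3, 16, 32, 64, 128)):
        super().__init__()
        self.blocks = nn.ModuleList(
            [EncoderBlock(channels[i], channels[i + 1]) for i in range(len(channels) - 1)]
        )

    def encode(self, x):
        feats = []
        for block in self.blocks:
            x = block(x)
            feats.append(x)          # keep the feature data of every coding level
        return feats

    def forward(self, img1, img2):
        # identical structure and identical weights are applied to both inputs
        return self.encode(img1), self.encode(img2)
```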
Each coding level thus has feature data of the first image and feature data of the second image. The feature data of the two images at the same coding level are merged, and the resulting data is denoted as target feature data. The merging of the feature data may be performed in the channel dimension. The feature data of the different coding levels yield target feature data of the corresponding coding levels.
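A minimal sketch of the channel-dimension merge, assuming NCHW tensors and an illustrative channel count of 64 per image at the given coding level:

```python
import torch

f1 = torch.randn(1, 64, 32, 32)      # feature data of the first image at some coding level
f2 = torch.randn(1, 64, 32, 32)      # feature data of the second image at the same level
target = torch.cat([f1, f2], dim=1)  # merge in the channel dimension
print(target.shape)                  # torch.Size([1, 128, 32, 32])
```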
In order to fuse the feature data of a plurality of coding levels so that the feature data of each coding level is used when determining the difference data, it is necessary to sequentially perform a superposition operation and a decoding operation on the target feature data of the different coding levels. The decoding operation corresponds to the encoding process and restores the abstract feature data toward the original data as far as possible, for example by passing the feature map through convolution, deconvolution and convolution, or any other suitable decoding operation, which is not limited in this embodiment of the present application. The superposition operation is a way of fusing features, for example adding two feature maps of the same size, or any other suitable superposition operation, which is not limited in this embodiment of the present application. The data obtained by sequentially performing the superposition and decoding operations on the target feature data of the different coding levels is denoted as decoded data.
In an alternative embodiment of the present application, the first image and the second image may be remote sensing images, which include, but are not limited to, films or photographs recording electromagnetic waves of various surface features, and are mainly classified into aerial photographs and satellite photographs, for example, two remote sensing images obtained by satellite remote sensing technology at different times for the same region on the ground may be used as the first image and the second image.
In an optional embodiment of the present application, feature extraction is performed on the first image and the second image in the corresponding coding networks, and the obtained feature data is recorded as the feature data of the 1st coding level. Feature extraction is then performed on the feature data of the (N-1)th coding level of the first image and of the second image, and the obtained feature data is recorded as the feature data of the Nth coding level.

In an optional embodiment of the present application, a decoding operation is performed on the target feature data of the Nth coding level, and the obtained data is denoted as the Nth decoded data. A superposition operation and a decoding operation are performed on the target feature data of the (N-1)th coding level and the Nth decoded data, and the obtained data is denoted as the (N-1)th decoded data; a superposition operation and a decoding operation are then performed on the target feature data of the (N-2)th coding level and the (N-1)th decoded data, and the obtained data is denoted as the (N-2)th decoded data.
In an optional embodiment of the present application, the decoding operation passes through convolution layers, deconvolution layers and the like. The convolutional layer is a structure in a convolutional neural network: each convolutional layer consists of several convolution units, and the parameters of each convolution unit are obtained by optimization with a back-propagation algorithm. The purpose of the convolution operation is to extract different features of the input; the first convolutional layer may only extract some low-level features such as edges, lines and corners, while deeper networks can iteratively extract more complex features from these low-level features. The deconvolution layer corresponds to the convolutional layer, is also called a transposed convolutional layer, and can be used to visualize a trained convolutional neural network.
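A small shape demonstration of the two layer types, using assumed (illustrative) tensor sizes rather than the ones used in the embodiment: the convolution layer changes the channel count, and the transposed (de)convolution layer enlarges the spatial resolution:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 128, 16, 16)

conv = nn.Conv2d(128, 32, kernel_size=1)                                 # 1x1 conv: 128 -> 32 channels
deconv = nn.ConvTranspose2d(32, 32, kernel_size=4, stride=2, padding=1)  # doubles height and width

y = conv(x)
z = deconv(y)
print(y.shape)   # torch.Size([1, 32, 16, 16])
print(z.shape)   # torch.Size([1, 32, 32, 32])
```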
In an alternative embodiment of the present application, in order to determine difference data between the first image and the second image, it is necessary to obtain difference degree data at each corresponding position between the first image and the second image according to the decoded data. The difference degree data is used for representing the difference degree between the two images at each position, and the magnitude of the numerical value is related to the magnitude of the difference. Each difference degree data corresponds to each position on the image, and information characterizing the position may be denoted as position information, e.g. representing each position on the image in a coordinate manner, which may be used to determine the area with the difference between the first image and the second image.
In an alternative embodiment of the present application, a neural network for image processing of the present application may be trained using a first image sample and a second image sample. The neural network includes an encoding network and a decoding network. The decoding network is configured to merge feature data of the same coding level, sequentially perform a superposition operation and a decoding operation on target feature data of different coding levels, and the like, and may be specifically used for any other applicable operation, which is not limited in this embodiment of the present application.
In an optional embodiment of the present application, in order to support comparisons between images in different scenarios, a neural network may be formed from a plurality of pairs of coding networks. For example, a first image and a second image are respectively input into a corresponding first pair of coding networks, and a third image and a fourth image are respectively input into a corresponding second pair of coding networks; first difference data between the first image and the second image and second difference data between the third image and the fourth image are obtained, and the first difference data and the second difference data are then combined into a comparison report, so as to achieve a more refined or accurate comparison result.

According to an embodiment of the application, in ground-disturbance change detection most change areas currently need to be extracted manually, which leads to many missed extractions and low efficiency. FIG. 1 shows a schematic diagram of an image processing process: the first image and the second image are respectively input into corresponding coding networks for feature extraction to obtain feature data of different coding levels; the feature data of the same coding level are merged to obtain target feature data of different coding levels; superposition and decoding operations are sequentially performed on the target feature data of the different coding levels to obtain decoded data; and difference data between the first image and the second image is determined according to the decoded data. In this way, the two input images can be better distinguished during encoding, and the features of multiple coding levels are fused during decoding, so that not only the more abstract high-level features but also the low-level features closer to the image, as well as the features of every intermediate level in between, are utilized. The difference is thus extracted from the two images efficiently, and the accuracy and efficiency of finding differences between the images are improved. The present application is applicable to, but not limited to, the above application scenarios.
Referring to fig. 2, a flowchart of an embodiment of an image processing method according to a first embodiment of the present application is shown, where the method may specifically include the following steps:
Step 101, inputting the first image and the second image into corresponding coding networks respectively to perform feature extraction, so as to obtain feature data of different coding levels.
In an embodiment of the application, in order to determine difference data between two images, the two inputs correspond to two coding networks: the first image is input into one coding network and the second image is input into the other coding network. The coding networks then perform feature extraction to obtain the feature data.
For example, as shown in the schematic diagram of the image processing architecture in FIG. 3, the earlier-period remote sensing image (i.e., the first image) and the later-period remote sensing image (i.e., the second image) are input, after certain pre-processing, into the corresponding twin-network-based coding networks for feature extraction. Each coding network is divided into 4 coding modules, with each coding module directly connected to the next, and each coding network can extract feature maps (namely feature data) of 4 coding levels of an image; the coding part finally outputs 8 feature maps of different levels into the decoding network, where the 4th coding modules of the two networks output feature map n1 and feature map n2, respectively.
Step 102, merging the feature data of the same coding level to obtain target feature data of different coding levels.
In the embodiment of the present application, feature data of each coding level is merged first, that is, feature data of a first image and feature data of a second image of the same coding level are merged to obtain target feature data of the coding level.
For example, as shown in FIG. 3, starting from the 4th coding module, the feature map n1 and the feature map n2 that it outputs are first merged in the channel dimension, and the merged result is n12 (namely the target feature data of the 4th coding level); the feature map nl1 and the feature map nl2 output by the 3rd coding module are likewise merged in the channel dimension, and the merged result is nl12 (i.e., the target feature data of the 3rd coding level), and so on.
Step 103, sequentially performing a superposition operation and a decoding operation on the target feature data of different coding levels to obtain decoded data.
In the embodiment of the present application, in order to fuse the target feature data of the different coding levels rather than decode the target feature data of a single level only, the target feature data of a coding level must first be superposed with the decoding result of the following (higher) coding level before the decoding operation is performed. For the target feature data of the highest coding level, since there is no following coding level, the decoding operation is performed directly. The superposition operation and the decoding operation are carried out on the target feature data of each coding level in turn to obtain the decoded data.
For example, as shown in FIG. 3, the merged result n12 (namely the target feature data of the 4th coding level) is decoded by a basic decoding module, which outputs dn12. The feature map nl1 and the feature map nl2 output by the 3rd coding module are merged into nl12 (i.e., the target feature data of the 3rd coding level); nl12 and dn12 are added, and the sum is input into the next basic decoding module for decoding. The resulting output is added to the target feature data of the 2nd coding level and input into the next basic decoding module; that output is added to the target feature data of the 1st coding level and input into the next basic decoding module, finally yielding the decoded data.
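A hedged sketch of this decoding flow: the highest-level target feature data is decoded first, and each lower level is superposed (added) onto the previous decoding result before being decoded again. The `decode_blocks` callables stand in for the basic decoding modules (an internal sketch of one such module is given further below); the shapes are assumed to match as described above:

```python
def decode(merged, decode_blocks):
    """merged: target feature data ordered from coding level 1 (low) to N (high);
    decode_blocks: one basic decoding module per coding level, same ordering."""
    d = decode_blocks[-1](merged[-1])                # decode the Nth (highest) level first
    for level in range(len(merged) - 2, -1, -1):     # levels N-1, ..., 1
        d = decode_blocks[level](merged[level] + d)  # superposition, then decoding
    return d                                         # the 1st decoded data
```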
Step 104, determining difference data between the first image and the second image according to the decoded data.

In the embodiment of the present application, determining the difference data between the first image and the second image according to the decoded data may be implemented in multiple ways. For example, difference degree data at different positions between the first image and the second image is determined according to the decoded data; the difference data between the first image and the second image is determined according to the position information corresponding to the difference degree data that meets a preset requirement; and the difference data is visualized to display the area having a difference between the first image and the second image. Any suitable implementation may be used, and the embodiments of the present application do not limit this.
For example, as shown in FIG. 3, the decoded data is passed through a sigmoid (logistic) activation function to obtain the difference degree data at each position between the first image and the second image; the difference data is then obtained according to whether the difference degree data exceeds a preset threshold, and visualization is performed according to the difference data. The rightmost image in the figure shows the area having a difference between the first image and the second image.
According to the embodiment of the application, the first image and the second image are respectively input into corresponding coding networks for feature extraction to obtain feature data of different coding levels; the feature data of the same coding level are then merged to obtain target feature data of different coding levels; superposition and decoding operations are sequentially performed on the target feature data of the different coding levels to obtain decoded data; and difference data between the first image and the second image is determined according to the decoded data. In this way, the two input images can be better distinguished during encoding, and the features of multiple coding levels are fused during decoding, so that not only the more abstract high-level features but also the low-level features closer to the image, as well as the features of every intermediate level in between, are utilized. The difference can thus be extracted from the two images efficiently, and the accuracy and efficiency of finding differences between images are improved.
Referring to fig. 4, a flowchart of an embodiment of an image processing method according to the second embodiment of the present application is shown, where the method specifically includes the following steps:
Step 201, training a neural network for image processing by using a first image sample and a second image sample, wherein the neural network comprises an encoding network and a decoding network.
In the embodiment of the present application, the neural network for image processing includes an encoding network and a decoding network, and the neural network first needs to be trained using the first image sample and the second image sample. The training process may include continuously providing pairs of image samples to the neural network, passing the first image of a pair through the network, passing the second image of the pair through the network, calculating a loss value using the two images, back-propagating the gradient computed from the loss, and updating the weights of the neural network until the desired performance is achieved, finally obtaining a neural network comprising an encoding network and a decoding network.
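A minimal training-loop skeleton matching this description; the framework (PyTorch), the binary cross-entropy loss, the Adam optimizer, and a `model(img1, img2)` interface that returns the pre-sigmoid decoded data are all assumptions made for illustration:

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=10, lr=1e-3, device="cpu"):
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.BCEWithLogitsLoss()             # decoded data is pre-sigmoid here
    for _ in range(epochs):
        for img1, img2, change_mask in loader:     # paired image samples + labelled change mask
            img1 = img1.to(device)
            img2 = img2.to(device)
            change_mask = change_mask.to(device)
            decoded = model(img1, img2)            # (N, 1, H, W) decoded data
            loss = criterion(decoded, change_mask)
            optimizer.zero_grad()
            loss.backward()                        # back-propagate the gradient of the loss
            optimizer.step()                       # update the weights of the neural network
    return model
```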
For example, the neural network proposed in the present application may be trained on a certain region of images from two years provided by a certain organization; after training, a region that does not overlap the training region at all is used as the test region for effect comparison. The first method does not use a twin network and directly combines the earlier-period and later-period images into a 6-channel input, with the number of input channels of the first convolutional layer changed from 3 to 6; the second method uses a twin network but only uses the feature data of the highest level; the method proposed herein uses a twin network and uses the feature data of every coding level. The evaluation index is IoU (Intersection over Union), and the results are shown in the following table:
method of producing a composite material | First method | Second method | The method is presented herein |
IOU | 0.6 | 0.64 | 0.69 |
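For reference, a small sketch of how the IoU evaluation index in the table above can be computed between a binarised prediction and the ground-truth change mask (the tensor layout is an assumption):

```python
import torch

def iou(pred_mask: torch.Tensor, gt_mask: torch.Tensor, eps: float = 1e-7) -> float:
    """Intersection over Union between two boolean (or 0/1) masks of equal shape."""
    pred = pred_mask.bool()
    gt = gt_mask.bool()
    intersection = (pred & gt).sum().item()
    union = (pred | gt).sum().item()
    return intersection / (union + eps)
```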
Step 202, receiving two remote sensing images of the same area at different times as the first image and the second image.

In the embodiment of the application, in order to obtain the ground-disturbance changes of the same area on the ground, two remote sensing images of that area need to be acquired at different times; the two remote sensing images are received and input into the neural network as the first image and the second image. For example, remote sensing images of the same area from two different years are received, the time interval between the two remote sensing images being two years.
Step 203, extracting features of the first image and the second image in the corresponding coding networks to obtain feature data of the 1st coding level.

In the embodiment of the application, after the first image and the second image are input into the corresponding coding networks, the coding networks perform feature extraction and first obtain the feature data of the 1st coding level.
Step 204, performing feature extraction on the feature data of the (N-1)th coding level in the coding network to obtain the feature data of the Nth coding level.
In the embodiment of the application, the coding network performs feature extraction on the feature data of the 1st coding level to obtain the feature data of the 2nd coding level, then performs feature extraction on the feature data of the 2nd coding level to obtain the feature data of the 3rd coding level, and so on in turn: feature extraction is performed on the feature data of the (N-1)th coding level in the coding network to obtain the feature data of the Nth coding level.
For example, as shown in the schematic diagram of the image processing architecture shown in fig. 3, a previous remote sensing image (i.e., a first image) and a subsequent remote sensing image (i.e., a second image) are input into corresponding twin network-based encoding networks, and feature extraction is performed on an original image after certain processing. Each coding network can extract feature maps (i.e., feature data) for 4 coding levels of an image.
Step 205, merging channels of the feature data to obtain the target feature data.
In the embodiment of the application, merging channels is performed on the feature data of each coding level to obtain target feature data of each coding level.
For example, as shown in FIG. 3, starting from the feature map n1 and the feature map n2 of the 4th coding level, the two are first merged in the channel dimension, giving n12 (i.e., the target feature data of the 4th coding level); the feature map nl1 and the feature map nl2 of the 3rd coding level are then merged in the channel dimension, giving nl12 (i.e., the target feature data of the 3rd coding level), and so on.
Step 206, performing a decoding operation on the target feature data of the Nth coding level to obtain the Nth decoded data.
In the embodiment of the present application, starting from the target feature data of the Nth coding level, this target feature data is decoded first, and the decoded data is denoted as the Nth decoded data.
For example, as shown in FIG. 3, the merged result n12 of the 4th coding level is first input into a basic decoding module and decoded to obtain the Nth decoded data.
Step 207, performing a superposition operation and a decoding operation on the target feature data of the (N-1)th coding level and the Nth decoded data to obtain the (N-1)th decoded data.
In the embodiment of the application, a superposition operation is performed on the target feature data of the (N-1)th coding level and the Nth decoded data, and a decoding operation is then performed on the superposition result to obtain the (N-1)th decoded data, and so on until the 1st decoded data is obtained.
In this embodiment of the present application, optionally, an implementation of performing the superposition operation and the decoding operation on the target feature data of the (N-1)th coding level and the Nth decoded data to obtain the (N-1)th decoded data may include: performing a superposition operation on the target feature data of the (N-1)th coding level and the Nth decoded data to obtain a superposition result; and sequentially performing decoding operations on the superposition result through the first convolution layer, the first deconvolution layer and the second convolution layer to obtain the (N-1)th decoded data.

The number of output channels of the first convolution layer is one fourth of the number of input channels, the numbers of input and output channels of the first deconvolution layer are the same, and the number of output channels of the second convolution layer is the same as the number of channels of the target feature data of the (N-2)th coding level.
For example, as shown in FIG. 3, the merged result n12 of the 4th coding level is input into a basic decoding module. The first layer of the basic decoding module is a convolution layer (namely the first convolution layer) with a 1x1 convolution kernel, whose number of input channels is the sum of the channel numbers of feature map n1 and feature map n2 and whose number of output channels is one fourth of the number of input channels. The second layer is a deconvolution layer (i.e. the first deconvolution layer) with a kernel size of 3x3, whose numbers of input and output channels both equal the number of output channels of the first layer. The third layer is a convolution layer (i.e. the second convolution layer) with a 1x1 convolution kernel, whose number of input channels equals the number of output channels of the second layer and whose number of output channels is the sum of the channel numbers of the two feature maps of the 3rd coding level. Each layer is followed by a ReLU (Rectified Linear Unit) activation function. The basic decoding module outputs the 4th decoded data dn12, which is then added to the target feature data of the 3rd coding level, namely the merged result nl12; the sum (i.e. the superposition result) is input into the next basic decoding module, which outputs the 3rd decoded data, and the above steps are repeated until the 1st decoded data is obtained.
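A hedged sketch of such a basic decoding module, assuming a PyTorch implementation; the stride and output padding of the 3x3 transposed convolution (chosen here so that the spatial size doubles, as the addition with the next lower level requires) are assumptions, since the embodiment only specifies the kernel sizes and channel counts:

```python
import torch.nn as nn

class BasicDecodeModule(nn.Module):
    """1x1 conv (channels -> 1/4) -> 3x3 deconv (same channels, 2x spatial) -> 1x1 conv
    (channels -> channel count of the next lower coding level), each followed by ReLU."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        mid = in_ch // 4
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid, kernel_size=1),                  # first convolution layer
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(mid, mid, kernel_size=3, stride=2,
                               padding=1, output_padding=1),       # first deconvolution layer
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, out_ch, kernel_size=1),                 # second convolution layer
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)
```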
Step 208, iteratively executing the superposition operation and the decoding operation until the superposition operation and the decoding operation of the target feature data of each coding level are sequentially completed.
In the embodiment of the present application, the superposition operation and the decoding operation in step 207 are iteratively executed until the superposition operation and the decoding operation of the target feature data of each coding level are sequentially completed, and finally the obtained decoded data fuses the features of each coding level.
In this embodiment, the 1st decoded data may be decoded further: the 1st decoded data is sequentially passed through the second deconvolution layer, the third convolution layer and the fourth convolution layer for decoding, so as to obtain the decoded data.
For example, as shown in FIG. 3, the 1st decoded data finally obtained has 128 channels. It is sent to a deconvolution layer with 64 output channels and a kernel size of 4x4; the output of this deconvolution layer is input into a convolution layer whose numbers of input and output channels are both 64, with a padding of 1 and a kernel size of 3x3; the output of that convolution layer is input into a convolution layer with 64 input channels, 1 output channel, a padding of 1 and a kernel size of 3x3, which outputs the decoded data.
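A sketch of these final decoding layers with the channel counts, kernel sizes and padding quoted above; the stride of the 4x4 deconvolution and the ReLU activations placed between the layers are assumptions:

```python
import torch.nn as nn

final_head = nn.Sequential(
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),  # second deconvolution layer
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),                      # third convolution layer
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 1, kernel_size=3, padding=1),                       # fourth convolution layer, 1 output channel
)
```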
In the embodiment of the application, the decoded data is normalized to be between 0 and 1 through the sigmoid activation function, and difference degree data of each position between the first image and the second image can be obtained.
For example, the decoded data obtained by sequentially performing decoding operations through the second deconvolution layer, the third convolution layer and the fourth convolution layer is normalized to between 0 and 1 by a sigmoid activation function. The normalized values represent the difference degree data at each position between the two images; the difference data between the first image and the second image is determined according to the position information corresponding to the difference degree data that meets the preset requirement, and visualization is performed to obtain the rightmost image in FIG. 3, which shows the region with a difference between the first image and the second image. Any suitable implementation may be used, and the embodiments of the present application do not limit this.
In this embodiment of the application, it is determined whether the difference degree data meets a preset requirement, for example, whether the difference degree data is greater than a preset threshold, and if the difference degree data is greater than the preset threshold, the preset requirement is met, which may specifically include any applicable preset requirement, and this embodiment of the application does not limit this. According to the position information corresponding to the difference degree data meeting the preset requirement, difference data between the first image and the second image can be determined, and the difference data can represent the area with the difference between the first image and the second image and the size of the difference.
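An illustrative sketch of this step: a sigmoid produces the difference degree data in [0, 1], a threshold (the value 0.5 is an assumed stand-in for the preset requirement) selects the differing positions, and their coordinates serve as the position information:

```python
import torch

def difference_data(decoded: torch.Tensor, threshold: float = 0.5):
    degree = torch.sigmoid(decoded)            # difference degree data, normalized to [0, 1]
    mask = degree > threshold                  # positions meeting the preset requirement
    positions = mask.nonzero(as_tuple=False)   # position information of the differing points
    return degree, mask, positions
```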
According to the embodiment of the application, the first image and the second image are respectively input into corresponding coding networks for feature extraction to obtain feature data of different coding levels; the feature data of the same coding level are then merged to obtain target feature data of different coding levels; superposition and decoding operations are sequentially performed on the target feature data of the different coding levels to obtain decoded data; and difference data between the first image and the second image is determined according to the decoded data. In this way, the two input images can be better distinguished during encoding, and the features of multiple coding levels are fused during decoding, so that not only the more abstract high-level features but also the low-level features closer to the image, as well as the features of every intermediate level in between, are utilized. The difference can thus be extracted from the two images efficiently, and the accuracy and efficiency of finding differences between images are improved.
Referring to fig. 5, a flowchart of an embodiment of an image processing method according to a third embodiment of the present application is shown, where the method specifically includes the following steps:
Step 301, receiving a first remote sensing image and a second remote sensing image of the same area at different times.

In the embodiment of the application, in order to obtain the ground-disturbance changes of the same area on the ground, two remote sensing images of that area need to be acquired at different times; the two remote sensing images are received and recorded as the first remote sensing image and the second remote sensing image. For example, remote sensing images of the same area from two different years are received, the time interval between the two remote sensing images being two years.
Step 302, respectively inputting the first remote sensing image and the second remote sensing image into corresponding coding networks for feature extraction to obtain feature data of different coding levels.
In the embodiment of the present application, a specific implementation manner of this step may refer to the description in the foregoing embodiment, and is not described herein again.
Step 303, merging the feature data of the same coding level to obtain target feature data of different coding levels.
In the embodiment of the present application, a specific implementation manner of this step may refer to the description in the foregoing embodiment, and is not described herein again.
Step 304, sequentially performing a superposition operation and a decoding operation on the target feature data of different coding levels to obtain decoded data.
In the embodiment of the present application, a specific implementation manner of this step may refer to the description in the foregoing embodiment, and is not described herein again.
Step 305, determining a target area with a difference between the first remote sensing image and the second remote sensing image according to the decoded data.
In the embodiment of the present application, a specific implementation manner of this step may refer to the description in the foregoing embodiment, and is not described herein again.
According to the embodiment of the application, a first remote sensing image and a second remote sensing image of the same area at different times are received and respectively input into corresponding coding networks for feature extraction to obtain feature data of different coding levels; the feature data of the same coding level are merged to obtain target feature data of different coding levels; superposition and decoding operations are sequentially performed on the target feature data of the different coding levels to obtain decoded data; and a target area with a difference between the first remote sensing image and the second remote sensing image is determined according to the decoded data. In this way, the two input remote sensing images can be better distinguished during encoding, and the features of multiple coding levels are fused during decoding, so that not only the more abstract high-level features but also the low-level features closer to the image, as well as the features of every intermediate level in between, are utilized. The difference can thus be extracted from the two remote sensing images efficiently, and the accuracy and efficiency of finding differing areas between remote sensing images are improved.
Referring to fig. 6, a block diagram illustrating a structure of an embodiment of an image processing apparatus according to a fourth embodiment of the present application may specifically include:
the image processing apparatus comprises a neural network 400 comprising a first encoding network 4001, a second encoding network 4002, a decoding network 4003 and a disparity determining module 4004;
the first encoding network is configured to: extracting the characteristics of the first image to obtain characteristic data of different coding levels;
the second encoding network is configured to: extracting the features of the second image to obtain feature data of different coding levels;
the decoding network is configured to: merging the feature data of the same coding level to obtain target feature data of different coding levels; sequentially performing superposition operation and decoding operation on the target characteristic data of different coding levels to obtain decoded data;
the difference determination module is configured to: determining difference data between the first image and the second image according to the decoded data.
According to the embodiment of the application, the first image and the second image are respectively input into corresponding coding networks for feature extraction to obtain feature data of different coding levels; the feature data of the same coding level are then merged to obtain target feature data of different coding levels; superposition and decoding operations are sequentially performed on the target feature data of the different coding levels to obtain decoded data; and difference data between the first image and the second image is determined according to the decoded data. In this way, the two input images can be better distinguished during encoding, and the features of multiple coding levels are fused during decoding, so that not only the more abstract high-level features but also the low-level features closer to the image, as well as the features of every intermediate level in between, are utilized. The difference can thus be extracted from the two images efficiently, and the accuracy and efficiency of finding differences between images are improved.
Referring to fig. 7, a block diagram illustrating a structure of an embodiment of an image processing apparatus according to the fifth embodiment of the present application may specifically include:
an extraction module 501, configured to input the first image and the second image into corresponding coding networks respectively to perform feature extraction, so as to obtain feature data of different coding levels;
a merging module 502, configured to merge feature data of the same coding level to obtain target feature data of different coding levels;
a decoding module 503, configured to sequentially perform a superposition operation and a decoding operation on the target feature data of different coding levels to obtain decoded data;
a difference determining module 504 configured to determine difference data between the first image and the second image according to the decoded data.
In this embodiment of the present application, optionally, the apparatus further includes:
and the image receiving module is used for receiving two remote sensing images of the same area at different times as the first image and the second image before the first image and the second image are respectively input into the corresponding coding networks for feature extraction to obtain feature data of different coding levels.
In this embodiment of the application, optionally, the extraction module includes:
the first extraction submodule is used for extracting the characteristics of the first image and the second image in a corresponding coding network to obtain the characteristic data of the 1 st coding level;
and the second extraction submodule is used for extracting the characteristics of the characteristic data of the (N-1) th coding level in the coding network to obtain the characteristic data of the Nth coding level.
In this embodiment of the present application, optionally, the decryption module includes:
the first decoding submodule is used for carrying out decoding operation on the target characteristic data of the Nth coding level to obtain Nth decoding data;
the second decoding submodule is used for carrying out superposition operation and decoding operation on the target characteristic data of the (N-1) th coding level and the Nth decoding data to obtain the (N-1) th decoding data;
and the iteration submodule is used for iteratively executing the superposition operation and the decoding operation until the superposition operation and the decoding operation of the target characteristic data of each coding level are sequentially completed.
In this embodiment of the application, optionally, the second decoding sub-module includes:
the superposition unit is used for carrying out superposition operation on the target characteristic data of the (N-1) th coding level and the Nth decoding data to obtain a superposition result;
and the decoding unit is used for sequentially performing decoding operations on the superposition result through a first convolutional layer, a first anti-convolutional layer and a second convolutional layer to obtain the N-1 decoded data, wherein the number of output channels of the first convolutional layer is one fourth of the number of input channels, the number of input channels and the number of output channels of the first anti-convolutional layer are the same, and the number of output channels of the second convolutional layer is the same as the number of channels of the target characteristic data of the N-2 coding layer.
In this embodiment of the application, optionally, the decoding module further includes:
and the decoding submodule is used for, after the superposition operation and the decoding operation on the target feature data of each coding level are sequentially completed, sequentially performing decoding operations on the 1st decoded data through a second deconvolution layer, a third convolution layer and a fourth convolution layer to obtain the decoded data, wherein the number of output channels of the second deconvolution layer is one half of the number of its input channels, the number of input channels and the number of output channels of the third convolution layer are the same, and the number of output channels of the fourth convolution layer is 1.
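The final stage can be sketched the same way: a deconvolution that halves the channel count, a channel-preserving convolution, and a convolution that produces a single output channel. Again, the kernel sizes, stride and activations are assumptions for illustration.

```python
import torch.nn as nn

class FinalDecodeHead(nn.Module):
    """Maps the 1st decoded data to the single-channel decoded data."""

    def __init__(self, in_channels):
        super().__init__()
        half = in_channels // 2
        self.body = nn.Sequential(
            # second deconvolution: output channels = 1/2 of input channels
            nn.ConvTranspose2d(in_channels, half, kernel_size=2, stride=2),
            nn.ReLU(inplace=True),
            # third convolution: same number of input and output channels
            nn.Conv2d(half, half, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # fourth convolution: one output channel
            nn.Conv2d(half, 1, kernel_size=1),
        )

    def forward(self, x):
        return self.body(x)
```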
In this embodiment of the present application, optionally, the merging module includes:
and the merging submodule is used for merging the channels of the characteristic data to obtain the target characteristic data.
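If the merge is taken to be a plain channel-wise concatenation of the two coding networks' outputs at the same level, it reduces to a single call; the NCHW tensor layout assumed here is an illustration choice.

```python
import torch

def merge_by_channel(feat_a, feat_b):
    """Merge feature data of the same coding level from the two coding networks
    along the channel dimension to form the target feature data."""
    return torch.cat([feat_a, feat_b], dim=1)   # dim=1 assumes NCHW layout

# e.g. two 64-channel level-1 feature maps become 128-channel target feature data
target = merge_by_channel(torch.randn(1, 64, 128, 128), torch.randn(1, 64, 128, 128))
print(target.shape)   # torch.Size([1, 128, 128, 128])
```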
In this embodiment of the application, optionally, the difference determining module includes:
the degree determining submodule is used for determining difference degree data on different positions between the first image and the second image according to the decoded data;
and the difference determining submodule is used for determining difference data between the first image and the second image according to the position information corresponding to the difference degree data meeting the preset requirement.
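One way these two submodules could be realised, assuming the decoded data is a per-pixel difference-degree map and the preset requirement is a simple threshold on that degree, is sketched below.

```python
import torch

def difference_positions(decoded, threshold=0.5):
    """decoded: tensor of shape (1, 1, H, W) holding decoded data (logits).
    Returns the (row, col) positions whose difference degree meets the assumed
    preset requirement of exceeding `threshold`."""
    degree = torch.sigmoid(decoded)[0, 0]      # difference degree data per position
    mask = degree > threshold                  # positions meeting the preset requirement
    return mask.nonzero(as_tuple=False)        # position information as (row, col) pairs
```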
In this embodiment of the present application, optionally, the apparatus further includes:
and the training module is used for training a neural network for image processing by adopting the first image sample and the second image sample before the first image and the second image are respectively input into the corresponding coding networks for feature extraction to obtain feature data of different coding levels, and the neural network comprises a coding network and a decoding network.
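As a sketch of how such a training module might drive the whole network, assuming pixel-wise change labels and a binary cross-entropy loss (neither of which is specified above), one training step could look like this; `model` is a hypothetical wrapper around the coding networks and the decoding network.

```python
import torch
import torch.nn as nn

def train_step(model, optimizer, img_a, img_b, change_mask):
    """One training step on a pair of image samples and a labelled change mask.
    `model(img_a, img_b)` is assumed to return single-channel decoded data
    (logits) with the same spatial size as `change_mask` (a float tensor)."""
    model.train()
    optimizer.zero_grad()
    decoded = model(img_a, img_b)              # (B, 1, H, W) logits
    loss = nn.functional.binary_cross_entropy_with_logits(decoded, change_mask)
    loss.backward()
    optimizer.step()
    return loss.item()
```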
With the apparatus of this embodiment, the first image and the second image are respectively input into corresponding coding networks for feature extraction to obtain feature data of different coding levels; the feature data of the same coding level are merged to obtain target feature data of different coding levels; the target feature data of the different coding levels are sequentially subjected to a superposition operation and a decoding operation to obtain decoded data; and difference data between the first image and the second image are determined according to the decoded data. The two input images can thus be better distinguished during encoding, and the features of a plurality of coding levels are fused during decoding, so that not only the more abstract high-level features and the more image-like low-level features but also the features of each intermediate level of both images are utilized. The difference can therefore be extracted from the two images efficiently, improving the accuracy and efficiency of finding differences between images.
Referring to fig. 8, a block diagram illustrating the structure of an embodiment of an image processing apparatus according to the sixth embodiment of the present application is shown; the apparatus may specifically include:
the image receiving module 601 is configured to receive a first remote sensing image and a second remote sensing image of the same region at different times;
an extraction module 602, configured to input the first remote sensing image and the second remote sensing image into corresponding coding networks respectively to perform feature extraction, so as to obtain feature data of different coding levels;
a merging module 603, configured to merge feature data of the same coding level to obtain target feature data of different coding levels;
a decoding module 604, configured to sequentially perform a superposition operation and a decoding operation on the target feature data of different coding levels to obtain decoded data;
and the region determining module 605 is configured to determine a target region having a difference between the first remote sensing image and the second remote sensing image according to the decoded data.
According to this embodiment, a first remote sensing image and a second remote sensing image of the same region at different times are received; the two remote sensing images are respectively input into corresponding coding networks for feature extraction to obtain feature data of different coding levels; the feature data of the same coding level are merged to obtain target feature data of different coding levels; the target feature data of the different coding levels are sequentially subjected to a superposition operation and a decoding operation to obtain decoded data; and a target region in which the first remote sensing image and the second remote sensing image differ is determined according to the decoded data. The two input remote sensing images can thus be better distinguished during encoding, and the features of a plurality of coding levels are fused during decoding, so that not only the more abstract high-level features and the more image-like low-level features but also the features of each intermediate level of both remote sensing images are utilized. The differing areas can therefore be extracted from the two remote sensing images efficiently, improving the accuracy and efficiency of finding areas that differ between remote sensing images.
Since the device embodiments are substantially similar to the method embodiments, their description is kept brief; for relevant details, refer to the corresponding parts of the description of the method embodiments.
Embodiments of the disclosure may be implemented as a system using any suitable hardware, firmware, software, or any combination thereof, in a desired configuration. Fig. 9 schematically illustrates an exemplary system (or apparatus) 700 that can be used to implement various embodiments described in this disclosure.
For one embodiment, fig. 9 illustrates an exemplary system 700 having one or more processors 702, a system control module (chipset) 704 coupled to at least one of the processor(s) 702, a system memory 706 coupled to the system control module 704, a non-volatile memory (NVM)/storage 708 coupled to the system control module 704, one or more input/output devices 710 coupled to the system control module 704, and a network interface 712 coupled to the system control module 704.
The processor 702 may include one or more single-core or multi-core processors, and the processor 702 may include any combination of general-purpose or special-purpose processors (e.g., graphics processors, application processors, baseband processors, etc.). In some embodiments, the system 700 can function as a browser as described in embodiments herein.
In some embodiments, system 700 may include one or more computer-readable media (e.g., system memory 706 or NVM/storage 708) having instructions and one or more processors 702 in combination with the one or more computer-readable media configured to execute the instructions to implement modules to perform the actions described in this disclosure.
For one embodiment, system control module 704 may include any suitable interface controllers to provide any suitable interface to at least one of processor(s) 702 and/or any suitable device or component in communication with system control module 704.
The system control module 704 may include a memory controller module to provide an interface to the system memory 706. The memory controller module may be a hardware module, a software module, and/or a firmware module.
For one embodiment, system control module 704 may include one or more input/output controllers to provide an interface to NVM/storage 708 and input/output device(s) 710.
For example, NVM/storage 708 may be used to store data and/or instructions. NVM/storage 708 may include any suitable non-volatile memory (e.g., flash memory) and/or may include any suitable non-volatile storage device(s) (e.g., one or more hard disk drive(s) (HDD (s)), one or more Compact Disc (CD) drive(s), and/or one or more Digital Versatile Disc (DVD) drive (s)).
NVM/storage 708 may include storage resources that are physically part of the device on which system 700 is installed or may be accessed by the device and not necessarily part of the device. For example, NVM/storage 708 may be accessible over a network via input/output device(s) 710.
Input/output device(s) 710 may provide an interface for system 700 to communicate with any other suitable device; input/output device(s) 710 may include communication components, audio components, sensor components, and the like. Network interface 712 may provide an interface for system 700 to communicate over one or more networks; system 700 may communicate wirelessly with one or more components of a wireless network according to any of one or more wireless network standards and/or protocols, for example to access a wireless network based on a communication standard such as WiFi, 2G, or 3G, or a combination thereof.
For one embodiment, at least one of the processor(s) 702 may be packaged together with logic for one or more controller(s) (e.g., memory controller module) of system control module 704. For one embodiment, at least one of the processor(s) 702 may be packaged together with logic for one or more controller(s) of system control module 704 to form a System In Package (SiP). For one embodiment, at least one of the processor(s) 702 may be integrated on the same die with logic for one or more controller(s) of system control module 704. For one embodiment, at least one of the processor(s) 702 may be integrated on the same die with logic for one or more controller(s) of system control module 704 to form a system on a chip (SoC).
In various embodiments, system 700 may be, but is not limited to being: a browser, a workstation, a desktop computing device, or a mobile computing device (e.g., a laptop computing device, a handheld computing device, a tablet, a netbook, etc.). In various embodiments, system 700 may have more or fewer components and/or different architectures. For example, in some embodiments, system 700 includes one or more cameras, a keyboard, a Liquid Crystal Display (LCD) screen (including a touch screen display), a non-volatile memory port, multiple antennas, a graphics chip, an Application Specific Integrated Circuit (ASIC), and speakers.
If the display includes a touch panel, the display screen may be implemented as a touch screen display to receive an input signal from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
The present application further provides a non-volatile readable storage medium, where one or more modules (programs) are stored in the storage medium; when the one or more modules are applied to a terminal device, they may cause the terminal device to execute instructions for the method steps in the present application.
In one example, a computer device is provided, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method according to the embodiments of the present application when executing the computer program.
There is also provided in one example a computer readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements a method as described in one or more of the embodiments of the application.
An embodiment of the application discloses an image processing method and an image processing device. Example 1 includes an image processing method, including:
respectively inputting the first image and the second image into corresponding coding networks for feature extraction to obtain feature data of different coding levels;
merging the feature data of the same coding level to obtain target feature data of different coding levels;
sequentially performing superposition operation and decoding operation on the target characteristic data of different coding levels to obtain decoded data;
determining difference data between the first image and the second image according to the decoded data.
Example 2 may include the method of example 1, wherein before the first image and the second image are respectively input to corresponding coding networks for feature extraction, so as to obtain feature data of different coding levels, the method further includes:
receiving two remote sensing images of the same area at different times as the first image and the second image.
Example 3 may include the method of example 1 and/or example 2, wherein the respectively inputting the first image and the second image into corresponding coding networks for feature extraction, and obtaining feature data of different coding levels includes:
extracting the characteristics of the first image and the second image in a corresponding coding network to obtain characteristic data of a 1 st coding level;
and performing feature extraction on the feature data of the (N-1) th coding level in the coding network to obtain the feature data of the Nth coding level.
Example 4 may include the method of one or more of examples 1-3, wherein the sequentially performing the superposition operation and the decoding operation on the target feature data of the different encoding levels to obtain decoded data includes:
decoding the target characteristic data of the Nth coding level to obtain Nth decoding data;
performing superposition operation and decoding operation on the target characteristic data of the (N-1)th coding level and the Nth decoding data to obtain the (N-1)th decoding data;
and iteratively executing the superposition operation and the decoding operation until the superposition operation and the decoding operation of the target characteristic data of each coding level are sequentially completed.
Example 5 may include the method of one or more of examples 1-4, wherein the performing the superposition operation and the decoding operation on the target feature data of the (N-1)th coding level and the Nth decoded data to obtain the (N-1)th decoded data comprises:
performing superposition operation on the target characteristic data of the (N-1) th coding level and the Nth decoding data to obtain a superposition result;
and sequentially performing decoding operations on the superposition result through a first convolution layer, a first deconvolution layer and a second convolution layer to obtain the (N-1)th decoded data, wherein the number of output channels of the first convolution layer is one fourth of the number of input channels, the number of input channels and the number of output channels of the first deconvolution layer are the same, and the number of output channels of the second convolution layer is the same as the number of channels of the target characteristic data of the (N-2)th coding level.
Example 6 may include the method of one or more of examples 1 to 5, wherein after the superposition operation and the decoding operation on the target feature data of each coding level are sequentially completed, the sequentially performing the superposition operation and the decoding operation on the target feature data of different coding levels to obtain decoded data further includes:
and sequentially carrying out decoding operations on the 1st decoded data through a second deconvolution layer, a third convolution layer and a fourth convolution layer to obtain the decoded data, wherein the number of output channels of the second deconvolution layer is half of the number of input channels, the number of input channels and the number of output channels of the third convolution layer are the same, and the number of output channels of the fourth convolution layer is 1.
Example 7 may include the method of one or more of examples 1 to 6, wherein the merging feature data of the same coding level to obtain target feature data of different coding levels includes:
and merging channels of the characteristic data to obtain the target characteristic data.
Example 8 may include the method of one or more of examples 1-7, wherein the determining, from the decoded data, the difference data between the first image and the second image comprises:
determining difference degree data at different positions between the first image and the second image according to the decoding data;
and determining difference data between the first image and the second image according to the position information corresponding to the difference degree data meeting the preset requirement.
Example 9 may include the method of one or more of examples 1 to 8, wherein before the first image and the second image are respectively input to corresponding coding networks for feature extraction, and feature data of different coding levels are obtained, the method further includes:
a neural network for image processing is trained using the first image samples and the second image samples, the neural network including an encoding network and a decoding network.
Example 10 includes an image processing method comprising:
receiving a first remote sensing image and a second remote sensing image which aim at the same area and are different in time;
respectively inputting the first remote sensing image and the second remote sensing image into corresponding coding networks for feature extraction to obtain feature data of different coding levels;
merging the feature data of the same coding level to obtain target feature data of different coding levels;
sequentially performing superposition operation and decoding operation on the target characteristic data of different coding levels to obtain decoded data;
and determining a target area with difference between the first remote sensing image and the second remote sensing image according to the decoded data.
Example 11 includes an image processing apparatus comprising a neural network comprising a first encoding network, a second encoding network, a decoding network, and a difference determination module;
the first encoding network is configured to: extracting the characteristics of the first image to obtain characteristic data of different coding levels;
the second encoding network is configured to: extracting the features of the second image to obtain feature data of different coding levels;
the decoding network is configured to: merging the feature data of the same coding level to obtain target feature data of different coding levels; sequentially performing superposition operation and decoding operation on the target characteristic data of different coding levels to obtain decoded data;
the discrepancy determining module is to: determining difference data between the first image and the second image according to the decoded data.
Example 12 may include the apparatus of example 11, wherein the apparatus further comprises:
and the image receiving module is used for receiving two remote sensing images aiming at the same area at different time as the first image and the second image before the first image and the second image are respectively input into the corresponding coding networks for feature extraction to obtain feature data of different coding levels.
Example 13 may include the apparatus of example 11 and/or example 12, wherein the extraction module includes:
the first extraction submodule is used for extracting the characteristics of the first image and the second image in a corresponding coding network to obtain the characteristic data of the 1 st coding level;
and the second extraction submodule is used for extracting the characteristics of the characteristic data of the (N-1) th coding level in the coding network to obtain the characteristic data of the Nth coding level.
Example 14 may include the apparatus of one or more of examples 11-13, wherein the decoding module comprises:
the first decoding submodule is used for carrying out decoding operation on the target characteristic data of the Nth coding level to obtain Nth decoding data;
the second decoding submodule is used for carrying out superposition operation and decoding operation on the target characteristic data of the (N-1) th coding level and the Nth decoding data to obtain the (N-1) th decoding data;
and the iteration submodule is used for iteratively executing the superposition operation and the decoding operation until the superposition operation and the decoding operation of the target characteristic data of each coding level are sequentially completed.
Example 15 may include the apparatus of one or more of examples 11-14, wherein the second decoding sub-module comprises:
the superposition unit is used for carrying out superposition operation on the target characteristic data of the (N-1) th coding level and the Nth decoding data to obtain a superposition result;
and the decoding unit is used for sequentially performing decoding operations on the superposition result through a first convolution layer, a first deconvolution layer and a second convolution layer to obtain the (N-1)th decoded data, wherein the number of output channels of the first convolution layer is one fourth of the number of its input channels, the number of input channels and the number of output channels of the first deconvolution layer are the same, and the number of output channels of the second convolution layer is the same as the number of channels of the target feature data of the (N-2)th coding level.
Example 16 may include the apparatus of one or more of examples 11-15, wherein the decoding module further comprises:
and the decoding submodule is used for, after the superposition operation and the decoding operation on the target feature data of each coding level are sequentially completed, sequentially performing decoding operations on the 1st decoded data through a second deconvolution layer, a third convolution layer and a fourth convolution layer to obtain the decoded data, wherein the number of output channels of the second deconvolution layer is one half of the number of its input channels, the number of input channels and the number of output channels of the third convolution layer are the same, and the number of output channels of the fourth convolution layer is 1.
Example 17 may include the apparatus of one or more of examples 11-16, wherein the means for merging comprises:
and the merging submodule is used for merging the channels of the characteristic data to obtain the target characteristic data.
Example 18 may include the apparatus of one or more of examples 11-17, wherein the discrepancy determining module comprises:
the degree determining submodule is used for determining difference degree data on different positions between the first image and the second image according to the decoded data;
and the difference determining submodule is used for determining difference data between the first image and the second image according to the position information corresponding to the difference degree data meeting the preset requirement.
Example 19 may include the apparatus of one or more of examples 11-18, wherein the apparatus further comprises:
and the training module is used for training a neural network for image processing by adopting the first image sample and the second image sample before the first image and the second image are respectively input into the corresponding coding networks for feature extraction to obtain feature data of different coding levels, and the neural network comprises a coding network and a decoding network.
Example 20 includes an image processing apparatus comprising:
the image receiving module is used for receiving a first remote sensing image and a second remote sensing image which aim at the same region and are different in time;
the extraction module is used for respectively inputting the first remote sensing image and the second remote sensing image into corresponding coding networks for feature extraction to obtain feature data of different coding levels;
the merging module is used for merging the feature data of the same coding level to obtain target feature data of different coding levels;
the decoding module is used for sequentially performing superposition operation and decoding operation on the target characteristic data of different coding levels to obtain decoded data;
and the region determining module is used for determining a target region with difference between the first remote sensing image and the second remote sensing image according to the decoded data.
Example 21 includes a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing a method as in one or more of examples 1-10 when executing the computer program.
Example 22 includes a computer-readable storage medium having stored thereon a computer program that, when executed by a processor, implements a method as in one or more of examples 1-10.
Although certain examples have been illustrated and described herein for purposes of description, a wide variety of alternate and/or equivalent implementations may be substituted for the examples shown and described without departing from the scope of the present application. This application is intended to cover any adaptations or variations of the embodiments discussed herein. Therefore, it is manifestly intended that the embodiments described herein be limited only by the claims and the equivalents thereof.
Claims (13)
1. An image processing method, comprising:
respectively inputting the first image and the second image into corresponding coding networks for feature extraction to obtain feature data of different coding levels;
merging the feature data of the same coding level to obtain target feature data of different coding levels;
sequentially performing superposition operation and decoding operation on the target characteristic data of different coding levels to obtain decoded data;
determining difference data between the first image and the second image according to the decoded data.
2. The method according to claim 1, wherein before the first image and the second image are respectively input to corresponding coding networks for feature extraction, so as to obtain feature data of different coding levels, the method further comprises:
receiving two remote sensing images of the same area at different times as the first image and the second image.
3. The method according to claim 1, wherein the step of inputting the first image and the second image into corresponding coding networks respectively for feature extraction to obtain feature data of different coding levels comprises:
extracting the characteristics of the first image and the second image in a corresponding coding network to obtain characteristic data of a 1 st coding level;
and performing feature extraction on the feature data of the (N-1) th coding level in the coding network to obtain the feature data of the Nth coding level.
4. The method of claim 3, wherein the sequentially performing the superposition operation and the decoding operation on the target feature data of the different encoding levels to obtain decoded data comprises:
decoding the target characteristic data of the Nth coding level to obtain Nth decoding data;
performing superposition operation and decoding operation on the target characteristic data of the (N-1)th coding level and the Nth decoding data to obtain the (N-1)th decoding data;
and iteratively executing the superposition operation and the decoding operation until the superposition operation and the decoding operation of the target characteristic data of each coding level are sequentially completed.
5. The method of claim 4, wherein the performing the superposition operation and the decoding operation on the target feature data of the (N-1)th coding level and the Nth decoded data to obtain the (N-1)th decoded data comprises:
performing superposition operation on the target characteristic data of the (N-1) th coding level and the Nth decoding data to obtain a superposition result;
and sequentially performing decoding operations on the superposition result through a first convolution layer, a first deconvolution layer and a second convolution layer to obtain the (N-1)th decoding data, wherein the number of output channels of the first convolution layer is one fourth of the number of input channels, the number of input channels and the number of output channels of the first deconvolution layer are the same, and the number of output channels of the second convolution layer is the same as the number of channels of the target characteristic data of the (N-2)th coding level.
6. The method of claim 5, wherein after the iteratively performing the superposition operation and the decoding operation until the superposition operation and the decoding operation of the target feature data of each coding level are sequentially completed, the sequentially performing the superposition operation and the decoding operation on the target feature data of different coding levels to obtain decoded data further comprises:
and sequentially carrying out decoding operations on the 1st decoded data through a second deconvolution layer, a third convolution layer and a fourth convolution layer to obtain the decoded data, wherein the number of output channels of the second deconvolution layer is half of the number of input channels, the number of input channels and the number of output channels of the third convolution layer are the same, and the number of output channels of the fourth convolution layer is 1.
7. The method of claim 1, wherein the merging the feature data of the same coding level to obtain the target feature data of different coding levels comprises:
and merging channels of the characteristic data to obtain the target characteristic data.
8. The method of claim 1, wherein determining difference data between the first image and the second image from the decoded data comprises:
determining difference degree data at different positions between the first image and the second image according to the decoding data;
and determining difference data between the first image and the second image according to the position information corresponding to the difference degree data meeting the preset requirement.
9. The method according to claim 1, wherein before the first image and the second image are respectively input to corresponding coding networks for feature extraction, so as to obtain feature data of different coding levels, the method further comprises:
a neural network for image processing is trained using the first image samples and the second image samples, the neural network including an encoding network and a decoding network.
10. An image processing method, comprising:
receiving a first remote sensing image and a second remote sensing image which aim at the same area and are different in time;
respectively inputting the first remote sensing image and the second remote sensing image into corresponding coding networks for feature extraction to obtain feature data of different coding levels;
merging the feature data of the same coding level to obtain target feature data of different coding levels;
sequentially performing superposition operation and decoding operation on the target characteristic data of different coding levels to obtain decoded data;
and determining a target area with difference between the first remote sensing image and the second remote sensing image according to the decoded data.
11. An image processing apparatus comprising a neural network, the neural network comprising a first encoding network, a second encoding network, a decoding network, and a difference determination module;
the first encoding network is configured to: extracting the characteristics of the first image to obtain characteristic data of different coding levels;
the second encoding network is configured to: extracting the features of the second image to obtain feature data of different coding levels;
the decoding network is configured to: merging the feature data of the same coding level to obtain target feature data of different coding levels; sequentially performing superposition operation and decoding operation on the target characteristic data of different coding levels to obtain decoded data;
the discrepancy determining module is to: determining difference data between the first image and the second image according to the decoded data.
12. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to one or more of claims 1-10 when executing the computer program.
13. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to one or more of claims 1-10.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201910792402.2A | 2019-08-26 | 2019-08-26 | Image processing method, image processing device, computer equipment and storage medium
Publications (1)

Publication Number | Publication Date
---|---
CN112434702A (en) | 2021-03-02
Family
ID=74690077
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910792402.2A Pending CN112434702A (en) | 2019-08-26 | 2019-08-26 | Image processing method, image processing device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112434702A (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101969567A (en) * | 2010-11-09 | 2011-02-09 | 北京工业大学 | Image coding method based on total variation |
EP3525468A1 (en) * | 2016-10-04 | 2019-08-14 | Ki Baek Kim | Image data encoding/decoding method and apparatus |
CN107920248A (en) * | 2016-10-11 | 2018-04-17 | 京东方科技集团股份有限公司 | Arrangement for encoding, image processing system, training method and display device |
US20190014320A1 (en) * | 2016-10-11 | 2019-01-10 | Boe Technology Group Co., Ltd. | Image encoding/decoding apparatus, image processing system, image encoding/decoding method and training method |
CN108335322A (en) * | 2018-02-01 | 2018-07-27 | 深圳市商汤科技有限公司 | Depth estimation method and device, electronic equipment, program and medium |
CN108960345A (en) * | 2018-08-08 | 2018-12-07 | 广东工业大学 | A kind of fusion method of remote sensing images, system and associated component |
US10311321B1 (en) * | 2018-10-26 | 2019-06-04 | StradVision, Inc. | Learning method, learning device using regression loss and testing method, testing device using the same |
CN109886072A (en) * | 2018-12-25 | 2019-06-14 | 中国科学院自动化研究所 | Face character categorizing system based on two-way Ladder structure |
Non-Patent Citations (4)
Title |
---|
KOLOS M et al.: "Procedural synthesis of remote sensing images for robust change detection with neural networks", International Symposium on Neural Networks *
RODRIGO CAYE DAUDT et al.: "Fully Convolutional Siamese Networks for Change Detection", 2018 25th IEEE International Conference on Image Processing (ICIP) *
冯春凤 et al.: "SAR image change detection based on stacked sparse auto-encoders", Laser Journal (激光杂志) *
李倩兰: "Heterogeneous remote sensing image fusion and change detection based on hierarchical auto-encoding", China Excellent Master's Theses Full-text Database, Engineering Science & Technology II *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113095303A (en) * | 2021-06-04 | 2021-07-09 | 成都数之联科技有限公司 | Model training method, forest land change detection system, forest land change detection device and forest land change detection medium |
CN113095303B (en) * | 2021-06-04 | 2021-09-28 | 成都数之联科技有限公司 | Model training method, forest land change detection system, forest land change detection device and forest land change detection medium |
CN113724205A (en) * | 2021-08-09 | 2021-11-30 | 浙江大华技术股份有限公司 | Image change detection method, apparatus and storage medium |
CN115131641A (en) * | 2022-06-30 | 2022-09-30 | 北京百度网讯科技有限公司 | Image recognition method and device, electronic equipment and storage medium |
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
RJ01 | Rejection of invention patent application after publication

Application publication date: 20210302