CN111539306B

CN111539306B - A method for building recognition in remote sensing images based on the replaceability of activation expressions

Info

Publication number: CN111539306B
Application number: CN202010314628.4A
Authority: CN
Inventors: 陈力; 李海峰; 彭剑; 朱佳玮; 黄浩哲; 崔振琦
Original assignee: Central South University
Current assignee: Central South University
Priority date: 2020-04-21
Filing date: 2020-04-21
Publication date: 2021-07-06
Anticipated expiration: 2040-04-21
Also published as: CN111539306A; AU2021101713A4

Abstract

The invention discloses a method for recognizing buildings in remote sensing images based on the replaceability of activation expressions, comprising the following steps: acquiring a data set of buildings in remote sensing images; training a common deep neural network model; Maximum response map; calculate the activation expression replaceability of each convolution kernel; prune the model convolution kernel according to the activation expression replaceability of each convolution kernel, retaining the small activation expression replaceability; use the pruned Deep neural network model for building recognition in remote sensing images. The method of the invention proposes an activated expression replaceability index method for the building recognition model based on deep learning, which can quantify the replaceability of the expression of each convolution kernel on the same layer in the feature space, and the activation expression of the convolution kernel can be The lower the replacement value is, the more irreplaceable it is in the feature space, and the convolution kernel is selectively pruned to effectively improve the recognition accuracy of the remote sensing image building recognition model.

Description

Remote sensing image building identification method based on activation expression replaceability

Technical Field

The invention belongs to the technical field of remote sensing image identification, and relates to a remote sensing image building identification method based on activation expression replaceability.

Background

In recent years, a great number of remote sensing satellites are lifted off and simultaneously bring a great number of remote sensing images. Remote-sensing image data is increasing dramatically, including a variety of remote-sensing images with different spectral and spatial resolutions. These remote sensing images bring great economic value. The method can be used for rapidly extracting targets such as buildings in the remote sensing image, and can effectively help urban planning, infrastructure construction, illegal building detection and the like. At present, a large number of building identification algorithms based on deep learning are developed at present, but the whole identification precision of the model is difficult to meet the practical application due to the fact that the generalization of the remote sensing image building identification model is not known.

At present, there are two main methods for measuring the generalization ability of the deep learning model. The first scheme mainly depends on a traditional statistical learning theory method, such as VC dimension, Rademacher complexity and other algorithms to explore the relationship between the robustness, complexity and generalization of the model. These theories suggest that models containing a large number of parameters tend to over-fit on the data, but at the same time reduce the generalization ability on the test data. However, this conclusion is contrary to the performance of the current deep learning model, and the traditional statistical learning theory cannot reasonably explain the generalization ability of the deep learning model. The second scheme mainly explains and evaluates the generalization capability of the model from the change of the parameter space in the deep learning model optimization process. Schmidhuber considers that the generalization ability of the model is related to the straightness of minima and the straightness of Bayesian boundaries. However, Dinh indicates that a non-smooth min-depth learning model may actually have a better generalization. Wang links the smoothness of the solution and the generalization ability of the model under a Bayesian framework, and theoretically proves that the generalization ability of the model is not only related to Hessian spectrum, but also related to the smoothness of the solution, the scale of parameters and the number of training samples. In addition, the random gradient descent algorithm for training the deep learning model can also improve the generalization capability of the model. Many of the conclusions in the second approach are also contradictory.

In practical application, a rich and well-distributed remote sensing image building test set is usually used for evaluating the generalization capability of the model. In this approach, however, Recht suggests that obtaining a well-behaved model on a particular test set may not embody the model's own generalization ability, and that accuracy based on the test set is fragile and subject to changes due to subtle changes in data distribution. The evaluation on the test set also has the problem of inaccuracy. Therefore, at present, no reasonable algorithm can correctly measure the generalization of the remote sensing image recognition model.

Disclosure of Invention

In view of the above, the present invention provides a method for identifying a building based on a remote sensing image with an alternative activation expression, and the present invention provides an index for effectively evaluating the generalization of a remote sensing image identification model, and the remote sensing image identification model is pruned through the index, so as to improve the accuracy of the model and further improve the identification capability of the model.

The invention aims to realize the method for identifying the buildings based on the remote sensing images with the alternative activation expression, which comprises the following steps:

step 1, obtaining a building data set of a remote sensing image;

step 2, training a common deep neural network model;

step 3, calculating and identifying an independent maximum response graph of each convolution kernel in the model;

step 4, calculating the activation expression replaceability of each convolution kernel;

step 5, pruning the model convolution kernels according to the activation expression replaceability of each convolution kernel, and keeping small activation expression replaceability;

and 6, using the trimmed deep neural network model to identify the remote sensing image building.

Specifically, in the training process of the deep neural network model described in step 2, the training set is represented as D ═ { X, y }, X represents the nth remote sensing image, y represents the building label corresponding to the nth remote sensing image, Θ ═ W, b } represents the weight of the deep neural network to be trained, W represents the ith convolutional layer, b represents the offset on the ith layer, and the trained weight Θ is obtained by defining the loss function of the recognition task and using the BP algorithm^*So that the model can be made static with small error values while achieving a high recognition accuracy on the data set D.

Further, in the calculation process of the independent maximum response graph in step 3, the objective function is

Wherein, X of the initial input is random noise theta^*For the trained weights, J represents the number of all convolution kernels in the l layers, and the output of each convolution kernel is h_l，i(X，Θ^*) Which represents the output of the ith convolution kernel at the l-th layer, h_l，-i(X，Θ^*) Other feature maps representing the activation values of the outputs excluding the target i convolution kernels, argmax (×) representing the maximum response map of the outputs, X^*Then the independent maximum response graph of the final output is represented;

fixing corresponding weight value, using gradient rising algorithm to iteratively update X^*So that it can make the output activation value of ith convolution kernel of l layer be maximum and X obtained by target function^*The target convolution kernel output can be enabled to be as large as possible, and simultaneously, the integral output of other convolution kernels is ensured to be small, and the final X is obtained^*Then the independent maximum response map in feature space for the corresponding convolution kernel.

Further, the alternative calculation of the activation expression comprises the following steps:

step 401, calculating the expression replaceability, wherein the formula of the expression replaceability is as follows

RS (l, i) represents the property that the unwrapping feature of the ith convolution kernel of the ith layer can be replaced by other convolution kernels on the same layer, wherein | { x_lL represents the total number of the ith layer of convolution kernels, IAM (l, i) represents the characteristic representation of an independent maximum response graph generated by the ith layer of convolution kernels, and f (IAM (l, i)) represents the activation value of the ith layer of convolution kernels obtained by forward propagation of the characteristic representation of the generated independent maximum response graph; i { x_l，j：x_l，j＞x_l，iThe I represents the number of convolution kernels which are larger than the activation value of the ith convolution kernel in the l layer; expression replaceability quantifies the replaceability of the convolution kernel expression on the same layer, and the metric value ranges from 0 to 1]；

Step 402, calculating the expression replaceability of activation, wherein the expression replaceability of activation is defined as:

AR (l, i) represents the ratio of the activation values of the corresponding convolution kernels, wherein the ratio is not a 0 value, and the expression replaceability of activation represents the replaceability of the expression of the effective activation value output by the target convolution kernel in the feature space.

For the existing remote sensing image building identification model, the generalization upper bound obtained by the traditional theoretical method is limited to the evaluation of the model. They do not specifically quantify the generalization ability of deep learning models. The method for setting the test set needs more skills to ensure the balance of data distribution in the test set, and is difficult to truly reflect the performance of the model on unseen data. There is therefore a need for an index that quantifies model generalization that can directly compare the generalization ability of a model without using a test set. The aim of identifying the building is improved. In this regard, the generalization capability of the model is directly quantified using the main structural convolution kernel of the building identification model. Each convolution kernel has the ability to extract features. And the final recognizer recognizes the extracted features to finish reasoning the unseen data, namely embodying the generalization. Numerous analyses in parameter space also verify the importance of the convolution kernel to model generalization. The performance of the convolution kernel in the feature space also belongs to the performance of the generalization capability of the model. Naturally, as the convolution kernel can present a richer representation of features in the feature space, these rich features will be more favorable for predicting new unseen data. The measure of richness of the model acquisition features has important significance for evaluating generalization. The method can further quantize the generalization capability of the model by measuring the richness of the characteristics obtained by the convolution kernel, so that the model is pruned, and the identification precision of the remote sensing image building by the model is effectively improved.

Drawings

FIG. 1 is a schematic flow diagram of the process of the present invention;

FIG. 2 is an alternative schematic representation of the expression of activation of an embodiment of the method of the present invention.

Detailed Description

The present invention is further illustrated by the following examples and the accompanying drawings, but the present invention is not limited thereto in any way, and any modifications or alterations based on the teaching of the present invention are within the scope of the present invention.

As shown in fig. 1, the method for identifying buildings based on remote sensing images with alternative activation expression comprises the following steps:

step 1, obtaining a building data set of a remote sensing image;

step 2, training a common deep neural network model;

In a remote sensing image building identification task, the training set is assumed to be D ═ X, y. X represents the nth remote sensing image, and y represents a building label corresponding to the nth remote sensing image. And the weight Θ of the deep neural network that needs training is { W, b }. W denotes the first convolutional layer, and similarly b denotes the offset on the first layer. Theta is obtained by defining a loss function of an identification task and using a BP algorithm^*So that the model can achieve a high recognition accuracy on the data set D while still maintaining a small error value. After the model is converged, for each new input image, the image can be converted into corresponding feature vectors through the processing of a convolution kernel. The last recognizer of the model can correctly recognize the image.

Each convolution kernel will produce a different response to a different feature. The output of each convolution kernel is defined as h (X, θ), which represents the output of the ith convolution kernel on the l-th layer. The maximum response graph algorithm proposed by Erhan et al, which is intended to obtain the features that result in the maximum response of the convolution kernel output, is defined as follows,

X^*＝argmaxh_l，i(X，Θ^*)

the X of the initial input is random noise. For a trained weight Θ^*. Fixing corresponding weight value, using gradient rising algorithm to iteratively update X^*So that it can maximize the output activation value of the ith convolution kernel of the l-th layer. Finally obtained X^*Then a visual representation that can cause the convolution kernel to produce the maximum response is represented.

However, in the maximum response map algorithm, only the activation value of the target convolution kernel is considered to be maximized. We have found that the resulting visual representation also causes other convolution kernels in the same layer to produce a high response, i.e. the resulting visual representation is characteristically entangled with the representations of the other convolution kernels. The resulting image does not represent the representation of the corresponding convolution kernel in the feature space well. Therefore, in order to better obtain the unwrapping characteristic of the convolution kernel in the characteristic space, the embodiment modifies the target of the maximum response map algorithm. The activation value output by the target convolution kernel is maximum, and meanwhile, other convolution kernels are guaranteed to obtain less input as far as possible.

Wherein, X of the initial input is random noise theta^*For the trained weights, J represents the number of all convolution kernels in the l layers, and the output of each convolution kernel is h_l，i(X，Θ^*) Which represents the output of the ith convolution kernel at the ith layer,h_l，-i(X，Θ^*) Other characteristic graphs representing the output activation values excluding the target i convolution kernels, argmax (X) representing the maximum response graph of the output, and X representing the independent maximum response graph of the final output;

For remote sensing image building identification, the model can predict new data by relying on various characteristics. When the new remote sensing image has a good prediction result, the model has strong generalization capability. We can help us measure the model generalization ability by measuring the characteristics of the model's representation in the feature space.

Currently, the unwrapping characteristic of each convolution kernel can be obtained through an independent maximum response graph algorithm. But different convolution kernels may produce a repetitive expression, there being other convolution kernels that produce a higher response for the unwrapped feature. The method of the invention provides a method for expressing an alternative Repeatable Stabilization (RS) to quantify the repeatability of expressions of other convolution kernels, namely the replaceability of the expressions of the convolution kernels on a feature space.

And (4) obtaining the unwrapping characteristic of the target convolution kernel by using an independent maximum response graph algorithm, and propagating the unwrapping characteristic of the target convolution kernel to the layer where the target convolution kernel is located in the forward direction. If the activation value of the target convolution kernel at the corresponding layer is maximum, then the feature indicating the convolution kernel is that other convolution kernels cannot be replaced. Conversely, when there are other convolution kernels in the same layer whose activation values are greater than the activation value of the target convolution kernel, then the feature representing the target convolution kernel may be replaced. The feature that can be replaced is missing as well as other convolution kernels that express the feature, so the convolution kernel is not important.

Further, the alternative calculation of the activation expression includes steps 401 and 402.

RS (l, i) represents the property that the unwrapping feature of the ith convolution kernel of the ith layer can be replaced by other convolution kernels on the same layer, wherein | { x_lL represents the total number of the ith layer of convolution kernels, IAM (l, i) represents the characteristic representation of an independent maximum response graph generated by the ith layer of convolution kernels, and f (IAM (l, i)) represents the activation value of the ith layer of convolution kernels obtained by forward propagation of the characteristic representation of the generated independent maximum response graph; i { x_l，j：x_l，j＞x_l，iThe I represents the number of convolution kernels which are larger than the activation value of the ith convolution kernel in the l layer; expression replaceability quantifies the replaceability of the convolution kernel expression on the same layer, and the metric value ranges from 0 to 1]。

A convolution kernel with low expression replaceability may indicate that the expression is not easily replaced. And when the expression replaceability is high, two meanings are given. One is that the expression of the convolution kernel is easily replaced by other convolution kernels. The other indicates that the convolution kernel does not learn any features. Since f (x) is close to 0 as known from the independent maximum response graph formula when the target convolution kernel does not respond to any feature, the result is that the average response value of the other convolution kernels is minimal. These convolution kernels will produce nearly similar results. But the convolution kernel does not learn any features, and can also cause higher response of other convolution kernels and make the expression replaceability value of the convolution kernel higher. The alternative expression also means that the convolution kernel does not learn the features.

The invention further proposes an Activated expression alternative Activated Representational Substistion (ARS) that can uniformly represent the replaceability of convolution kernel expressions. According to the independent maximum response graph formula, the output is 0 when the target convolution kernel does not learn the characteristics. And (4) alternatively representing the activation condition of the convolution kernel by using an expression, and representing the proportion of non-zero values in the output characteristic diagram.

Step 402, combining the expression replaceability and the activation response value of the convolution kernel, calculating the activated expression replaceability, wherein the activated expression replaceability is defined as:

A convolution kernel with low replaceability of the activated expression means that the activated expression on the same layer is less easily replaced, which is important for generalization of the model. As the value of ARS becomes larger, it means that the convolution kernel changes from repetitive expression to meaningless expression. Convolution kernels with high ARS are not important for model generalization.

The realization principle of activated expression replaceability (ARS) is that the feature richness of a building model of a remote sensing image is measured, so that the generalization characteristic of the model is measured, and the recognition result of the recognition model is improved in a targeted manner. The method can also excellently improve the identification precision of the building model of the remote sensing image. An activated Alternate Representation (ARS) principle diagram is shown in fig. 2, where the gray scale and shape represent different features learned by the remote sensing image deep learning model. Because the number of model convolution kernels is fixed, there may be richer features for the entire model when the representation of each convolution kernel is not easily replaced by the representations of the other convolution kernels. Therefore, the expression replaceability of the activation of the convolution kernel has a strong relationship with the generalization of the model.

As shown in fig. 2, the input image is a button and a face composed of different features. The model 2 shows expressions which are not easily replaced by other convolution kernels in three convolution kernels, namely, the activated expressions of each convolution kernel are low in replaceability, so that the input image can be correctly identified. And similar expressions exist in the convolution kernels of the model 1, the activation expressions of partial convolution kernels are large in replaceability, and a face cannot be correctly identified.

According to the invention content and the embodiment, the method aims at the expression replaceability index method for the deep learning-based building identification model, which can quantify the replaceability of the expression of each convolution kernel on the same layer in the feature space. The more important the expression replaceability value of the activation of the convolution kernel is lower, the expression replaceability value represents that the convolution kernel is irreplaceable in the feature space, and the method can be used for selectively pruning the convolution kernel, so that the identification accuracy of the remote sensing image building identification model is effectively improved.

Claims

1. based on the remote sensing image building identification method of activation expression replaceability, it is characterized in that, comprise the following steps:

Step 1, obtaining a remote sensing image building data set;

Step 2, train a common deep neural network model;

Step 3, calculate and identify the independent maximum response map of each convolution kernel in the model;

Step 4, calculate the replaceability of the activation expression of each convolution kernel;

Step 5, trim the model convolution kernel according to the replaceability of activation expression of each convolution kernel, and retain the replaceability of small activation expression;

Step 6, use the pruned deep neural network model to identify buildings in remote sensing images;

In the training process of the deep neural network model described in step 2, the training set is represented as D={X, y}, X represents the nth remote sensing image, y represents the building label corresponding to the nth remote sensing image, Θ= {W, b} represents the weight of the deep neural network to be trained, W represents the lth convolutional layer, and b represents the bias on the lth layer. By defining the loss function of the recognition task and using the BP algorithm, the trained The weight of Θ ^* enables the model to obtain a high recognition accuracy rate on the dataset D while maintaining a small error value statically;

The described activation expression replaceability calculation includes the following steps:

In step 401, the expression replaceability is calculated, and the formula for expressing the replaceability is as follows

RS(l, i) represents the feature that the disentangled feature of the ith convolution kernel in the lth layer can be replaced by other convolution kernels in the same layer, where |{x _l }| represents the total number of convolution kernels in the lth layer , IAM(l, i) represents the feature representation of the independent maximum response map generated by the ith convolution kernel of the lth layer, f(IAM(l, i)) represents forwarding the generated independent maximum response map feature representation The activation value of the i-th convolution kernel in the l-th layer obtained by propagation; |{x _{l, j} : x _{l, j} > x _{l, i} }| indicates that in the l-th layer, the activation value of the i-th convolution kernel is greater than that The number of convolution kernels; expression replaceability quantifies the replaceability of convolution kernels expressed on the same layer, and its metric value ranges from [0, 1];

In step 402, the activated expression replaceability is calculated, and the activated expression replaceability is defined as:

AR(l, i) represents the proportion of the activation value of the corresponding convolution kernel that is not 0, and the expression replaceability of the activation represents the expression replaceability of the effective activation value output by the target convolution kernel in the feature space .

2. remote sensing image building identification method according to claim 1 is characterized in that, in the calculation process of the independent maximum response graph described in step 3, its objective function is

Among them, the initial input X is random noise, Θ ^* is the trained weight, J represents the number of all convolution kernels in the l layer, and the output of each convolution kernel is h _{l, i} (X, Θ ^* ), which is Represents the output of the ith convolution kernel on the lth layer, h _{l, -i} (X, Θ ^* ) represents other feature maps that remove the output activation values of the target i convolution kernels, and arg max(*) represents the output Maximum response map, X ^* represents the independent maximum response map of the final output;

Fix the corresponding weights, use the gradient ascent algorithm to iteratively update X ^* , so that it can maximize the output activation value of the ith convolution kernel of the lth layer, and the X ^* obtained by the objective function can make the output of the target convolution kernel as far as possible At the same time, it also ensures that the overall output of other convolution kernels is small, and the final X ^* is the independent maximum response map of the corresponding convolution kernel in the feature space.