Disclosure of Invention
In view of the above, the present invention provides a method for identifying buildings in remote sensing images based on activation expression replaceability. The invention provides an index that effectively evaluates the generalization of a remote sensing image identification model; the model is pruned according to this index, improving the accuracy of the model and thereby its identification capability.
The invention is realized as a method for identifying buildings in remote sensing images based on activation expression replaceability, which comprises the following steps:
step 1, obtaining a building data set of a remote sensing image;
step 2, training a common deep neural network model;
step 3, calculating the independent maximum response map of each convolution kernel in the identification model;
step 4, calculating the activation expression replaceability of each convolution kernel;
step 5, pruning the convolution kernels of the model according to the activation expression replaceability of each convolution kernel, retaining the convolution kernels whose activation expression replaceability is small;
step 6, identifying buildings in the remote sensing images using the pruned deep neural network model.
Specifically, in the training of the deep neural network model in step 2, the training set is denoted D = {X, y}, where X represents the n-th remote sensing image and y represents the building label corresponding to the n-th remote sensing image. Θ = {W, b} denotes the weights of the deep neural network to be trained, where W denotes the weights of the l-th convolutional layer and b the bias of the l-th layer. The trained weights Θ* are obtained by defining a loss function for the recognition task and applying the back-propagation (BP) algorithm, so that the model converges to a small error value while achieving high recognition accuracy on the data set D.
Further, in the calculation of the independent maximum response map in step 3, the objective function is

X* = argmax_X [ h_{l,i}(X, Θ*) - (1/(J-1)) Σ_{j≠i} h_{l,j}(X, Θ*) ]

wherein the initial input X is random noise, Θ* denotes the trained weights, J denotes the number of all convolution kernels in the l-th layer, h_{l,i}(X, Θ*) denotes the output of the i-th convolution kernel of the l-th layer, h_{l,-i}(X, Θ*) denotes the feature maps output by the other convolution kernels excluding the target kernel i, argmax(·) selects the input that maximizes the response, and X* denotes the independent maximum response map finally output;
with the corresponding weights fixed, X* is updated iteratively by gradient ascent so that the output activation value of the i-th convolution kernel of the l-th layer is maximized. The X* obtained from this objective function makes the output of the target convolution kernel as large as possible while keeping the overall output of the other convolution kernels small; the final X* is the independent maximum response map of the corresponding convolution kernel in feature space.
Further, the calculation of the activation expression replaceability comprises the following steps:
Step 401, calculating the expression replaceability, the formula of which is

RS(l, i) = |{x_{l,j} : x_{l,j} > x_{l,i}}| / |{x_l}|

wherein RS(l, i) denotes the degree to which the unentangled feature of the i-th convolution kernel of the l-th layer can be replaced by other convolution kernels of the same layer; |{x_l}| denotes the total number of convolution kernels in the l-th layer; IAM(l, i) denotes the independent maximum response map generated for the i-th convolution kernel of the l-th layer, and f(IAM(l, i)) denotes the activation values x_l of the l-th layer obtained by forward-propagating that map; |{x_{l,j} : x_{l,j} > x_{l,i}}| denotes the number of convolution kernels in the l-th layer whose activation values are larger than that of the i-th convolution kernel. Expression replaceability quantifies the replaceability of a convolution kernel's expression within the same layer, and the metric ranges over [0, 1];
Step 402, calculating the activated expression replaceability, which is defined as

ARS(l, i) = RS(l, i) / AR(l, i)

wherein AR(l, i) denotes the proportion of non-zero values among the activation values output by the corresponding convolution kernel. The activated expression replaceability represents the replaceability, in feature space, of the expression carried by the effective (non-zero) activation values output by the target convolution kernel.
For existing remote sensing image building identification models, the generalization upper bounds obtained by traditional theoretical methods are of limited use for evaluation: they do not concretely quantify the generalization ability of deep learning models. Methods based on held-out test sets require considerable skill to keep the data distribution in the test set balanced, and they struggle to truly reflect the performance of the model on unseen data. There is therefore a need for an index that quantifies model generalization and allows the generalization ability of models to be compared directly, without a test set, so as to improve building identification. To this end, the generalization ability of the model is quantified directly through the convolution kernels, the main structural components of the building identification model. Each convolution kernel has the ability to extract features, and the final recognizer recognizes the extracted features to complete inference on unseen data, which is the embodiment of generalization. Numerous analyses in parameter space also verify the importance of the convolution kernels to model generalization, and the behavior of the convolution kernels in feature space likewise reflects the generalization ability of the model. Naturally, when the convolution kernels present a richer representation of features in feature space, these rich features are more favorable for predicting new, unseen data; measuring the richness of the features the model acquires therefore has important significance for evaluating generalization. By measuring the richness of the features obtained by the convolution kernels, the method quantifies the generalization ability of the model, prunes the model accordingly, and effectively improves the model's identification accuracy for remote sensing image buildings.
Detailed Description
The present invention is further illustrated by the following examples and the accompanying drawings, but the present invention is not limited thereto in any way, and any modifications or alterations based on the teaching of the present invention are within the scope of the present invention.
As shown in fig. 1, the method for identifying buildings in remote sensing images based on activation expression replaceability comprises the following steps:
step 1, obtaining a building data set of a remote sensing image;
step 2, training a common deep neural network model;
step 3, calculating the independent maximum response map of each convolution kernel in the identification model;
step 4, calculating the activation expression replaceability of each convolution kernel;
step 5, pruning the convolution kernels of the model according to the activation expression replaceability of each convolution kernel, retaining the convolution kernels whose activation expression replaceability is small;
step 6, identifying buildings in the remote sensing images using the pruned deep neural network model.
In the remote sensing image building identification task, the training set is assumed to be D = {X, y}, where X represents the n-th remote sensing image and y represents the corresponding building label. The weights of the deep neural network to be trained are Θ = {W, b}, where W denotes the weights of the l-th convolutional layer and, similarly, b denotes the bias of the l-th layer. Θ* is obtained by defining a loss function for the identification task and using the BP algorithm, so that the model achieves high recognition accuracy on the data set D while maintaining a small error value. After the model converges, each new input image is converted into corresponding feature vectors by the convolution kernels, and the final recognizer of the model can correctly recognize the image.
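A minimal, hypothetical sketch of this training setup follows. A tiny logistic-regression model stands in for the deep network, and the data are synthetic; all names, shapes, and hyperparameters are illustrative and not the patent's actual model.

```python
import numpy as np

def train(X, y, lr=0.5, epochs=200):
    """Fit a minimal logistic-regression 'network' Theta = {W, b}
    by gradient descent (BP) on the cross-entropy loss."""
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # forward pass
        grad = p - y                            # dLoss/dlogit
        W -= lr * X.T @ grad / len(y)           # backprop to weights
        b -= lr * grad.mean()                   # backprop to bias
    return W, b

# Synthetic stand-in for remote sensing patches: "building" iff feature 0 is high.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(float)
W, b = train(X, y)
acc = (((X @ W + b) > 0).astype(float) == y).mean()
```

After convergence the learned weights separate the two classes with high accuracy on D, mirroring the "small error, high recognition accuracy" condition above.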
Each convolution kernel produces a different response to different features. The output of each convolution kernel is defined as h_{l,i}(X, Θ), which represents the output of the i-th convolution kernel of the l-th layer. The maximum response map algorithm proposed by Erhan et al., which aims to obtain the feature that causes the maximum response of a convolution kernel's output, is defined as follows,
X* = argmax_X h_{l,i}(X, Θ*)
The initial input X is random noise and Θ* denotes the trained weights. With the corresponding weights fixed, X* is updated iteratively by gradient ascent so that the output activation value of the i-th convolution kernel of the l-th layer is maximized. The final X* is a visual representation that causes the convolution kernel to produce its maximum response.
However, the maximum response map algorithm only considers maximizing the activation value of the target convolution kernel. We found that the resulting visual representation also causes other convolution kernels in the same layer to produce high responses; that is, the representation is entangled with the representations of the other convolution kernels. The resulting image therefore does not represent the corresponding convolution kernel's expression in feature space well. To better obtain the unentangled feature of a convolution kernel in feature space, this embodiment modifies the objective of the maximum response map algorithm: the activation value output by the target convolution kernel is maximized while the outputs of the other convolution kernels are kept as small as possible.
Further, in the calculation of the independent maximum response map in step 3, the objective function is

X* = argmax_X [ h_{l,i}(X, Θ*) - (1/(J-1)) Σ_{j≠i} h_{l,j}(X, Θ*) ]

wherein the initial input X is random noise, Θ* denotes the trained weights, J denotes the number of all convolution kernels in the l-th layer, h_{l,i}(X, Θ*) denotes the output of the i-th convolution kernel of the l-th layer, h_{l,-i}(X, Θ*) denotes the feature maps output by the other convolution kernels excluding the target kernel i, argmax(·) selects the input that maximizes the response, and X* denotes the independent maximum response map finally output;
with the corresponding weights fixed, X* is updated iteratively by gradient ascent so that the output activation value of the i-th convolution kernel of the l-th layer is maximized. The X* obtained from this objective function makes the output of the target convolution kernel as large as possible while keeping the overall output of the other convolution kernels small; the final X* is the independent maximum response map of the corresponding convolution kernel in feature space.
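Under the toy assumption that each "kernel" is a linear map h_j(X) = W[j]·X (purely illustrative, not the patent's convolutional model), the independent maximum response objective and its gradient-ascent update can be sketched as:

```python
import numpy as np

def independent_max_response(W, i, steps=100, lr=0.1, seed=0):
    """Gradient-ascend X* = argmax_X [ h_i(X) - mean_{j!=i} h_j(X) ],
    where h_j(X) = W[j] @ X stands in for the j-th kernel's output.
    A real implementation would also bound or regularize X."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=W.shape[1])   # initial input: random noise
    others = np.delete(W, i, axis=0)
    for _ in range(steps):
        # gradient of the objective: push the target kernel up,
        # push the mean of the other kernels down
        grad = W[i] - others.mean(axis=0)
        X = X + lr * grad             # gradient ascent step
    return X

# Three toy "kernels", each sensitive to one input direction.
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
X_star = independent_max_response(W, i=0)
target = W[0] @ X_star
others = np.delete(W, 0, axis=0) @ X_star
```

The returned X_star drives the target kernel's activation high while the other kernels' activations stay low, which is the unentangled behavior described above.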
For remote sensing image building identification, the model relies on a variety of features to predict new data. When the prediction results on new remote sensing images are good, the model has strong generalization ability. Measuring the characteristics of the model's representations in feature space therefore helps us measure the model's generalization ability.
At present, the unentangled feature of each convolution kernel can be obtained through the independent maximum response map algorithm. However, different convolution kernels may produce repetitive expressions: other convolution kernels may respond even more strongly to the unentangled feature. The method of the invention therefore proposes expression replaceability, Representational Substitutability (RS), to quantify the repetitiveness of expressions among convolution kernels, i.e., the replaceability of a convolution kernel's expression in feature space.
The unentangled feature of the target convolution kernel is obtained with the independent maximum response map algorithm and forward-propagated to the layer containing the target convolution kernel. If the activation value of the target convolution kernel is the maximum in that layer, the feature of that convolution kernel cannot be replaced by other convolution kernels. Conversely, when other convolution kernels in the same layer have activation values greater than that of the target convolution kernel, the feature of the target convolution kernel can be replaced. If a replaceable feature were removed, other convolution kernels would still express it, so such a convolution kernel is unimportant.
Further, the calculation of the activation expression replaceability includes steps 401 and 402.
Step 401, calculating the expression replaceability, the formula of which is

RS(l, i) = |{x_{l,j} : x_{l,j} > x_{l,i}}| / |{x_l}|

wherein RS(l, i) denotes the degree to which the unentangled feature of the i-th convolution kernel of the l-th layer can be replaced by other convolution kernels of the same layer; |{x_l}| denotes the total number of convolution kernels in the l-th layer; IAM(l, i) denotes the independent maximum response map generated for the i-th convolution kernel of the l-th layer, and f(IAM(l, i)) denotes the activation values x_l of the l-th layer obtained by forward-propagating that map; |{x_{l,j} : x_{l,j} > x_{l,i}}| denotes the number of convolution kernels in the l-th layer whose activation values are larger than that of the i-th convolution kernel. Expression replaceability quantifies the replaceability of a convolution kernel's expression within the same layer, and the metric ranges over [0, 1].
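A sketch of the RS computation of step 401, where `acts` stands in for f(IAM(l, i)), the layer's activation values after forward-propagating kernel i's independent maximum response map (the variable names are illustrative):

```python
import numpy as np

def expression_replaceability(acts, i):
    """RS(l, i) = |{x_{l,j} : x_{l,j} > x_{l,i}}| / |{x_l}|:
    the fraction of kernels in the layer whose activation on kernel i's
    independent maximum response map exceeds kernel i's own activation."""
    acts = np.asarray(acts, dtype=float)
    return float(np.sum(acts > acts[i]) / acts.size)

# Kernel 0 responds most strongly to its own map: irreplaceable, RS = 0.
rs_low = expression_replaceability([0.9, 0.2, 0.1, 0.3], i=0)
# Kernel 2 is out-responded by 3 of the 4 kernels: RS = 3/4.
rs_high = expression_replaceability([0.9, 0.2, 0.1, 0.3], i=2)
```

RS = 0 means no kernel in the layer out-responds the target on its own unentangled feature; values near 1 mean almost every kernel does.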
A convolution kernel with low expression replaceability has an expression that is not easily replaced. High expression replaceability, however, has two possible meanings: either the expression of the convolution kernel is easily replaced by other convolution kernels, or the convolution kernel has not learned any feature. From the independent maximum response map formula, when the target convolution kernel responds to no feature, f(X) is close to 0 and the average response of the other convolution kernels is also minimized, so those kernels produce nearly similar results. A convolution kernel that has learned nothing can thus still cause relatively high responses in other kernels and obtain a high expression replaceability value. A replaceable expression may therefore also mean that the convolution kernel has learned no feature.
The invention further proposes Activated Representational Substitutability (ARS), which represents the replaceability of convolution kernel expressions uniformly. According to the independent maximum response map formula, the output is 0 when the target convolution kernel has learned no feature. The activation condition of the convolution kernel is therefore represented by an activation ratio, the proportion of non-zero values in the output feature map.
Step 402, combining the expression replaceability and the activation response of the convolution kernel, calculating the activated expression replaceability, which is defined as

ARS(l, i) = RS(l, i) / AR(l, i)

wherein AR(l, i) denotes the proportion of non-zero values among the activation values output by the corresponding convolution kernel. The activated expression replaceability represents the replaceability, in feature space, of the expression carried by the effective (non-zero) activation values output by the target convolution kernel.
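A sketch of step 402 under the assumption, reconstructed from the surrounding description rather than stated verbatim in the source, that ARS divides RS by the activation ratio AR, so a kernel with few non-zero activations scores as highly replaceable:

```python
import numpy as np

def activation_ratio(feature_map):
    """AR(l, i): proportion of non-zero values in the kernel's output map."""
    fm = np.asarray(feature_map, dtype=float)
    return float(np.count_nonzero(fm) / fm.size)

def activated_expression_replaceability(rs, feature_map):
    """ARS(l, i) = RS(l, i) / AR(l, i); a dead kernel (AR near 0) scores
    as infinitely replaceable, matching the description of meaningless
    expressions receiving high ARS."""
    ar = activation_ratio(feature_map)
    return rs / ar if ar > 0 else float("inf")

# Low RS and a dense (mostly non-zero) output map: low ARS, keep this kernel.
ars_good = activated_expression_replaceability(0.1, [0.5, 0.8, 0.0, 0.6])
# An all-zero output map means the kernel learned nothing: ARS is infinite.
ars_dead = activated_expression_replaceability(0.0, [0.0, 0.0, 0.0, 0.0])
```

This unifies the two high-RS cases from the text: a repetitive kernel gets a high ARS through its RS, and a dead kernel gets a high ARS through its near-zero AR.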
A convolution kernel with low activated expression replaceability is less easily replaced within its layer, which is important for the generalization of the model. As the ARS value grows, the convolution kernel shifts from repetitive expression to meaningless expression. Convolution kernels with high ARS are unimportant for model generalization.
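The pruning rule of step 5, keeping the kernels whose ARS is smallest, might be sketched as follows; the keep fraction is an illustrative choice, not specified by the source:

```python
import numpy as np

def prune_by_ars(ars_scores, keep_fraction=0.75):
    """Return the indices of kernels to keep: those with the smallest
    activated expression replaceability (least replaceable first)."""
    ars = np.asarray(ars_scores, dtype=float)
    n_keep = max(1, int(round(keep_fraction * ars.size)))
    order = np.argsort(ars)          # ascending: least replaceable first
    return np.sort(order[:n_keep])   # indices of the kernels retained

# Hypothetical ARS scores for a 4-kernel layer; the inf-scored kernel is dead.
keep = prune_by_ars([0.1, 0.8, float("inf"), 0.3], keep_fraction=0.75)
```

Kernels 0, 1, and 3 are retained while the dead kernel (infinite ARS) is pruned; in practice the retained indices would then select the corresponding convolution filters in the network.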
The principle behind activated expression replaceability (ARS) is to measure the feature richness of the remote sensing image building model, thereby measuring the generalization characteristic of the model, and to improve the recognition results of the identification model in a targeted manner; the method can markedly improve the identification accuracy of the remote sensing image building model. The principle of ARS is illustrated in fig. 2, where gray scale and shape represent different features learned by the remote sensing image deep learning model. Because the number of convolution kernels in a model is fixed, the model as a whole can possess richer features when the expression of each convolution kernel is not easily replaced by the expressions of the other convolution kernels. Therefore, the activated expression replaceability of the convolution kernels is strongly related to the generalization of the model.
As shown in fig. 2, the input images are a button and a face composed of different features. In model 2, the expressions of the three convolution kernels are not easily replaced by one another, i.e., the activated expression replaceability of each convolution kernel is low, so the input image can be correctly identified. In model 1, similar expressions exist among the convolution kernels and the activated expression replaceability of some kernels is large, so the face cannot be correctly identified.
According to the above disclosure and embodiments, the invention provides an expression replaceability index for deep-learning-based building identification models that quantifies, in feature space, the replaceability of the expression of each convolution kernel within its layer. The lower the activated expression replaceability value of a convolution kernel, the more important the kernel and the more irreplaceable it is in feature space. The index can be used to selectively prune convolution kernels, effectively improving the identification accuracy of the remote sensing image building identification model.