
CN118521767A - Infrared small target detection method based on learning guided filtering - Google Patents

Infrared small target detection method based on learning guided filtering

Info

Publication number
CN118521767A
CN118521767A
Authority
CN
China
Prior art keywords: convolution, module, output, features, follows
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410560465.6A
Other languages
Chinese (zh)
Inventor
张欣鹏
刘加臣
石凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University of Technology
Original Assignee
Tianjin University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2024-05-08
Publication date: 2024-08-20
Application filed by Tianjin University of Technology
Priority to CN202410560465.6A
Publication of CN118521767A
Legal status: Pending


Classifications

    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 - Road transport of goods or passengers
    • Y02T 10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 - Engine management systems

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to an infrared small target detection method based on learning guided filtering, and belongs to the field of computer vision and image processing. The method first preprocesses the infrared images, resizes them to a uniform size and inputs them into the model. The image features are then encoded with a central difference convolution (Central Difference Convolution, CDC) module to obtain shallow features. The features are next processed in the frequency domain with a fast Fourier convolution (Fast Fourier Convolution, FFC) module to obtain global deep features. A purpose-designed learnable guided filtering (Learnable Guided Filtering, LGF) module then enhances the edges of the upsampled features and the skip connection features. Finally, the last convolution layer extracts the small targets. The invention makes full use of the edge distribution characteristics of small targets and the edge-preserving property of guided filtering, realizes edge-enhanced multi-scale target detection, enriches the detail and edge information of targets, and improves the detection performance for infrared small targets.

Description

Infrared small target detection method based on learning guided filtering
Technical Field
The invention relates to an infrared small target detection method based on learning guided filtering, and belongs to the field of computer vision and image processing. The method addresses the defects common in infrared images, namely targets occupying few pixels, insufficient texture information and blurred edges. Building on the good edge-preserving property of guided filtering, it constructs a learnable guided filtering module that processes different features in different ways, on the basis of a box mean filter and a guided-filter parameter calculation module defined and built from a series of convolution layers in a neural network. Applying this module widely throughout the model greatly strengthens the connectivity between the edge features of candidate targets and the target regions, so that targets become more prominent and distinct in the image, while the method remains robust to multi-scale targets and clutter of different shapes.
Background
Infrared small target detection is an important research topic in the field of computer vision and is widely applied in military reconnaissance, security monitoring, environmental monitoring and other fields. However, because of the long imaging distance, targets are dim and often lack texture, structure and edge details. In addition, the infrared radiation signal of a small target is weak and easily disturbed by noise and complex backgrounds. Infrared small target detection therefore remains a difficult and challenging task. Analysis of the characteristics of small targets in infrared images shows an obvious gray-level difference between the target region and the background region, reflected mainly in brightness intensity or gray-scale distribution. Typically, the target region exhibits higher reflection or emission than the surrounding environment and thus appears as a bright area in the infrared image, while the background appears relatively dark. This reflective or emissive behavior produces a significant gray gradient, and hence a pronounced brightness change, between target and background. Conventional methods therefore usually take this difference as the basis for target detection and recognition, distinguishing targets from the background by analyzing the gray-scale distribution of the image: modeling the difference in distribution characteristics mathematically and separating targets from the background with filters; designing saliency measures to segment small objects; identifying targets through the contrast between a target and its surroundings; or suppressing false alarms via patch similarity to acquire targets. However, these methods are sensitive to hyperparameter settings, lack generalization ability, and perform poorly in complex real scenes.
As deep learning has become mainstream in many computer vision tasks, many researchers have turned to deep neural networks for infrared target detection and made great progress: adjusting the weights of target and background and enhancing their differences by embedding fully connected layers in a classical encoder-decoder architecture; learning features that reduce missed detections and false alarms through an adversarial learning scheme; and proposing modules that fuse low-level and high-level features. However, these methods deviate considerably in representing shape and edge information and cannot retain fine details such as target shapes and edges well.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide an infrared small target detection method based on learning guided filtering. To this end, the invention adopts the following technical scheme.
The infrared small target detection method comprises the following steps:
Step 1: input the preprocessed data into the model, extract shallow features with central difference convolution modules, and extract deep features with fast Fourier convolution modules;
Step 2: design a learnable guided filtering module, process the shallow features and the deep features with it respectively, and fuse the processed output features to obtain edge-enhanced fusion features.
1. In step 1, a central difference convolution module and a fast Fourier convolution module are used to extract the shallow features and the deep features respectively; the specific steps are as follows:
1) Preprocess the data: resize the infrared images to a uniform size and input them into the model. In a central difference convolution module, process the features with an ordinary convolution and a central difference convolution to obtain two different outputs:

y1 = Conv(x), y2 = Cdc(x) (1)

where y1 and y2 are the output features of the ordinary convolution and the central difference convolution respectively, and Conv(·), Cdc(·) and x denote the ordinary convolution operation, the central difference convolution operation and the input features respectively;
2) Pass the two output features y1 and y2 through convolution and fusion-addition operations to obtain the output of the central difference convolution module:

L = Conv(y1) ⊕ Conv(y2) (2)

where L denotes the module output and ⊕ denotes the fusion-addition operation; L serves as the input feature of the next central difference convolution module, and several central difference convolution modules are used to obtain the shallow output feature L_f of the last layer;
3) Input the features into a fast Fourier convolution module, process them with a local branch and a global branch respectively, and fuse the two by addition to obtain the deep features output by the fast Fourier convolution module:

F = h(x) ⊕ l(x) (3)

where F, h(·) and l(·) denote the output of the fast Fourier convolution module, the global branch operation and the local branch operation respectively; F serves as the input of the next fast Fourier convolution module, and using several fast Fourier convolution modules yields the output feature of the last module, F_f.
2. In step 2, the learnable guided filtering module is designed and the edge-enhanced fusion features are obtained; the specific steps are as follows:
1) Solve the filter coefficients following the principle of classical guided filtering (a code sketch of this guided filter follows these steps):

a = Cov(I, p) / (Var(I) + ε), b = μ_p − a·μ_I (4)

where a and b denote the filter coefficients; I, p, μ_I and μ_p denote the guide image, the input image, the mean of the guide image and the mean of the input image respectively; Cov(I, p) denotes the covariance of I and p, Var(I) denotes the variance of I, and ε denotes an L2-norm regularization coefficient;
2) The filtered output can be obtained from the above coefficients, the input image and the guide image, and its expression is as follows:
q=a·I+b (5)
wherein q represents the filtered output of the input image;
3) Filter the skip connection feature L_f with the operations of steps 1) and 2), taking L_f as both the input map and the guide map, to obtain the filtered result:

q1 = a·L_f + b (6)

where q1 denotes the filtered output of the skip connection feature, with a and b computed from formula (4) using I = p = L_f;
4) Filter the upsampled feature F_f with the operations of steps 1) and 2), taking L_f as the input map and F_f as the guide map, to obtain the filtered result:

q2 = a·F_f + b (7)

where q2 denotes the filtered output of the upsampled feature, with a and b computed from formula (4) using I = F_f and p = L_f;
5) Fuse the filtered output of the skip connection feature with the filtered output of the upsampled feature to obtain the output of the learnable guided filtering module:

M = BN(Relu(Conv([q1, q2]))) (8)

where [·,·] denotes the splice operation, Relu denotes the activation operation and BN denotes the batch normalization operation; M denotes the output features of the first learnable guided filtering module and serves as the upsampled features of the subsequent learnable guided filtering modules, and the output features M_f of the last module are obtained by using several learnable guided filtering modules.
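For concreteness, the following is a minimal sketch of the classical guided filter of formulas (4)-(5) referenced in step 1), assuming box (mean) filtering for the local statistics as in conventional guided filtering; the radius r, the ε value and all function names are illustrative assumptions rather than the patent's implementation.

```python
import torch
import torch.nn.functional as F

def box_filter(x: torch.Tensor, r: int) -> torch.Tensor:
    """Box (mean) filter of radius r over a (B, C, H, W) tensor."""
    return F.avg_pool2d(x, kernel_size=2 * r + 1, stride=1, padding=r,
                        count_include_pad=False)

def guided_filter(I: torch.Tensor, p: torch.Tensor, r: int = 2,
                  eps: float = 1e-2) -> torch.Tensor:
    """Classical guided filter: formulas (4)-(5) with box-window statistics."""
    mean_I, mean_p = box_filter(I, r), box_filter(p, r)
    cov_Ip = box_filter(I * p, r) - mean_I * mean_p   # Cov(I, p)
    var_I = box_filter(I * I, r) - mean_I * mean_I    # Var(I)
    a = cov_Ip / (var_I + eps)                        # formula (4)
    b = mean_p - a * mean_I
    a, b = box_filter(a, r), box_filter(b, r)         # average coefficients per window
    return a * I + b                                  # formula (5): q = a*I + b
```

Calling guided_filter(L_f, L_f) mirrors the self-guided case of formula (6); the learnable module of step 5) replaces the fixed box means with trained convolutions.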
The beneficial effects of the invention are as follows:
1. Edge-enhanced detection: based on a learnable guided filtering model, central difference convolution and fast Fourier convolution, the invention fully preserves the integrity and segmentation accuracy of targets of different sizes and achieves multi-scale edge-enhanced segmentation without manually set parameters.
2. Good detection performance: the method not only preserves the integrity of candidate targets but is also robust to images under different noise environments. At the same time, edge information is enhanced for targets of different scales during segmentation, and the detection performance for small targets is remarkably improved.
Drawings
Fig. 1: flow chart of the infrared small target detection method based on learning guided filtering.
Fig. 2: flow chart of the detection network based on learnable guided filtering.
Fig. 3: structure of the CDC and FFC modules in the encoder part of the detection network.
Fig. 4: structure of the learnable guided filtering (LGF) module in the decoder part of the detection network.
Detailed Description
The flow of the invention is shown in Fig. 1. The method first preprocesses the infrared image data, resizes the images to a uniform size and inputs them into the network model. The image features are then encoded with central difference convolution (Central Difference Convolution, CDC) modules to obtain shallow downsampled features that better preserve shape and edge information. Fast Fourier convolution (Fast Fourier Convolution, FFC) modules are then used to obtain high-level semantic features with a global receptive field by converting the features from the spatial domain to the frequency domain, processing them there, and converting them back to the spatial domain with an inverse transform. Next, the proposed learnable guided filtering (Learnable Guided Filtering, LGF) module applies different processing strategies to enhance the edge features of the upsampled features and the skip connection features, and the edge-enhanced features are fused and then upsampled. Finally, a convolution with kernel size 1x1 in the last layer produces a feature map with one channel, from which the small targets are extracted. The implementation of the technical scheme is described below with reference to the drawings.
1. Data preprocessing and extraction of shallow and deep features
The infrared images are resized to a uniform 512 x 512 and the preprocessed images are input into the network model. The operation flow for obtaining the shallow and deep features is shown on the left and lower sides of Fig. 2; the specific steps are as follows:
1) Acquire a feature representation with contrast information using two central difference convolution operations. The central difference convolution operation follows formula (1); applying it twice gives:

L_0 = C(C(x)) (9)

where L_0 denotes the output of CDC_1 in Fig. 2, i.e. the result of two successive central difference convolutions, and C(x) denotes the combined operation of central difference and ordinary convolution defined in formula (1).
2) Use four central difference convolution modules to obtain shallow features at different resolutions that better preserve shape and edge information, and pass them to the decoder part through skip connections. With the result above, L_0 is taken as input and the shallow features are extracted with central difference convolution modules; the operation flow is shown in the left part of Fig. 3. The operations inside the solid-line box correspond to C(x) in formula (9); the resulting central difference convolution output and the subtraction output are then passed through ordinary convolutions and fused by addition, as in formula (2), to give the module output. Four stacked central difference convolution modules, corresponding to CDC_2 in Fig. 2, produce the outputs L_1, L_2, L_3 and L_f in turn, each module taking the previous output as input:

L_1 = CDC(L_0), L_2 = CDC(L_1), L_3 = CDC(L_2), L_f = CDC(L_3) (10)

where CDC(·) denotes the central difference convolution module of formulas (1)-(2).
3) Take the final output L_f of the central difference convolution modules as the input of the fast Fourier convolution (FFC) modules; the specific flow is shown in the lower part of Fig. 2. N FFC modules are used to extract high-level semantic features with a global receptive field (N is set to 7 in practice), and the structure of an FFC module is shown in the FFC block of Fig. 3. As the left half of the block shows, the FFC module splits into an L branch and a G branch, the local branch and the global branch. The local branch processes the features with ordinary convolution, Relu activation and BN batch normalization. The global branch first applies ordinary convolution, Relu activation and BN batch normalization, transforms the features from the spatial domain to the frequency domain with a fast Fourier transform, applies ordinary convolution, Relu activation and BN batch normalization in the frequency domain, transforms the result back to the spatial domain with an inverse fast Fourier transform, adds a residual connection from the pre-transform features, and finally applies an ordinary convolution to obtain the global branch output. The two branch outputs are fused by addition to give the final module output:

L(x) = U(x), G(x) = Conv(InvFFT(U(FFT(U(x)))) + U(x)), F(x) = L(x) ⊕ G(x) (11)

where x denotes the input feature of the FFC module; U(·) denotes the sequence of ordinary convolution, Relu activation and BN batch normalization; L(x), G(x) and F(x) denote the local branch output, the global branch output and the final output of the FFC module; and InvFFT(·) and FFT(·) denote the inverse fast Fourier transform and the fast Fourier transform. We use 7 FFC modules to obtain the outputs F_1, F_2, F_3, F_4, F_5, F_6 and F_f in turn; the first module takes L_f as input and each subsequent module takes the previous output as input, so the final output of the FFC modules is F_f (a code sketch of the CDC and FFC modules of steps 1)-3) follows).
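To make these steps concrete, here is a minimal PyTorch sketch of the CDC module of formulas (1), (2), (9), (10) and the FFC module of formula (11); channel widths, the 1x1/3x3 kernel choices, the real-valued 2-D FFT, and the stacking of real/imaginary parts as channels are assumptions, since the patent does not fix these details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CentralDifferenceConv(nn.Module):
    """Cdc(.) of formula (1): a 3x3 convolution over the differences between
    each neighborhood pixel and the window's center pixel."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Convolving centered differences equals the plain response minus the
        # kernel-sum response taken at the center pixel.
        w_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)
        return self.conv(x) - F.conv2d(x, w_sum)

class CDCModule(nn.Module):
    """Formulas (1)-(2): y1 = Conv(x), y2 = Cdc(x), L = Conv(y1) (+) Conv(y2)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.plain = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.cdc = CentralDifferenceConv(in_ch, out_ch)
        self.fuse1 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.fuse2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse1(self.plain(x)) + self.fuse2(self.cdc(x))

def u_block(ch: int) -> nn.Sequential:
    """U(.) of formula (11): ordinary convolution, Relu activation, BN."""
    return nn.Sequential(nn.Conv2d(ch, ch, 1), nn.ReLU(), nn.BatchNorm2d(ch))

class FFCModule(nn.Module):
    """Formula (11): F(x) = L(x) (+) G(x); the global branch convolves in the
    frequency domain between an FFT and an inverse FFT."""
    def __init__(self, ch: int):
        super().__init__()
        self.local = u_block(ch)        # L(x) = U(x)
        self.pre = u_block(ch)          # U(.) before the FFT
        self.freq = u_block(2 * ch)     # U(.) on stacked real/imaginary parts
        self.post = nn.Conv2d(ch, ch, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.pre(x)
        f = torch.fft.rfft2(y, norm="ortho")               # spatial -> frequency
        f = self.freq(torch.cat([f.real, f.imag], dim=1))  # convolve in frequency
        re, im = f.chunk(2, dim=1)
        z = torch.fft.irfft2(torch.complex(re.contiguous(), im.contiguous()),
                             s=y.shape[-2:], norm="ortho")
        return self.local(x) + self.post(z + y)            # residual add, then fuse
```

Stacking four CDCModule instances (with the inter-stage downsampling that Fig. 2 implies but this sketch omits) yields L_1, L_2, L_3 and L_f per formula (10), and chaining seven FFCModule instances yields F_1 through F_f.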
2. Acquiring edge-enhanced fusion features
After the shallow features L_1, L_2, L_3, L_f and the deep feature F_f are obtained, the shallow features serve as the skip connection features of the decoder part, and the learnable guided filtering (LGF) module that we designed enhances and fuses their edge information to obtain finer edge detail. The specific flow is shown in the right half of Fig. 2, and the design details of the LGF module are shown in Fig. 4. The LGFU block (Learnable Guided Filtering Unit, LGFU) of Fig. 4 is the learnable guided filtering unit we designed. Its input contains x and y, which represent the input image and the guide image respectively; their variance and covariance are computed with ordinary convolutions with a 3x3 kernel, corresponding to the parameters in formula (4), and the final enhanced features are then obtained with ordinary convolution, Relu activation and BN batch normalization. The LGF block of Fig. 4 is the learnable guided filtering module we designed, which applies the guided filter in different ways according to the type of the input feature. Specifically, when a skip connection feature is processed, the guided filter uses the skip connection feature itself as the guide map; when an upsampled feature is processed, the guided filter uses the upsampled feature as the guide map and the corresponding skip connection feature as the input map. The corresponding skip connection feature is the shallow feature fed through the skip connection into the same LGF module as the upsampled feature, as shown in Fig. 2. The output features produced by the learnable guided filter are then spliced to obtain new features, which are finally upsampled after ordinary convolution, Relu activation and BN batch normalization to give the final output of the LGF module (a code sketch of the LGF computation follows this section). The specific process is as follows:

Var_x = Conv(x·x) − Conv(x)·Conv(x), Cov_xy = Conv(x·y) − Conv(x)·Conv(y), L_xy = BN(Relu(Conv(a·y + b))) (12)

where Var_x denotes the variance of the input image x, Cov_xy denotes the covariance of the input image and the guide image, Conv(·) denotes an ordinary convolution with a 3x3 kernel acting as a local mean, a and b are computed as in formula (4) from these convolutional statistics (a = Cov_xy / (Var_x + ε), b = Conv(x) − a·Conv(y)), and L_xy denotes the output of the learnable guided filtering unit. The outputs of the LGF module are as follows:

L_x = LGFU(x, x), L_xy = LGFU(x, y), M = Up(BN(Relu(Conv([L_x, L_xy])))) (13)

where L_x, L_xy and M denote the guided filter output of the skip connection feature x, the guided filter output obtained with the upsampled feature y as guide, and the output of the learnable guided filtering module respectively; [·,·] denotes the splice operation and Up(·) the upsampling operation. Four LGF modules perform edge enhancement on the features; the first module takes the skip connection feature L_f and the upsampled feature F_f as input, the outputs M_4, M_3, M_2 and M_f are obtained in turn, and each previous output serves as the upsampled-feature input of the next module, so the final output feature of the LGF modules is M_f.
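The following sketch renders formulas (12)-(13) in the same hedged spirit: the learnable 3x3 convolution standing in for the local mean, the shared weights between the two LGFU calls, and the bilinear 2x upsampling are all assumptions; the input/guide roles follow the description above (a skip feature guided by itself, and a skip feature as input with the upsampled feature as guide).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LGFUnit(nn.Module):
    """LGFU of formula (12): guided filtering whose local statistics come from
    a learnable 3x3 convolution instead of a fixed box mean."""
    def __init__(self, ch: int, eps: float = 1e-2):
        super().__init__()
        self.mean = nn.Conv2d(ch, ch, 3, padding=1, bias=False)  # learnable local mean
        self.refine = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                    nn.ReLU(), nn.BatchNorm2d(ch))
        self.eps = eps

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # x: input feature, y: guide feature.
        m_x, m_y = self.mean(x), self.mean(y)
        cov_xy = self.mean(x * y) - m_x * m_y      # Cov_xy of formula (12)
        var_x = self.mean(x * x) - m_x * m_x       # Var_x of formula (12)
        a = cov_xy / (var_x + self.eps)            # coefficients as in formula (4)
        b = m_x - a * m_y
        return self.refine(a * y + b)              # conv-Relu-BN refinement

class LGFModule(nn.Module):
    """LGF of formula (13): filter the skip feature (self-guided) and the skip
    feature guided by the upsampled feature, splice, refine, and upsample."""
    def __init__(self, ch: int):
        super().__init__()
        self.unit = LGFUnit(ch)
        self.out = nn.Sequential(nn.Conv2d(2 * ch, ch, 3, padding=1),
                                 nn.ReLU(), nn.BatchNorm2d(ch))

    def forward(self, skip: torch.Tensor, up: torch.Tensor) -> torch.Tensor:
        l_x = self.unit(skip, skip)                 # L_x = LGFU(x, x)
        l_xy = self.unit(skip, up)                  # L_xy = LGFU(x, y)
        m = self.out(torch.cat([l_x, l_xy], dim=1)) # splice, then conv-Relu-BN
        return F.interpolate(m, scale_factor=2, mode="bilinear",
                             align_corners=False)   # Up(.) of formula (13)
```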
3. Small object extraction
To extract the small targets, the final output feature M_f of the LGF modules is input to the last convolution layer; the flow is shown in Fig. 2. A convolution with kernel size 1x1 and stride 1 finally produces a feature map with one channel, from which the small targets are obtained (a code sketch of this extraction step follows). The specific step is as follows:

T = Conv_L(M_f) (14)

where T and Conv_L(·) denote the final output result and the last-layer convolution operation respectively.
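Finally, a sketch of the extraction step of formula (14); the channel width, the sigmoid and the 0.5 threshold used to read a binary mask out of the one-channel map T are assumptions, since the patent only specifies the 1x1, stride-1 convolution.

```python
import torch
import torch.nn as nn

conv_l = nn.Conv2d(32, 1, kernel_size=1, stride=1)  # Conv_L: 32 input channels assumed

m_f = torch.randn(1, 32, 512, 512)   # final LGF output feature M_f
t = conv_l(m_f)                      # T = Conv_L(M_f), shape (1, 1, 512, 512)
# One plausible readout of the one-channel map as a binary small-target mask:
mask = (torch.sigmoid(t) > 0.5).float()
```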
The invention provides an infrared small target detection method based on learning guided filtering. Building on the edge distribution characteristics of small targets and the edge-preserving property of guided filtering, it realizes edge-enhanced multi-scale target detection, enriches the detail and edge information of targets, and improves the detection performance for infrared small targets.

Claims (3)

1. An infrared small target detection method based on learning guided filtering, comprising the following steps:
Step 1: input the preprocessed data into the model, extract shallow features with central difference convolution modules, and extract deep features with fast Fourier convolution modules;
Step 2: design a learnable guided filtering module, process the shallow features and the deep features with it respectively, and fuse the processed output features to obtain edge-enhanced fusion features.
2. The infrared small target detection method based on learning guided filtering according to claim 1, wherein in step 1 a central difference convolution module and a fast Fourier convolution module are used to extract the shallow features and the deep features respectively; the specific steps are as follows:
1) Preprocess the data: resize the infrared images to a uniform size and input them into the model. In a central difference convolution module, process the features with an ordinary convolution and a central difference convolution to obtain two different outputs:

y1 = Conv(x), y2 = Cdc(x) (1)

where y1 and y2 are the output features of the ordinary convolution and the central difference convolution respectively, and Conv(·), Cdc(·) and x denote the ordinary convolution operation, the central difference convolution operation and the input features respectively;
2) Pass the two output features y1 and y2 through convolution and fusion-addition operations to obtain the output of the central difference convolution module:

L = Conv(y1) ⊕ Conv(y2) (2)

where L denotes the module output and ⊕ denotes the fusion-addition operation; L serves as the input feature of the next central difference convolution module, and several central difference convolution modules are used to obtain the shallow output feature L_f of the last layer;
3) Input the features into a fast Fourier convolution module, process them with a local branch and a global branch respectively, and fuse the two by addition to obtain the deep features output by the fast Fourier convolution module:

F = h(x) ⊕ l(x) (3)

where F, h(·) and l(·) denote the output of the fast Fourier convolution module, the global branch operation and the local branch operation respectively; F serves as the input of the next fast Fourier convolution module, and using several fast Fourier convolution modules yields the output feature of the last module, F_f.
3. The infrared small target detection method based on learning guided filtering according to claim 1, wherein in step 2 a learnable guided filtering module is designed and the edge-enhanced fusion features are obtained; the specific steps are as follows:
1) Solve the filter coefficients following the principle of classical guided filtering:

a = Cov(I, p) / (Var(I) + ε), b = μ_p − a·μ_I (4)

where a and b denote the filter coefficients; I, p, μ_I and μ_p denote the guide image, the input image, the mean of the guide image and the mean of the input image respectively; Cov(I, p) denotes the covariance of I and p, Var(I) denotes the variance of I, and ε denotes an L2-norm regularization coefficient;
2) The filtered output can be obtained from the above coefficients, the input image and the guide image, and its expression is as follows:
q=a·I+b (5)
wherein q represents the filtered output of the input image;
3) Filter the skip connection feature L_f with the operations of steps 1) and 2), taking L_f as both the input map and the guide map, to obtain the filtered result:

q1 = a·L_f + b (6)

where q1 denotes the filtered output of the skip connection feature, with a and b computed from formula (4) using I = p = L_f;
4) Filter the upsampled feature F_f with the operations of steps 1) and 2), taking L_f as the input map and F_f as the guide map, to obtain the filtered result:

q2 = a·F_f + b (7)

where q2 denotes the filtered output of the upsampled feature, with a and b computed from formula (4) using I = F_f and p = L_f;
5) Fuse the filtered output of the skip connection feature with the filtered output of the upsampled feature to obtain the output of the learnable guided filtering module:

M = BN(Relu(Conv([q1, q2]))) (8)

where [·,·] denotes the splice operation, Relu denotes the activation operation and BN denotes the batch normalization operation; M denotes the output features of the first learnable guided filtering module and serves as the upsampled features of the subsequent learnable guided filtering modules, and the output features M_f of the last module are obtained by using several learnable guided filtering modules.
CN202410560465.6A 2024-05-08 2024-05-08 Infrared small target detection method based on learning guided filtering Pending CN118521767A (en)

Priority Applications (1)

Application Number: CN202410560465.6A
Priority Date: 2024-05-08
Filing Date: 2024-05-08
Title: Infrared small target detection method based on learning guided filtering

Applications Claiming Priority (1)

Application Number: CN202410560465.6A
Priority Date: 2024-05-08
Filing Date: 2024-05-08
Title: Infrared small target detection method based on learning guided filtering

Publications (1)

Publication Number: CN118521767A
Publication Date: 2024-08-20

Family

ID=92273140

Family Applications (1)

Application Number: CN202410560465.6A
Title: Infrared small target detection method based on learning guided filtering
Priority Date: 2024-05-08
Filing Date: 2024-05-08

Country Status (1)

Country Link
CN (1) CN118521767A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118711039A * 2024-08-27 2024-09-27 Xi'an University of Science and Technology Small target detection optimization method based on active disturbance rejection convolution



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination