CN117994573A - Infrared dim target detection method based on superpixel and deformable convolution - Google Patents
- Publication number
- CN117994573A (application CN202410073036.6A)
- Authority
- CN
- China
- Prior art keywords
- convolution
- feature
- super
- infrared
- feature map
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V10/764: Image or video recognition using pattern recognition or machine learning, using classification, e.g. of video objects
- G06N3/0464: Convolutional networks [CNN, ConvNet]
- G06N3/048: Activation functions
- G06T7/10: Segmentation; edge detection
- G06V10/40: Extraction of image or video features
- G06V10/763: Clustering, non-hierarchical techniques, e.g. based on statistics of modelling distributions
- G06V10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06T2207/10048: Infrared image
Abstract
An infrared dim target detection method based on superpixels and deformable convolution, belonging to the field of target detection, comprises: collecting an original image; performing clustering-based superpixel segmentation on the original image to obtain a plurality of superpixel block sequences; inputting the superpixel block sequences into a deformable-convolution feature extraction backbone network to obtain a plurality of feature maps of different scales; inputting the feature maps of different scales into a multi-scale feature fusion network to obtain a plurality of fused feature maps of different scales; and detecting the fused feature maps with anchor boxes, optimizing all anchor boxes with K-means clustering, and extracting all regions where infrared dim targets may exist to obtain the final detection result. Even when the infrared dim target is small and has low contrast against the background, the method can segment candidate target regions from the original image, improving the detection effect and reducing the false alarm rate of infrared dim target detection.
Description
Technical Field
The invention belongs to the technical field of target detection, and particularly relates to an infrared dim target detection method based on superpixels and deformable convolution.
Background
Object detection is a computer-vision task that aims to locate all objects of interest in an image or video; it is widely applied in traffic, video surveillance, military and other fields. Conventional target detection methods, chiefly morphological processing, sliding-window search and HOG detectors, are simple but offer limited detection performance.
In recent years, deep learning has been applied to target detection with increasing frequency. RCNN first brought convolutional neural networks to the task: it generates a set of candidate regions, extracts a feature vector from each, and feeds the vectors to an SVM classifier to predict the probability that each candidate region contains an object. For infrared dim targets, however, existing detection methods face the following problems. First, the targets are small and dim, so they are easily submerged in the background or disturbed by noise. Second, the color, texture and shape information of the target is difficult to extract, so classic descriptors such as moment features, contour features and local invariant feature points are hard to apply, and objects are poorly distinguishable from one another. Third, owing to the infrared imaging mechanism, background regions whose infrared radiation characteristics resemble the target's are treated as false targets that are often difficult to reject; infrared target detection is therefore more susceptible to interference than visible-light detection, especially from heat sources of similar shape. Fourth, infrared imaging sensors have low resolution, so the acquired images are somewhat blurred, heavily corrupted by noise and low in signal-to-noise ratio, which injects erroneous information into the feature extraction process.
Disclosure of Invention
Aiming at the low detection rate and high false alarm rate of existing target detection methods on infrared dim targets, the invention provides an infrared dim target detection method based on superpixels and deformable convolution. Even when the infrared dim target is small and has low contrast against the background, the method segments candidate target regions from the original image and performs detection on those specific regions. It introduces a deformable-convolution feature extraction backbone network to realize multi-scale feature extraction and detect infrared dim targets more adaptively, and can therefore be widely applied in practical infrared image processing systems.
The technical solution adopted by the invention to solve the above technical problems is as follows:
The invention discloses an infrared dim target detection method based on superpixels and deformable convolution, which mainly comprises the following steps:
step S1: collecting an original image;
step S2: performing clustering-based superpixel segmentation on the original image to obtain a plurality of superpixel block sequences;
step S3: inputting the superpixel block sequences into a deformable-convolution feature extraction backbone network to obtain a plurality of feature maps of different scales;
step S4: inputting the feature maps of different scales into a multi-scale feature fusion network to obtain a plurality of fused feature maps of different scales;
step S5: detecting the fused feature maps of different scales with anchor boxes, optimizing all obtained anchor boxes with K-means clustering, and extracting all regions where infrared dim targets may exist to obtain the final detection result.
Furthermore, the original images are generated by an existing infrared scene simulation system, manually annotated, and divided into a training set, a verification set and a test set.
Further, the specific operation flow of step S2 is as follows:
step S2.1: sampling cluster centers on a regular grid with spacing S pixels to obtain initial superpixel blocks;
step S2.2: moving each sampling center to the lowest-gradient position within its 3×3 neighborhood;
step S2.3: introducing a Euclidean distance D to find the nearest cluster center of each pixel, searching for similar pixels within the 2S×2S region around each superpixel center;
step S2.4: recomputing each cluster center from the pixels' new labels; computing the residual between the new and previous cluster centers using the L2 norm and comparing it with a residual threshold: if the residual exceeds the threshold, repeating the computation of the distance D and the cluster centers; if the residual is below the threshold, stopping iteration, finally obtaining the superpixel block sequence after superpixel preprocessing.
Further, the deformable-convolution feature extraction backbone network consists of a deformable convolution layer, a batch normalization layer, a SiLU activation function, three convolution layers and a rectified linear unit (ReLU); the superpixel block sequence is input into the backbone network and processed by the deformable convolution layer, the batch normalization layer, the SiLU activation function, the three convolution layers and the ReLU in sequence to obtain a plurality of feature maps of different scales.
Further, in step S3 an offset is introduced into the convolution kernel during multi-scale feature extraction. The offset is produced by an additional convolution layer applied to the same input feature map, with the same spatial resolution and dilation as the current convolution layer, so the generated offset field has the same spatial resolution as the input feature map. During training, the convolution kernel that produces the output feature map and the convolution kernel that generates the offsets are learned simultaneously. The calculation formula is:

$$Y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n) \cdot x(p_0 + p_n + \Delta p_n)$$

where $Y$ is the output feature map, $w$ the convolution weight at each sampled position, $x$ the input feature map, $p_0$ a point of the output feature map, $p_n$ the n-th position within the convolution grid $\mathcal{R}$, and $\Delta p_n$ the learned offset, with n = 1, …, N.
Further, the multi-scale feature fusion network consists of three 1×1 convolution layers, three 3×3 convolution layers, three upsampling layers, three fusion modules and three feature extraction modules; the 1×1 convolution layers change the number of feature channels, the 3×3 convolution layers downsample the feature maps, the upsampling layers convert low-resolution feature maps into high-resolution ones, the fusion modules splice and fuse deep and shallow feature maps of the same resolution, and the feature extraction modules extract feature information from the fused feature maps.
Further, the specific operation flow of step S4 is as follows:
step S4.1: denoting the feature maps of different scales fip1, fip2, fip3 and fip4, upsampling feature map fip4 and performing multi-scale fusion with feature map fip3 to obtain feature map fcp1;
step S4.2: upsampling feature map fcp1 and performing multi-scale fusion with feature map fip2 to obtain output feature map fop1;
step S4.3: upsampling output feature map fop1 and performing multi-scale fusion with feature map fip1 to obtain output feature map fop2;
step S4.4: applying convolution processing to output feature map fop2 and performing multi-scale fusion with the upsampled output feature map fop1 to obtain output feature map fop3.
The beneficial effects of the invention are as follows:
Compared with visible-light images, infrared images are formed by a more complex process and, being affected by equipment and environment, are more difficult to acquire, so conventional large-target detection methods cannot be used. In addition, existing CNN-based target detection is limited by oversized models and instability under geometric variation: the convolution units of a CNN sample the input feature map only at fixed positions, all activation units within the same CNN layer share the same receptive field size, and the network lacks an internal mechanism for handling varied geometric shapes. Since different locations may correspond to objects of different scales or shapes, conventional CNNs are ill-suited to finer target detection tasks. The invention therefore provides an infrared dim target detection method based on superpixels and deformable convolution, which uses superpixels to segment the regions where infrared dim targets may exist, extracts target features adaptively through the deformable-convolution feature extraction backbone network, and detects targets with an anchor-based method, greatly improving detection capability and achieving a better detection effect.
Drawings
FIG. 1 is a flow chart of a method for detecting infrared dim targets based on superpixels and deformable convolution according to the present invention.
Fig. 2 is a schematic diagram of a deformable convolution feature extraction backbone network.
Fig. 3 is a schematic diagram of a multi-scale feature fusion network.
Detailed Description
The invention provides an infrared dim target detection method based on superpixels and deformable convolution that mainly comprises two parts. First part: use a superpixel algorithm to segment the regions where infrared dim targets may exist. Second part: extract features from the superpixel-preprocessed original image with the deformable-convolution feature extraction backbone network, generate multi-scale feature maps, process them with the multi-scale feature fusion network, and feed the output to a target detection network, which produces the final detection result.
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
The invention provides an infrared dim target detection method based on superpixels and deformable convolution, which specifically comprises the following steps, as shown in fig. 1:
step S1: collecting an original image;
The infrared video data used by the invention are generated by an infrared scene simulation system and comprise 20 infrared videos, each containing 100 original images; the images are manually annotated and divided into a training set, a verification set and a test set.
Step S2: pre-processing super pixels;
And respectively carrying out simple clustering super-pixel segmentation on the original images to obtain a plurality of super-pixel block sequences. Through super-pixel preprocessing of the input original image, the learning capacity of the background can be enhanced, and meanwhile, the effect of data augmentation is achieved. The specific operation steps are as follows:
Step S2.1: sampling is carried out on a regular grid with S pixels at intervals, so that approximately equal super-pixel blocks are obtained;
Step S2.2: moving the sampling center to a position corresponding to the lowest gradient position in the 3 x 3 neighborhood, avoiding positioning the superpixel on the edge, and reducing the chance of infrared weak and small objects and noise affecting the superpixel result;
Step S2.3: introducing a Euclidean distance D to calculate the nearest cluster center of each pixel, and searching similar pixels in a region 2S multiplied by 2S around the super pixel center;
Step S2.4: re-calculating the center of each cluster according to the new label of each pixel; calculating residual errors of the new clustering center and the previous clustering center by adopting an L2 norm, and comparing the residual errors with a residual error threshold value: if the residual error is larger than the residual error threshold value, repeating calculation of the Euclidean distance D and the clustering center; and if the residual error is smaller than the residual error threshold value, stopping iteration, and finally obtaining the super-pixel block sequence after super-pixel pretreatment.
Step S3: input the superpixel block sequence into the deformable-convolution feature extraction backbone network to obtain a plurality of feature maps of different scales. Using the deformable-convolution backbone makes target detection more adaptive and further raises the detection probability for infrared dim target images.
The specific operation steps are as follows:
Step S3.1: input the superpixel block sequence into the deformable-convolution feature extraction backbone network and extract its multi-scale features.
The deformable-convolution feature extraction backbone network, shown in fig. 2, consists of four stages, each composed of a deformable convolution layer DConv, a batch normalization layer BN (Batch Normalization), an activation function SiLU (Sigmoid Linear Unit), three convolution layers Conv and a rectified linear unit ReLU (Rectified Linear Unit). The input original image is superpixel-preprocessed into a superpixel block sequence, which is fed into the backbone network and processed by DConv, BN, SiLU, the three Conv layers and ReLU in sequence to produce a feature map at each stage; the backbone thus extracts feature maps at four different scales, {fip1, fip2, fip3, fip4}. The backbone replaces ordinary convolution kernels with deformable convolutions, so target features are extracted more adaptively.
Step S3.2: during multi-scale feature extraction, the invention introduces an offset into the convolution kernel. The offset is produced by an additional convolution layer applied to the same input feature map, with the same spatial resolution and dilation as the current convolution layer, so the generated offset field has the same spatial resolution as the input feature map. During training, the convolution kernel that produces the output feature map and the convolution kernel that generates the offsets are learned simultaneously. The calculation formula is:

$$Y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n) \cdot x(p_0 + p_n + \Delta p_n)$$

where $Y$ is the output feature map, $w$ the convolution weight at each sampled position, $x$ the input feature map, $p_0$ a point of the output feature map, $p_n$ the n-th position within the convolution grid $\mathcal{R}$, and $\Delta p_n$ the learned offset, with n = 1, …, N.
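As an illustration of one backbone stage (DConv, BN, SiLU, three Conv layers, ReLU) together with the jointly learned offset branch described above, here is a hedged PyTorch sketch built on torchvision.ops.DeformConv2d. The channel plan, strides and input size are assumptions for illustration; the patent specifies only the layer sequence and the offset mechanism.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableStage(nn.Module):
    """One backbone stage: DConv -> BN -> SiLU -> three Conv -> ReLU.
    Channel widths and the stride-2 downsampling are assumptions."""
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        # A plain conv predicts the 2*k*k (x, y) offsets; it shares the
        # spatial resolution and dilation of the deformable layer and is
        # learned jointly with it, matching the formula above.
        self.offset = nn.Conv2d(c_in, 2 * k * k, k, stride=2, padding=k // 2)
        self.dconv = DeformConv2d(c_in, c_out, k, stride=2, padding=k // 2)
        self.bn = nn.BatchNorm2d(c_out)
        self.silu = nn.SiLU()
        self.convs = nn.Sequential(
            nn.Conv2d(c_out, c_out, 3, padding=1),
            nn.Conv2d(c_out, c_out, 3, padding=1),
            nn.Conv2d(c_out, c_out, 3, padding=1),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        y = self.dconv(x, self.offset(x))   # sample at p0 + pn + delta_pn
        y = self.silu(self.bn(y))
        return self.relu(self.convs(y))

# Four stages yield fip1..fip4; the channel plan (1 -> 32 -> 64 -> 128 -> 256)
# is an assumption for illustration.
widths = [(1, 32), (32, 64), (64, 128), (128, 256)]
backbone = nn.ModuleList(DeformableStage(ci, co) for ci, co in widths)

x = torch.randn(1, 1, 640, 640)   # stand-in for a superpixel-preprocessed image
fips = []
for stage in backbone:
    x = stage(x)
    fips.append(x)                # fip1..fip4 at strides 2, 4, 8, 16
```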
Step S4: input the obtained four feature maps of different scales into the multi-scale feature fusion network to obtain three fused feature maps of different scales.
The multi-scale feature fusion network, shown in fig. 3, mainly consists of three 1×1 convolution layers Conv, three 3×3 convolution layers Conv, three upsampling layers, three fusion modules and three feature extraction modules. The 1×1 convolution layers change the number of feature channels, the 3×3 convolution layers downsample the feature maps, the upsampling layers convert low-resolution feature maps into high-resolution ones, the fusion modules splice and fuse deep and shallow feature maps of the same resolution, and the feature extraction modules extract feature information from the fused feature maps. Because high-resolution low-level feature maps contain more detailed structure and texture information while low-resolution high-level feature maps contain rich target semantic information, the proposed deep feature fusion strategy combines structure and texture information with semantic information for infrared dim target detection and alleviates missed detections in complex scenes.
The specific operation steps are as follows:
Step S4.1: upsample feature map fip4, then perform multi-scale fusion with feature map fip3 to obtain feature map fcp1;
Step S4.2: upsample feature map fcp1, then perform multi-scale fusion with feature map fip2 to obtain output feature map fop1;
Step S4.3: upsample output feature map fop1, then perform multi-scale fusion with feature map fip1 to obtain output feature map fop2;
Step S4.4: apply convolution processing to output feature map fop2, then perform multi-scale fusion with the upsampled output feature map fop1 to obtain output feature map fop3.
Thus, three output feature maps fop1, fop2, and fop3 can be obtained by step S4.
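A sketch of the fusion path of steps S4.1 to S4.4 follows, assuming that "splicing and fusing" means channel concatenation followed by a feature-extraction convolution and that the assumed backbone channel plan from the sketch above applies; the reading of the ambiguous step S4.4 (a stride-2 3×3 convolution on fop2 before fusing with fop1) is likewise an assumption.

```python
import torch
import torch.nn as nn

class FuseBlock(nn.Module):
    """Upsample the deeper map, align channels with a 1x1 conv, concatenate
    with the shallower map, then extract features with a 3x3 conv.
    Concatenation-as-fusion and the channel plan are assumptions."""
    def __init__(self, c_deep, c_shallow):
        super().__init__()
        self.reduce = nn.Conv2d(c_deep, c_shallow, 1)          # 1x1: change channel count
        self.up = nn.Upsample(scale_factor=2, mode="nearest")  # low -> high resolution
        self.extract = nn.Conv2d(2 * c_shallow, c_shallow, 3, padding=1)

    def forward(self, deep, shallow):
        fused = torch.cat([self.up(self.reduce(deep)), shallow], dim=1)  # splice deep/shallow
        return self.extract(fused)

# Channel widths follow the assumed backbone plan (fip1..fip4 = 32/64/128/256).
f43 = FuseBlock(256, 128)   # S4.1: fip4 (up) + fip3 -> fcp1
f32 = FuseBlock(128, 64)    # S4.2: fcp1 (up) + fip2 -> fop1
f21 = FuseBlock(64, 32)     # S4.3: fop1 (up) + fip1 -> fop2
down = nn.Conv2d(32, 64, 3, stride=2, padding=1)   # 3x3 conv used for downsampling
ext3 = nn.Conv2d(128, 64, 3, padding=1)            # feature extraction after fusion

def fuse_pyramid(fip1, fip2, fip3, fip4):
    fcp1 = f43(fip4, fip3)   # S4.1
    fop1 = f32(fcp1, fip2)   # S4.2
    fop2 = f21(fop1, fip1)   # S4.3
    # S4.4 is ambiguous in the patent text; this reading downsamples fop2
    # with the 3x3 conv so it matches fop1's resolution before fusing.
    fop3 = ext3(torch.cat([down(fop2), fop1], dim=1))
    return fop1, fop2, fop3
```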
Step S5: detect the three fused feature maps of different scales with anchor boxes, optimize all obtained anchor boxes with K-means clustering, and extract all regions where infrared dim targets may exist to obtain the final detection result.
The specific operation steps are as follows:
Step S5.1: the larger a feature map, the smaller its receptive field, and standard convolutional neural networks struggle to detect infrared dim targets because these targets are usually far smaller than ordinary targets. To address this, the output feature maps fop1, fop2 and fop3, with sizes of 40×40, 80×80 and 160×160 respectively, are input into a target detection network (for example the detection-head network of YOLO, though the choice is not limited to it); feature maps at these scales suit the detection of most infrared dim targets. The lowest-level feature map l3 is fused with the higher-level feature maps l1 and l2 to obtain the 160×160 feature map, which is more sensitive to smaller infrared targets, so the proposed method can detect infrared dim and even extremely small targets in the input original image.
Step S5.2: optimize all obtained anchor boxes with K-means clustering to obtain the final detection result and output the target-annotated image. This completes the training stage of the infrared dim target detection method based on superpixels and deformable convolution.
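The patent does not spell out the K-means anchor optimization, so the sketch below follows the common YOLO-style convention of clustering the labelled box sizes (w, h) under a 1 - IoU distance; that convention, and the helper names, are assumptions rather than the patent's stated procedure.

```python
import numpy as np

def iou_wh(boxes, centers):
    """IoU between boxes and centers compared as (w, h) pairs anchored at a
    common origin; boxes has shape (N, 2), centers has shape (K, 2)."""
    inter = (np.minimum(boxes[:, None, 0], centers[None, :, 0]) *
             np.minimum(boxes[:, None, 1], centers[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (centers[:, 0] * centers[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    """Cluster labelled box sizes with a 1 - IoU distance (a YOLO-style
    convention; the patent does not state its distance metric)."""
    wh = np.asarray(wh, dtype=float)
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), k, replace=False)]
    assign = np.zeros(len(wh), dtype=int)
    for _ in range(iters):
        new_assign = np.argmin(1.0 - iou_wh(wh, centers), axis=1)
        if (new_assign == assign).all():
            break                      # assignments converged
        assign = new_assign
        for j in range(k):
            if (assign == j).any():    # keep old center if a cluster empties
                centers[j] = wh[assign == j].mean(axis=0)
    return centers[np.argsort(centers.prod(axis=1))]  # small -> large anchors

# e.g. anchors = kmeans_anchors(box_wh_from_training_labels, k=9)
```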
In summary, compared with visible-light images, infrared images are formed by a more complex process and, being affected by equipment and environment, are more difficult to acquire, so infrared dim targets cannot be detected with conventional large-target detection methods. To solve this problem, the invention provides an infrared dim target detection method based on superpixels and deformable convolution: superpixel preprocessing segments the regions where infrared dim targets may exist, the deformable-convolution feature extraction backbone network adaptively extracts target features, and an anchor-based method performs the detection. This greatly improves the network's ability to detect infrared dim targets, achieves a better detection effect than existing methods, raises the infrared dim target detection rate and lowers the false alarm rate.
The foregoing is merely a preferred embodiment of the present invention; it should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the invention, and such modifications are intended to fall within the scope of the present invention.
Claims (7)
1. An infrared dim target detection method based on superpixels and deformable convolution, characterized by comprising the following steps:
step S1: collecting an original image;
step S2: performing clustering-based superpixel segmentation on the original image to obtain a plurality of superpixel block sequences;
step S3: inputting the superpixel block sequences into a deformable-convolution feature extraction backbone network to obtain a plurality of feature maps of different scales;
step S4: inputting the feature maps of different scales into a multi-scale feature fusion network to obtain a plurality of fused feature maps of different scales;
step S5: detecting the fused feature maps of different scales with anchor boxes, optimizing all obtained anchor boxes with K-means clustering, and extracting all regions where infrared dim targets may exist to obtain the final detection result.
2. The infrared dim target detection method based on superpixels and deformable convolution according to claim 1, wherein the original images are generated by an existing infrared scene simulation system, manually annotated, and divided into a training set, a verification set and a test set.
3. The infrared dim target detection method based on superpixels and deformable convolution according to claim 1, wherein the specific operation flow of step S2 is as follows:
step S2.1: sampling cluster centers on a regular grid with spacing S pixels to obtain initial superpixel blocks;
step S2.2: moving each sampling center to the lowest-gradient position within its 3×3 neighborhood;
step S2.3: introducing a Euclidean distance D to find the nearest cluster center of each pixel, searching for similar pixels within the 2S×2S region around each superpixel center;
step S2.4: recomputing each cluster center from the pixels' new labels; computing the residual between the new and previous cluster centers using the L2 norm and comparing it with a residual threshold: if the residual exceeds the threshold, repeating the computation of the distance D and the cluster centers; if the residual is below the threshold, stopping iteration, finally obtaining the superpixel block sequence after superpixel preprocessing.
4. The infrared dim target detection method based on superpixels and deformable convolution according to claim 1, wherein the deformable-convolution feature extraction backbone network consists of a deformable convolution layer, a batch normalization layer, a SiLU activation function, three convolution layers and a rectified linear unit (ReLU); the superpixel block sequence is input into the backbone network and processed by the deformable convolution layer, the batch normalization layer, the SiLU activation function, the three convolution layers and the ReLU in sequence to obtain a plurality of feature maps of different scales.
5. The infrared dim target detection method based on superpixels and deformable convolution according to claim 1, wherein in step S3 an offset is introduced into the convolution kernel during multi-scale feature extraction; the offset is produced by an additional convolution layer applied to the same input feature map, with the same spatial resolution and dilation as the current convolution layer, so the generated offset field has the same spatial resolution as the input feature map; during training, the convolution kernel that produces the output feature map and the convolution kernel that generates the offsets are learned simultaneously, with the calculation formula:

$$Y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n) \cdot x(p_0 + p_n + \Delta p_n)$$

where $Y$ is the output feature map, $w$ the convolution weight at each sampled position, $x$ the input feature map, $p_0$ a point of the output feature map, $p_n$ the n-th position within the convolution grid $\mathcal{R}$, and $\Delta p_n$ the learned offset, with n = 1, …, N.
6. The infrared dim target detection method based on superpixels and deformable convolution according to claim 1, wherein the multi-scale feature fusion network consists of three 1×1 convolution layers, three 3×3 convolution layers, three upsampling layers, three fusion modules and three feature extraction modules; the 1×1 convolution layers change the number of feature channels, the 3×3 convolution layers downsample the feature maps, the upsampling layers convert low-resolution feature maps into high-resolution ones, the fusion modules splice and fuse deep and shallow feature maps of the same resolution, and the feature extraction modules extract feature information from the fused feature maps.
7. The infrared dim target detection method based on superpixels and deformable convolution according to claim 1, wherein the specific operation flow of step S4 is as follows:
step S4.1: denoting the feature maps of different scales fip1, fip2, fip3 and fip4, upsampling feature map fip4 and performing multi-scale fusion with feature map fip3 to obtain feature map fcp1;
step S4.2: upsampling feature map fcp1 and performing multi-scale fusion with feature map fip2 to obtain output feature map fop1;
step S4.3: upsampling output feature map fop1 and performing multi-scale fusion with feature map fip1 to obtain output feature map fop2;
step S4.4: applying convolution processing to output feature map fop2 and performing multi-scale fusion with the upsampled output feature map fop1 to obtain output feature map fop3.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202410073036.6A | 2024-01-18 | 2024-01-18 | Infrared dim target detection method based on superpixel and deformable convolution |
Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202410073036.6A | 2024-01-18 | 2024-01-18 | Infrared dim target detection method based on superpixel and deformable convolution |
Publications (1)

| Publication Number | Publication Date |
| --- | --- |
| CN117994573A | 2024-05-07 |
Family

ID=90890476

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| CN202410073036.6A | Infrared dim target detection method based on superpixel and deformable convolution | 2024-01-18 | 2024-01-18 |

Country Status (1)

| Country | Link |
| --- | --- |
| CN | CN117994573A (pending) |
Cited By (1)

| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| CN118570493A | 2024-08-05 | 2024-08-30 | Beihang University | Feature enhancement method for weak and small targets in infrared image |
Legal Events

| Date | Code | Title |
| --- | --- | --- |
| | PB01 | Publication |
| | SE01 | Entry into force of request for substantive examination |