
CN117994573A - Infrared dim target detection method based on superpixel and deformable convolution - Google Patents

Infrared dim target detection method based on superpixel and deformable convolution

Info

Publication number
CN117994573A
CN117994573A
Authority
CN
China
Prior art keywords
convolution
feature
super
infrared
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410073036.6A
Other languages
Chinese (zh)
Inventor
Zhao Yan
Wang Shengjie
Zheng Yulong
Huang Yanjin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Forestry Star Beijing Technology Information Co ltd
Original Assignee
China Forestry Star Beijing Technology Information Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Forestry Star Beijing Technology Information Co ltd filed Critical China Forestry Star Beijing Technology Information Co ltd
Priority to CN202410073036.6A priority Critical patent/CN117994573A/en
Publication of CN117994573A publication Critical patent/CN117994573A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V10/763Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10048Infrared image

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

An infrared dim and small target detection method based on super-pixels and deformable convolution belongs to the field of target detection and comprises the following steps: acquiring an original image; performing clustering super-pixel segmentation on the original image to obtain a plurality of super-pixel block sequences; inputting the super-pixel block sequences into a deformable-convolution feature extraction backbone network to obtain a plurality of feature maps of different scales; inputting the feature maps of different scales into a multi-scale feature fusion network to obtain a plurality of fused feature maps of different scales; and detecting the fused feature maps of different scales with anchor boxes, optimizing all anchor boxes with a K-means clustering method, and extracting all regions where infrared dim and small targets may exist to obtain the final detection result. The method can segment the candidate regions of infrared dim and small targets from the original image even when the targets are small and have low contrast against the background, thereby improving the detection performance and reducing the false alarm rate of infrared dim and small target detection.

Description

Infrared dim target detection method based on superpixel and deformable convolution
Technical Field
The invention belongs to the technical field of target detection, and particularly relates to an infrared dim target detection method based on super pixels and deformable convolution.
Background
Object detection technology belongs to the field of computer vision and aims to find all objects of interest in images or videos; it is therefore widely applied in fields such as traffic, video surveillance and the military. Conventional target detection methods mainly comprise morphological processing, sliding-window and HOG-detector methods; these methods are simple, but their detection performance and results are relatively limited.
In recent years, deep learning has been increasingly applied to target detection. RCNN first applied convolutional neural networks to the target detection task: it generates a number of candidate regions, extracts a feature vector from each, and feeds the feature vectors into an SVM classifier to predict the probability that each candidate region contains an object. In practice, existing target detection methods face the following problems on typical infrared dim and small targets, which have several prominent characteristics. First, the target is small and of low brightness, so it is easily submerged in the background or disturbed by noise. Second, the color, texture and shape information of the target is difficult to extract; typical methods based on color, shape and texture, such as moment features, contour features and local invariant feature points, are hard to apply, and the distinguishability between objects is low. Third, owing to the infrared imaging mechanism, regions in the background with infrared radiation characteristics similar to the target appear as false targets that are often difficult to reject; infrared target detection is more susceptible to interference than visible-light target detection, especially from heat sources of similar shape. Fourth, the resolution of infrared imaging sensors is low, so the acquired infrared images are somewhat blurred, noise interference is severe, and the signal-to-noise ratio is low; a low signal-to-noise ratio introduces more erroneous information into the feature extraction process.
Disclosure of Invention
Aiming at the problems of the low detection rate and high false alarm rate of existing target detection methods on infrared dim and small targets, the invention provides an infrared dim and small target detection method based on super-pixels and deformable convolution. The method can segment the candidate regions of infrared dim and small targets from the original image even when the target is small and its contrast against the background is low, performs target detection on those specific regions, and introduces a deformable-convolution feature extraction backbone network to realize multi-scale feature extraction, so that infrared dim and small target detection is realized more adaptively; the method can therefore be widely applied in practical infrared image processing systems.
The technical scheme adopted by the invention for solving the technical problems is as follows:
The invention discloses an infrared dim and small target detection method based on super-pixels and deformable convolution, which mainly comprises the following steps:
step S1: acquiring an original image;
step S2: performing clustering super-pixel segmentation on the original image to obtain a plurality of super-pixel block sequences;
step S3: inputting the super-pixel block sequences into a deformable-convolution feature extraction backbone network to obtain a plurality of feature maps of different scales;
step S4: inputting the feature maps of different scales into a multi-scale feature fusion network to obtain a plurality of fused feature maps of different scales;
step S5: detecting the fused feature maps of different scales with anchor boxes, optimizing all obtained anchor boxes with a K-means clustering method, and extracting all regions where infrared dim and small targets may exist to obtain the final detection result.
Furthermore, the original images are generated by an existing infrared scene simulation system, manually annotated, and divided into a training set, a validation set and a test set.
Further, the specific operation flow of the step S2 is as follows:
step S2.1: sampling on a regular grid at intervals of S pixels to obtain super-pixel blocks;
step S2.2: moving each sampling center to the lowest-gradient position in its 3×3 neighborhood;
step S2.3: introducing a Euclidean distance D to assign each pixel to the nearest cluster center, searching for similar pixels in the 2S×2S region around each super-pixel center;
step S2.4: re-computing the center of each cluster according to the new label of each pixel; computing the residual between the new cluster centers and the previous cluster centers with the L2 norm and comparing it with a residual threshold: if the residual is larger than the threshold, repeating the computation of the Euclidean distance D and the cluster centers; if the residual is smaller than the threshold, stopping the iteration and finally obtaining the super-pixel block sequence after super-pixel preprocessing.
Further, the deformable-convolution feature extraction backbone network consists of a deformable convolution layer, a batch normalization layer, a SiLU activation function, three convolution layers and a rectified linear unit (ReLU); the super-pixel block sequence is input into the backbone network and processed sequentially by the deformable convolution layer, the batch normalization layer, the SiLU activation function, the three convolution layers and the ReLU to obtain a plurality of feature maps of different scales.
Further, in the step S3, offsets are introduced into the convolution kernel during multi-scale feature extraction. The offsets are obtained by applying an additional convolution layer on the same input feature map; this layer has the same spatial resolution and dilation as the current convolution layer, so the generated offset field has the same spatial resolution as the input feature map. During training, the convolution kernel that produces the output feature map and the convolution kernel that generates the offsets are learned simultaneously, and the calculation formula is:
Y(p0) = Σn w(pn) · x(p0 + pn + Δpn)
where Y denotes the output feature map, w denotes the convolution kernel weights, x denotes the input feature map, p0 is a point of the output feature map, pn enumerates the positions of the regular convolution grid, Δpn is the learned offset of position pn, and n = 1, …, N with N the number of sampling points of the kernel.
Further, the multi-scale feature fusion network consists of three 1×1 convolution layers, three 3×3 convolution layers, three up-sampling layers, three fusion modules and three feature extraction modules; the 1×1 convolution layers change the number of feature channels, the 3×3 convolution layers down-sample the feature maps, the up-sampling layers convert low-resolution feature maps into high-resolution feature maps, the fusion modules splice and fuse deep and shallow feature maps of the same resolution, and the feature extraction modules extract feature information from the fused feature maps.
Further, the specific operation flow of the step S4 is as follows:
step S4.1: the feature maps of different scales are denoted fip1, fip2, fip3 and fip4; the feature map fip4 is up-sampled and then multi-scale fused with the feature map fip3 to obtain the feature map fcp1;
step S4.2: the feature map fcp1 is up-sampled and then multi-scale fused with the feature map fip2 to obtain the output feature map fop1;
step S4.3: the output feature map fop1 is up-sampled and then multi-scale fused with the feature map fip1 to obtain the output feature map fop2;
step S4.4: the output feature map fop2 is convolved and then multi-scale fused with the up-sampled output feature map fop1 to obtain the output feature map fop3.
The beneficial effects of the invention are as follows:
Compared with visible-light images, infrared imaging is more complex and, being affected by equipment and environment, harder to acquire, so traditional methods designed for large targets cannot be used. In addition, existing CNN-based (convolutional neural network) target detection is limited by an oversized model volume and an inflexible transformation mechanism: the convolution unit of a CNN can only sample the input feature map at fixed positions, all activation units in the same CNN layer have receptive fields of the same size, and an internal mechanism for handling diverse geometric shapes is lacking. Since different locations may correspond to objects of different scales or shapes, conventional CNNs are not suited to finer target detection tasks. The invention therefore provides an infrared dim and small target detection method based on super-pixels and deformable convolution, which uses the super-pixel method to segment the regions where infrared dim and small targets may exist, extracts target features adaptively with a deformable-convolution feature extraction backbone network, and detects targets with an anchor-box-based method, greatly improving detection capability and achieving a better detection effect.
Drawings
FIG. 1 is a flow chart of a method for detecting infrared dim targets based on superpixels and deformable convolution according to the present invention.
Fig. 2 is a schematic diagram of a deformable convolution feature extraction backbone network.
Fig. 3 is a schematic diagram of a multi-scale feature fusion network.
Detailed Description
The invention provides an infrared dim and small target detection method based on super-pixels and deformable convolution, which mainly comprises two parts. First part: a super-pixel algorithm is used to segment the regions where infrared dim and small targets may exist. Second part: a deformable-convolution feature extraction backbone network extracts features from the super-pixel-preprocessed original image and generates multi-scale feature maps; the feature maps are processed by the multi-scale feature fusion network and output to a target detection network, which produces the final detection result.
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
The invention provides an infrared dim target detection method based on super pixels and deformable convolution, which specifically comprises the following steps as shown in fig. 1:
step S1: collecting an original image;
The infrared video data selected by the invention are generated by an infrared scene simulation system. The training set comprises 20 infrared videos, each containing 100 original images; the images are manually annotated and divided into a training set, a validation set and a test set.
Step S2: pre-processing super pixels;
And respectively carrying out simple clustering super-pixel segmentation on the original images to obtain a plurality of super-pixel block sequences. Through super-pixel preprocessing of the input original image, the learning capacity of the background can be enhanced, and meanwhile, the effect of data augmentation is achieved. The specific operation steps are as follows:
Step S2.1: sampling is carried out on a regular grid with S pixels at intervals, so that approximately equal super-pixel blocks are obtained;
Step S2.2: moving the sampling center to a position corresponding to the lowest gradient position in the 3 x 3 neighborhood, avoiding positioning the superpixel on the edge, and reducing the chance of infrared weak and small objects and noise affecting the superpixel result;
Step S2.3: introducing a Euclidean distance D to calculate the nearest cluster center of each pixel, and searching similar pixels in a region 2S multiplied by 2S around the super pixel center;
Step S2.4: re-calculating the center of each cluster according to the new label of each pixel; calculating residual errors of the new clustering center and the previous clustering center by adopting an L2 norm, and comparing the residual errors with a residual error threshold value: if the residual error is larger than the residual error threshold value, repeating calculation of the Euclidean distance D and the clustering center; and if the residual error is smaller than the residual error threshold value, stopping iteration, and finally obtaining the super-pixel block sequence after super-pixel pretreatment.
Step S3: further, inputting the super pixel block sequence into a deformable convolution feature extraction backbone network to obtain a plurality of feature graphs with different scales. By using the deformable convolution characteristic to extract the backbone network, the target detection can be realized more adaptively, and the detection probability of the infrared weak and small target image can be further improved.
The specific operation steps are as follows:
Step S3.1: inputting the super-pixel block sequence into a deformable convolution feature extraction backbone network, and extracting the multi-scale features of the super-pixel block sequence.
The adopted deformable-convolution feature extraction backbone network is shown in fig. 2 and consists of four stages, each composed of a deformable convolution layer DConv, a batch normalization layer BN (Batch Normalization), an activation function SiLU (Sigmoid Linear Unit), three convolution layers Conv and a rectified linear unit ReLU (Rectified Linear Unit). The input original image is super-pixel preprocessed to obtain the super-pixel block sequence, which is input into the backbone and processed in turn by the deformable convolution layer DConv, the batch normalization layer BN, the activation function SiLU, the three convolution layers Conv and the rectified linear unit ReLU to obtain a feature map; finally, the backbone extracts feature maps of four different scales {fip1, fip2, fip3, fip4}. The backbone adopted by the invention replaces the ordinary convolution kernel with deformable convolution, so target features are extracted more adaptively.
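As a concrete illustration, one stage of the fig. 2 backbone (DConv → BN → SiLU → three Conv → ReLU) might be sketched in PyTorch as follows, with torchvision.ops.DeformConv2d supplying the deformable convolution. Channel widths, strides and the input size are assumptions, since the patent does not specify them.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class BackboneStage(nn.Module):
    """One stage of fig. 2: DConv -> BN -> SiLU -> 3x Conv -> ReLU (channels assumed)."""
    def __init__(self, c_in: int, c_out: int, k: int = 3):
        super().__init__()
        # Offsets come from an ordinary conv on the same input (step S3.2):
        # 2 * k * k channels, one (dy, dx) pair per kernel sampling point.
        self.offset = nn.Conv2d(c_in, 2 * k * k, k, stride=2, padding=k // 2)
        self.dconv = DeformConv2d(c_in, c_out, k, stride=2, padding=k // 2)
        self.bn = nn.BatchNorm2d(c_out)
        self.silu = nn.SiLU()
        self.convs = nn.Sequential(
            nn.Conv2d(c_out, c_out, 3, padding=1),
            nn.Conv2d(c_out, c_out, 3, padding=1),
            nn.Conv2d(c_out, c_out, 3, padding=1),
            nn.ReLU(),
        )

    def forward(self, x):
        y = self.silu(self.bn(self.dconv(x, self.offset(x))))
        return self.convs(y)

# Four stages give the four scales fip1..fip4 (2x downsampling per stage assumed).
stages = nn.ModuleList([BackboneStage(c, 2 * c) for c in (16, 32, 64, 128)])
x = torch.randn(1, 16, 320, 320)
fips = []
for stage in stages:
    x = stage(x)
    fips.append(x)
fip1, fip2, fip3, fip4 = fips   # spatial sizes 160, 80, 40, 20
```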
Step S3.2: in the process of multi-scale feature extraction, the invention introduces an offset into a convolution kernel, the introduced offset is obtained by applying a convolution layer on the same input feature map, and the convolution kernel has the same spatial resolution and expansion as the current convolution layer. The generated offset has the same spatial resolution as the input feature map. In the training process, the convolution kernel of the output feature map and the convolution kernel for generating the offset are simultaneously learned, and the calculation formula is as follows:
Y=wpn*(xp0+xpn+Δxpn)
where Y represents the output signature, w represents the sum of the sample values, x represents the input signature, p0 is each point in the convolution lattice, pn is the offset of point p0 in the convolution kernel range, n=1, …, N.
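To make the formula concrete, the following self-contained sketch evaluates Y(p0) = Σn w(pn) · x(p0 + pn + Δpn) at a single output location. Bilinear interpolation handles the fractional positions produced by the offsets, which is standard for deformable convolution although the patent text does not spell it out; all values here are random illustrations.

```python
import numpy as np

def bilinear(x, py, px):
    """Sample feature map x at the fractional location (py, px)."""
    h, w = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    dy, dx = py - y0, px - x0
    val = 0.0
    for i, wy in ((y0, 1 - dy), (y0 + 1, dy)):
        for j, wx in ((x0, 1 - dx), (x0 + 1, dx)):
            if 0 <= i < h and 0 <= j < w:
                val += wy * wx * x[i, j]
    return val

def deformable_sample(x, weights, offsets, p0):
    """Y(p0) = sum_n w(pn) * x(p0 + pn + Δpn) for one output point p0."""
    k = weights.shape[0]          # odd kernel size, e.g. 3
    r = k // 2
    y = 0.0
    for a in range(k):
        for b in range(k):
            pn = (a - r, b - r)          # position pn on the regular grid
            dpy, dpx = offsets[a, b]     # learned offset Δpn for this point
            y += weights[a, b] * bilinear(x, p0[0] + pn[0] + dpy,
                                          p0[1] + pn[1] + dpx)
    return y

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))
w = rng.normal(size=(3, 3))
offsets = 0.5 * rng.normal(size=(3, 3, 2))   # in practice predicted by a conv layer
print(deformable_sample(x, w, offsets, p0=(4, 4)))
```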
Step S4: further, the obtained four feature images with different scales are input into a multi-scale feature fusion network to obtain three fused feature images with different scales.
The adopted multi-scale feature fusion network is shown in fig. 3 and mainly consists of three 1×1 convolution layers Conv, three 3×3 convolution layers Conv, three up-sampling layers Upsample, three fusion modules and three feature extraction modules. The 1×1 convolution layers change the number of feature channels, the 3×3 convolution layers down-sample the feature maps, the up-sampling layers convert low-resolution feature maps into high-resolution feature maps, the fusion modules splice and fuse deep and shallow feature maps of the same resolution, and the feature extraction modules extract feature information from the fused feature maps. Because high-resolution low-level feature maps contain more detailed structure and texture information while low-resolution high-level feature maps contain rich target semantic information, the deep feature fusion strategy of the invention combines structure and texture information with semantic information for infrared dim and small target detection and can solve the problem of missed detections in complex scenes.
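A minimal sketch of the fusion building block described above (1×1 channel projection, up-sampling, splice-and-fuse by concatenation, followed by a small feature extraction module) might look as follows in PyTorch; the module granularity and channel handling are assumptions rather than details taken from fig. 3.

```python
import torch
import torch.nn as nn

class Fuse(nn.Module):
    """Fuse a deep (low-res) map with a shallow (high-res) map: the deep map is
    channel-projected and up-sampled to the same resolution, spliced by channel
    concatenation, then passed through a small feature extraction module."""
    def __init__(self, c_deep: int, c_shallow: int, c_out: int):
        super().__init__()
        self.reduce = nn.Conv2d(c_deep, c_shallow, 1)      # 1x1: match channel count
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.extract = nn.Sequential(                      # feature extraction module
            nn.Conv2d(2 * c_shallow, c_out, 3, padding=1),
            nn.BatchNorm2d(c_out),
            nn.SiLU(),
        )

    def forward(self, deep, shallow):
        d = self.up(self.reduce(deep))                       # low-res -> high-res
        return self.extract(torch.cat([d, shallow], dim=1))  # splice and fuse
```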
The specific operation steps are as follows:
Step S4.1: up-sampling the feature map fip, and then performing multi-scale fusion with the feature map fip3 to obtain a feature map fcp1;
Step S4.2: performing multiscale fusion on the feature map fcp1 and the feature map fip2 after upsampling to obtain an output feature map fop1;
step S4.3: the output feature map fop1 is up-sampled and then is subjected to multi-scale fusion with the feature map fip1, so that an output feature map fop2 is obtained;
step S4.4: and carrying out convolution processing on the output characteristic map fop2, and carrying out multi-scale fusion on the output characteristic map fop1 subjected to up-sampling to obtain an output characteristic map fop3.
Thus, three output feature maps fop1, fop2, and fop3 can be obtained by step S4.
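Assuming the Fuse module from the previous sketch, steps S4.1–S4.4 can be wired up as below. The reading of step S4.4 (a 3×3 convolution on fop2, fused with the up-sampled fop1) and all channel widths and spatial sizes are assumptions, not values given in the patent.

```python
import torch
import torch.nn as nn

# Fuse is the block sketched after the fig. 3 description; channel widths of
# fip1..fip4 are assumptions.
fuse43 = Fuse(c_deep=256, c_shallow=128, c_out=128)   # S4.1
fuse32 = Fuse(c_deep=128, c_shallow=64,  c_out=64)    # S4.2
fuse21 = Fuse(c_deep=64,  c_shallow=32,  c_out=32)    # S4.3
conv_s44 = nn.Conv2d(32, 32, 3, padding=1)            # S4.4 convolution on fop2
fuse_s44 = Fuse(c_deep=64, c_shallow=32, c_out=32)

fip1 = torch.randn(1, 32, 160, 160)
fip2 = torch.randn(1, 64, 80, 80)
fip3 = torch.randn(1, 128, 40, 40)
fip4 = torch.randn(1, 256, 20, 20)

fcp1 = fuse43(fip4, fip3)               # S4.1: up-sample fip4, fuse with fip3
fop1 = fuse32(fcp1, fip2)               # S4.2: up-sample fcp1, fuse with fip2
fop2 = fuse21(fop1, fip1)               # S4.3: up-sample fop1, fuse with fip1
fop3 = fuse_s44(fop1, conv_s44(fop2))   # S4.4: convolve fop2, fuse with up-sampled fop1
print(fop1.shape, fop2.shape, fop3.shape)
```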
Step S5: and detecting the three fused feature images with different scales by using the anchor frames, optimizing all the obtained anchor frames by using a K-means clustering method, and extracting all areas where the infrared weak and small targets possibly exist to obtain a final detection result.
The specific operation steps are as follows:
Step S5.1: because the larger the feature map size, the smaller the receptive field, and standard convolutional neural networks have difficulty detecting infrared weak targets because these targets are typically much smaller in size than the general targets. To solve this problem, output feature maps fop1, fop2 and fop3 with sizes of 40×40, 80×80 and 160×160, respectively, are input into a target detection network (specifically, a detection head network of YOLO may be selected, but not limited thereto), and feature maps with these dimensions are suitable for detection of most infrared weak targets. The lowest-layer feature map l 3 is fused with other upper-layer feature maps l 1、l2 to obtain a feature map with the size of 160×160, and the feature map is more sensitive to smaller infrared targets, so that the method provided by the invention can detect infrared weak small targets and extreme targets from an input original image.
Step S5.2: and optimizing all the obtained anchor frames by using a K-means clustering method to obtain a final detection result and outputting a target mark image. The training stage of the infrared dim target detection method based on the superpixel and the deformable convolution is completed.
In summary, compared with visible-light images, infrared imaging is more complex and, being affected by equipment and environment, harder to acquire, so infrared dim and small targets cannot be detected with traditional methods designed for large targets. To solve this detection problem, the invention provides an infrared dim and small target detection method based on super-pixels and deformable convolution: super-pixel preprocessing segments the regions where infrared dim and small targets may exist, a deformable-convolution feature extraction backbone network extracts target features adaptively, and an anchor-box-based method detects the targets. The network's detection capability for infrared dim and small targets is thereby greatly improved, a better detection effect than existing target detection methods is obtained, the detection rate is raised and the false alarm rate is reduced.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations shall also fall within the scope of the present invention.

Claims (7)

1. An infrared dim and small target detection method based on super-pixels and deformable convolution, characterized by comprising the following steps:
step S1: acquiring an original image;
step S2: performing clustering super-pixel segmentation on the original image to obtain a plurality of super-pixel block sequences;
step S3: inputting the super-pixel block sequences into a deformable-convolution feature extraction backbone network to obtain a plurality of feature maps of different scales;
step S4: inputting the feature maps of different scales into a multi-scale feature fusion network to obtain a plurality of fused feature maps of different scales;
step S5: detecting the fused feature maps of different scales with anchor boxes, optimizing all obtained anchor boxes with a K-means clustering method, and extracting all regions where infrared dim and small targets may exist to obtain the final detection result.
2. The infrared dim and small target detection method based on super-pixels and deformable convolution according to claim 1, characterized in that the original images are generated by an existing infrared scene simulation system, manually annotated, and divided into a training set, a validation set and a test set.
3. The infrared dim and small target detection method based on super-pixels and deformable convolution according to claim 1, characterized in that the specific operation flow of the step S2 is as follows:
step S2.1: sampling on a regular grid at intervals of S pixels to obtain super-pixel blocks;
step S2.2: moving each sampling center to the lowest-gradient position in its 3×3 neighborhood;
step S2.3: introducing a Euclidean distance D to assign each pixel to the nearest cluster center, searching for similar pixels in the 2S×2S region around each super-pixel center;
step S2.4: re-computing the center of each cluster according to the new label of each pixel; computing the residual between the new cluster centers and the previous cluster centers with the L2 norm and comparing it with a residual threshold: if the residual is larger than the threshold, repeating the computation of the Euclidean distance D and the cluster centers; if the residual is smaller than the threshold, stopping the iteration and finally obtaining the super-pixel block sequence after super-pixel preprocessing.
4. The infrared dim and small target detection method based on super-pixels and deformable convolution according to claim 1, characterized in that the deformable-convolution feature extraction backbone network consists of a deformable convolution layer, a batch normalization layer, a SiLU activation function, three convolution layers and a rectified linear unit (ReLU); the super-pixel block sequence is input into the backbone network and processed sequentially by the deformable convolution layer, the batch normalization layer, the SiLU activation function, the three convolution layers and the ReLU to obtain a plurality of feature maps of different scales.
5. The infrared dim and small target detection method based on super-pixels and deformable convolution according to claim 1, characterized in that, in the step S3, offsets are introduced into the convolution kernel during multi-scale feature extraction; the offsets are obtained by applying an additional convolution layer on the same input feature map, this layer having the same spatial resolution and dilation as the current convolution layer, so that the generated offset field has the same spatial resolution as the input feature map; during training, the convolution kernel that produces the output feature map and the convolution kernel that generates the offsets are learned simultaneously, and the calculation formula is:
Y(p0) = Σn w(pn) · x(p0 + pn + Δpn)
where Y denotes the output feature map, w denotes the convolution kernel weights, x denotes the input feature map, p0 is a point of the output feature map, pn enumerates the positions of the regular convolution grid, Δpn is the learned offset of position pn, and n = 1, …, N with N the number of sampling points of the kernel.
6. The infrared dim and small target detection method based on super-pixels and deformable convolution according to claim 1, characterized in that the multi-scale feature fusion network consists of three 1×1 convolution layers, three 3×3 convolution layers, three up-sampling layers, three fusion modules and three feature extraction modules; the 1×1 convolution layers change the number of feature channels, the 3×3 convolution layers down-sample the feature maps, the up-sampling layers convert low-resolution feature maps into high-resolution feature maps, the fusion modules splice and fuse deep and shallow feature maps of the same resolution, and the feature extraction modules extract feature information from the fused feature maps.
7. The infrared dim and small target detection method based on super-pixels and deformable convolution according to claim 1, characterized in that the specific operation flow of the step S4 is as follows:
step S4.1: the feature maps of different scales are denoted fip1, fip2, fip3 and fip4; the feature map fip4 is up-sampled and then multi-scale fused with the feature map fip3 to obtain the feature map fcp1;
step S4.2: the feature map fcp1 is up-sampled and then multi-scale fused with the feature map fip2 to obtain the output feature map fop1;
step S4.3: the output feature map fop1 is up-sampled and then multi-scale fused with the feature map fip1 to obtain the output feature map fop2;
step S4.4: the output feature map fop2 is convolved and then multi-scale fused with the up-sampled output feature map fop1 to obtain the output feature map fop3.
CN202410073036.6A 2024-01-18 2024-01-18 Infrared dim target detection method based on superpixel and deformable convolution Pending CN117994573A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410073036.6A CN117994573A (en) 2024-01-18 2024-01-18 Infrared dim target detection method based on superpixel and deformable convolution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410073036.6A CN117994573A (en) 2024-01-18 2024-01-18 Infrared dim target detection method based on superpixel and deformable convolution

Publications (1)

Publication Number Publication Date
CN117994573A true CN117994573A (en) 2024-05-07

Family

ID=90890476

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410073036.6A Pending CN117994573A (en) 2024-01-18 2024-01-18 Infrared dim target detection method based on superpixel and deformable convolution

Country Status (1)

Country Link
CN (1) CN117994573A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118570493A (en) * 2024-08-05 2024-08-30 北京航空航天大学 Feature enhancement method for weak and small targets in infrared image


Similar Documents

Publication Publication Date Title
Zhang et al. CAD-Net: A context-aware detection network for objects in remote sensing imagery
CN113052210B (en) Rapid low-light target detection method based on convolutional neural network
CN110135366B (en) Shielded pedestrian re-identification method based on multi-scale generation countermeasure network
CN109344701B (en) Kinect-based dynamic gesture recognition method
Wang et al. Small-object detection based on yolo and dense block via image super-resolution
CN115063573B (en) Multi-scale target detection method based on attention mechanism
CN110782420A (en) Small target feature representation enhancement method based on deep learning
CN113591968A (en) Infrared weak and small target detection method based on asymmetric attention feature fusion
CN111666842B (en) Shadow detection method based on double-current-cavity convolution neural network
CN110929593A (en) Real-time significance pedestrian detection method based on detail distinguishing and distinguishing
Cho et al. Semantic segmentation with low light images by modified CycleGAN-based image enhancement
CN113159158B (en) License plate correction and reconstruction method and system based on generation countermeasure network
CN113888461A (en) Method, system and equipment for detecting defects of hardware parts based on deep learning
CN116342894B (en) GIS infrared feature recognition system and method based on improved YOLOv5
CN117994573A (en) Infrared dim target detection method based on superpixel and deformable convolution
Zhao et al. Research on detection method for the leakage of underwater pipeline by YOLOv3
CN114299383A (en) Remote sensing image target detection method based on integration of density map and attention mechanism
CN111507184A (en) Human body posture detection method based on parallel cavity convolution and body structure constraint
CN115861756A (en) Earth background small target identification method based on cascade combination network
CN116363535A (en) Ship detection method in unmanned aerial vehicle aerial image based on convolutional neural network
CN117557774A (en) Unmanned aerial vehicle image small target detection method based on improved YOLOv8
CN117292117A (en) Small target detection method based on attention mechanism
CN116168240A (en) Arbitrary-direction dense ship target detection method based on attention enhancement
Liu et al. SLPR: A deep learning based Chinese ship license plate recognition framework
Cho et al. Modified perceptual cycle generative adversarial network-based image enhancement for improving accuracy of low light image segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination