MM-IRSTD: Conv Self-Attention-Based Multi-Modal Small and Dim Target Detection in Infrared Dual-Band Images
"> Figure 1
<p>MM-IRSTD structure.</p> "> Figure 2
<p>Conv-based self-attention module schematic.</p> "> Figure 3
<p>Schematic of the Sobel operator.</p> "> Figure 4
<p>Geometric transfer loss from <math display="inline"><semantics> <mrow> <msub> <mi>l</mi> <mi>i</mi> </msub> </mrow> </semantics></math> to <math display="inline"><semantics> <mrow> <msub> <mi>m</mi> <mi>i</mi> </msub> </mrow> </semantics></math>.</p> "> Figure 5
<p>Sample of the dataset. The red box contains the target, and the blue box shows the enlarged view of the target.</p> "> Figure 6
<p>The variation of training loss and testing loss with epochs.</p> "> Figure 7
<p>Qualitative analysis of results. Mid-wave represents the mid-wave image, while Long-wave represents the long-wave image. The red box indicates the target, and the blue box shows the situation after the target has been magnified.</p> "> Figure 8
<p>Incorrect detection examples. In the detection results, the green box indicates correctly detected targets, while the red box indicates incorrectly detected targets. In the labeled original images, the area within the green edges represents the target.</p> ">
Abstract
1. Introduction
- We propose an end-to-end dual-band infrared small and dim target detection framework that combines a Conv-based self-attention module with a self-distillation mechanism.
- We introduce a Conv-based self-attention module that incorporates dynamic weights for adaptive feature fusion, uses convolutional operations to reduce computational complexity, and enhances the model’s global perception capability.
- We propose a KL divergence-based self-distillation mechanism that constrains multi-band feature information along the spatial and channel dimensions of the feature maps. The mechanism is used only during training, ensuring efficient inference.
- Experimental results show that, compared with state-of-the-art (SOTA) approaches, MM-IRSTD improves the F-score and Kappa by 3.13% and 7.73%, respectively, on dual-band infrared small and dim target detection.
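The spatial and channel constraints described in the third contribution can be illustrated with a minimal NumPy sketch. This is a generic interpretation, not the paper's exact formulation: the feature shapes, the softmax normalization, and the function names (`self_distillation_loss`, `kl_div`) are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def kl_div(p, q, eps=1e-8):
    # KL(p || q) between distributions along the last axis
    return np.sum(p * np.log((p + eps) / (q + eps)), axis=-1)

def self_distillation_loss(feat_long, feat_mid):
    """Constrain mid-wave features toward long-wave features along the
    spatial and channel dimensions of a (C, H, W) feature map (sketch)."""
    C, H, W = feat_long.shape
    # Spatial constraint: a distribution over the H*W locations per channel
    p_sp = softmax(feat_long.reshape(C, H * W))
    q_sp = softmax(feat_mid.reshape(C, H * W))
    loss_spatial = kl_div(p_sp, q_sp).mean()
    # Channel constraint: a distribution over the C channels per location
    p_ch = softmax(feat_long.reshape(C, H * W).T)
    q_ch = softmax(feat_mid.reshape(C, H * W).T)
    loss_channel = kl_div(p_ch, q_ch).mean()
    return loss_spatial + loss_channel
```

Because the constraint is a training-time loss only, it adds no cost at inference, which is consistent with the efficiency claim above.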
2. Related Work
2.1. Target Detection Based on Multiband Images
2.2. Attention Mechanism
3. Methods
3.1. Overall Architecture
3.2. Conv-Based Self-Attention Module
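To fix ideas, a self-attention block whose query/key/value projections are 1×1 convolutions can be sketched in NumPy as below. This is a generic illustration of convolution-based attention, not the paper's exact module: the paper's design additionally uses dynamic weights and convolutional tricks to reduce complexity, which are omitted here, and the weight shapes are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def conv1x1(feat, weight):
    # A 1x1 convolution over a (C, H, W) map is a matrix product over channels
    C, H, W = feat.shape
    return (weight @ feat.reshape(C, H * W)).reshape(-1, H, W)

def conv_self_attention(feat, wq, wk, wv):
    """Spatial self-attention with 1x1-conv Q/K/V projections (sketch).
    Shapes: feat (C, H, W); wq, wk, wv (C, C)."""
    C, H, W = feat.shape
    q = conv1x1(feat, wq).reshape(C, H * W)           # queries
    k = conv1x1(feat, wk).reshape(C, H * W)           # keys
    v = conv1x1(feat, wv).reshape(C, H * W)           # values
    attn = softmax(q.T @ k / np.sqrt(C), axis=-1)     # (HW, HW) attention map
    out = (v @ attn.T).reshape(C, H, W)
    return feat + out                                 # residual connection
```

The (HW × HW) attention map here is the quadratic-cost baseline that convolutional reformulations aim to avoid.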
3.3. Boundary Extraction Method Based on Sobel
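The standard Sobel operator referenced in this section convolves the image with a pair of 3×3 kernels and takes the gradient magnitude; a minimal NumPy sketch follows (a naive loop, for illustration only — the paper's integration of the operator into the network is not reproduced here).

```python
import numpy as np

# Standard Sobel kernels for horizontal and vertical gradients
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter2d_valid(img, kernel):
    # Naive 'valid' 2-D cross-correlation (kernel flip is irrelevant
    # for the gradient magnitude)
    H, W = img.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(H - 2):
        for j in range(W - 2):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * kernel)
    return out

def sobel_edges(img):
    gx = filter2d_valid(img, SOBEL_X)
    gy = filter2d_valid(img, SOBEL_Y)
    return np.hypot(gx, gy)  # gradient magnitude
```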
3.4. KL Divergence-Based Self-Distillation Mechanism
3.4.1. Spatial Similarity Constraints
3.4.2. Channel Similarity Constraints
3.4.3. Invocation of the Module
Algorithm 1. Invocation process of the self-distillation mechanism.
Step 1: Extract long-wave image features and mid-wave image features.
Step 2: Fuse the features.
Step 3: Upsample and concatenate the features to obtain out.
if training: apply the L2 norm to the band features and compute the spatial and channel similarity constraints.
else: skip the self-distillation mechanism to improve inference speed.
return out.
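The control flow of Algorithm 1 can be sketched as a forward function with a training flag. Every stage here is a trivial stand-in (the real model uses the Conv-based self-attention backbone and the KL constraints); only the branch structure — distillation in training, skipped at inference — mirrors the algorithm.

```python
import numpy as np

# Hypothetical stand-in stages; the real network replaces each of these.
def extract(img):               return img.astype(float)                 # Step 1
def fuse(f_long, f_mid):        return 0.5 * (f_long + f_mid)            # Step 2
def upsample_concat(f):         return np.repeat(np.repeat(f, 2, 0), 2, 1)  # Step 3
def l2_normalise(f, eps=1e-8):  return f / (np.linalg.norm(f) + eps)
def distill_loss(a, b):         return float(np.mean((a - b) ** 2))      # placeholder for the KL terms

def mm_irstd_forward(img_long, img_mid, training=False):
    f_long = extract(img_long)        # long-wave branch features
    f_mid = extract(img_mid)          # mid-wave branch features
    fused = fuse(f_long, f_mid)
    out = upsample_concat(fused)
    if training:
        # Self-distillation runs only during training
        loss = distill_loss(l2_normalise(f_long), l2_normalise(f_mid))
        return out, loss
    return out                        # inference skips distillation for speed
```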
3.5. Loss Function
4. Experiments
4.1. Dataset
4.2. Metrics
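The metrics reported in the tables (IoU, Dice, Precision, Recall, F-score, Kappa) all derive from the confusion matrix of the predicted and ground-truth binary masks. A NumPy sketch using the usual definitions is below; the paper's exact per-image vs. global averaging scheme is an assumption not reproduced here.

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Confusion-matrix metrics for binary masks (sketch).
    Assumes at least one positive pixel in pred and gt."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt);  fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt); tn = np.sum(~pred & ~gt)
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    po = (tp + tn) / n  # observed agreement
    # Chance agreement from the marginal class frequencies
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return dict(
        IoU=tp / (tp + fp + fn),
        Dice=2 * tp / (2 * tp + fp + fn),
        Precision=precision,
        Recall=recall,
        Fscore=2 * precision * recall / (precision + recall),
        Kappa=(po - pe) / (1 - pe),
    )
```

Note that with these definitions Dice and the F-score coincide for binary masks, which explains the identical columns in the result tables.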
4.3. Implementation Details
4.4. Model Training
4.5. Comparison Experiment
4.5.1. Quantitative Analysis
4.5.2. Qualitative Evaluation
4.6. Ablation Experiments
4.6.1. Model Framework Ablation Experiment
4.6.2. Model Parameter Ablation Experiment
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Yang, X.; Li, S.; Niu, S.; Yan, B.; Meng, Z. Graph-based Spatio-Temporal Semantic Reasoning Model for Anti-occlusion Infrared Aerial Target Recognition. IEEE Trans. Multimed. 2024, 1–15. [Google Scholar] [CrossRef]
- Yang, X.; Li, S.; Zhang, L.; Yan, B.; Meng, Z. Anti-Occlusion Infrared Aerial Target Recognition with Vision-Inspired Dual-Stream Graph Network. IEEE Trans. Geosci. Remote Sens. 2024, 52, 5004614. [Google Scholar] [CrossRef]
- Yang, X.; Li, S.; Cai, B.; Meng, Z.; Yan, J. MF-GCN: Motion Flow-Based Graph Network Learning Dynamics for Aerial IR Target Recognition. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 6346–6359. [Google Scholar] [CrossRef]
- Tian, X.; Li, S.; Yang, X.; Zhang, L.; Li, C. Joint spatio-temporal features and sea background prior for infrared dim and small target detection. Infrared Phys. Technol. 2023, 130, 104612. [Google Scholar] [CrossRef]
- Bai, X.; Zhou, F. Analysis of new top-hat transformation and the application for infrared dim small target detection. Pattern Recognit. 2010, 43, 2145–2156. [Google Scholar] [CrossRef]
- Chen, C.L.P.; Li, H.; Wei, Y.; Xia, T.; Tang, Y.Y. A Local Contrast Method for Small Infrared Target Detection. IEEE Trans. Geosci. Remote Sens. 2013, 52, 574–581. [Google Scholar] [CrossRef]
- Dai, Y.; Wu, Y. Reweighted infrared patch-tensor model with both nonlocal and local priors for single-frame small target detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 3752–3767. [Google Scholar] [CrossRef]
- Lin, J.; Li, S.; Yang, X.; Niu, S.; Yan, B.; Meng, Z. CS-ViG-UNet: Infrared small and dim target detection based on cycle shift vision graph convolution network. Expert Syst. Appl. 2024, 254, 124385. [Google Scholar] [CrossRef]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
- Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6154–6162. [Google Scholar]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Jocher, G.; Chaurasia, A.; Stoken, A.; Borovec, J.; Kwon, Y.; Michael, K.; Fang, J.; Yifu, Z.; Wong, C.; Montes, D.; et al. YOLOv5 by Ultralytics, Version 7.0. 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 22 July 2024).
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar]
- Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976v1. [Google Scholar]
- Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696v1. [Google Scholar]
- Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLO, Version 8.0.0. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 22 July 2024).
- Wang, C.-Y.; Yeh, I.-H.; Liao, H.-Y.M. YOLOv9: Learning what you want to learn using programmable gradient information. arXiv 2024, arXiv:2402.13616. [Google Scholar]
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-time end-to-end object detection. arXiv 2024, arXiv:2405.14458. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Part I 14. pp. 21–37. [Google Scholar]
- Fu, C.-Y.; Liu, W.; Ranga, A.; Tyagi, A.; Berg, A.C. DSSD: Deconvolutional single shot detector. arXiv 2017, arXiv:1701.06659. [Google Scholar]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 213–229. [Google Scholar]
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable transformers for end-to-end object detection. arXiv 2020, arXiv:2010.04159. [Google Scholar]
- Sun, Z.; Cao, S.; Yang, Y.; Kitani, K.M. Rethinking transformer-based set prediction for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 3611–3620. [Google Scholar]
- Dai, Z.; Cai, B.; Lin, Y.; Chen, J. UP-DETR: Unsupervised pre-training for object detection with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1601–1610. [Google Scholar]
- Zheng, M.; Gao, P.; Zhang, R.; Li, K.; Wang, X.; Li, H.; Dong, H. End-to-end object detection with adaptive clustering transformer. arXiv 2020, arXiv:2011.09315. [Google Scholar]
- Kou, T.; Zhou, Z.; Liu, H.; Yang, Y. Multi-band composite detection and recognition of aerial infrared point targets. Infrared Phys. Technol. 2018, 94, 102–109. [Google Scholar] [CrossRef]
- Liu, J.; Gong, W.; Zhang, T.; Zhang, Y.; Deng, W.; Liu, H. Multi-band Image Fusion With Infrared Broad Spectrum For Low And Slow Small Target Recognition. In Proceedings of the 2022 International Conference on Artificial Intelligence and Computer Information Technology (AICIT), Yichang, China, 16–18 September 2022; pp. 1–5. [Google Scholar]
- Guo, L.; Rao, P.; Chen, X.; Li, Y. Infrared differential detection and band selection for space-based aerial targets under complex backgrounds. Infrared Phys. Technol. 2024, 138, 105172. [Google Scholar] [CrossRef]
- Liu, F.; Shen, T.; Ma, X. Convolutional neural network based multi-band ship target recognition with feature fusion. Acta Opt. Sin. 2017, 37, 1015002. [Google Scholar]
- Li, Y.; Zhang, D.; Fan, L.; Ma, H.; Xu, Z. Performance Analysis of Ship Target Recognition in Multi-Band Infrared Images Based on Deep Learning. In Proceedings of the 2023 7th International Conference on Transportation Information and Safety (ICTIS), Xi’an, China, 4–6 August 2023; pp. 432–436. [Google Scholar]
- Zhang, T.; Zhang, D.; Liu, Y. Research on Camouflage Target Detection Method Based on Dual Band Optics and SAR Image Fusion. In Proceedings of the International Conference on Image, Vision and Intelligent Systems, Baoding, China, 16–18 August 2023; pp. 320–335. [Google Scholar]
- Yin, W.; He, K.; Xu, D.; Yue, Y.; Luo, Y. Adaptive low light visual enhancement and high-significant target detection for infrared and visible image fusion. Vis. Comput. 2023, 39, 6723–6742. [Google Scholar] [CrossRef]
- Dahai, N.; Sheng, Z. An object detection algorithm based on decision-level fusion of visible and infrared images. Infrared Technol. 2023, 45, 282–291. [Google Scholar]
- Zheng, X.; Yang, Q.; Si, P.; Wu, Q. A multi-stage visible and infrared image fusion network based on attention mechanism. Sensors 2022, 22, 3651. [Google Scholar] [CrossRef]
- Fu, H.; Wang, S.; Duan, P.; Xiao, C.; Dian, R.; Li, S.; Li, Z. LRAF-Net: Long-Range Attention Fusion Network for Visible–Infrared Object Detection. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 13232–13245. [Google Scholar] [CrossRef]
- Zhang, S.; Rao, P.; Zhang, H.; Chen, X.; Hu, T. Spatial infrared objects discrimination based on multi-channel CNN with attention mechanism. Infrared Phys. Technol. 2023, 132, 104670. [Google Scholar] [CrossRef]
- Li, H.; Wu, X.-J. CrossFuse: A novel cross attention mechanism based infrared and visible image fusion approach. Inf. Fusion 2023, 103, 102147. [Google Scholar] [CrossRef]
- Zhao, X.; Xu, Y.; Wu, F.; Niu, J.; Cai, W.; Zhang, Z. Ground infrared target detection method based on a parallel attention mechanism. Infrared Laser Eng. 2022, 51, 20210290-1. [Google Scholar]
- Wang, X.; Lu, R.; Bi, H.; Li, Y. An Infrared Small Target Detection Method Based on Attention Mechanism. Sensors 2023, 23, 8608. [Google Scholar] [CrossRef]
- Lin, J.; Li, S.; Zhang, L.; Yang, X.; Yan, B.; Meng, Z. IR-TransDet: Infrared dim and small target detection with IR-transformer. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5004813. [Google Scholar] [CrossRef]
- Zhan, W.; Zhang, C.; Guo, S.; Guo, J.; Shi, M. EGISD-YOLO: Edge Guidance Network for Infrared Ship Target Detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 10097–10107. [Google Scholar] [CrossRef]
- Wang, K.; Gao, X.; Zhao, Y.; Li, X.; Dou, D.; Xu, C.-Z. Pay attention to features, transfer learn faster CNNs. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
| | Size of Images | Number of Images | Types of Background | Size of the Target | Signal-to-Noise Ratio |
|---|---|---|---|---|---|
| Training Set | 512 × 512 | 12,214 | Clouds/city/sea/field/river/mountains | 10–15 | 0.399625 (long-wave), 1.612668 (mid-wave) |
| Test Set | 512 × 512 | 3117 | Clouds/sea/field | 10–15 | 0.305886 (long-wave), 0.640464 (mid-wave) |
| Model | IoU | Dice | Precision | Recall | F-score | Kappa | FPS |
|---|---|---|---|---|---|---|---|
| ACM | 0.711 | 0.797 | 0.877 | 0.745 | 0.797 | 0.594 | 25.18 |
| DNANet | 0.688 | 0.774 | 0.900 | 0.708 | 0.774 | 0.548 | 4.43 |
| CAVER 50 | 0.734 | 0.819 | 0.776 | 0.877 | 0.819 | 0.638 | 22.49 |
| CAVER 101 | 0.746 | 0.830 | 0.787 | 0.887 | 0.830 | 0.660 | 18.61 |
| LSNet | 0.745 | 0.829 | 0.783 | 0.892 | 0.829 | 0.658 | 43.01 |
| SPNet | 0.670 | 0.754 | 0.676 | 0.954 | 0.754 | 0.508 | 20.01 |
| Ours | 0.776 | 0.856 | 0.929 | 0.804 | 0.856 | 0.711 | 45.84 |
| Exp | Model | IoU | Dice | Precision | Recall | F-score | Kappa |
|---|---|---|---|---|---|---|---|
| 1 | No Sobel | 0.742 | 0.826 | 0.873 | 0.789 | 0.826 | 0.652 |
| 2 | No Transformer | 0.745 | 0.829 | 0.783 | 0.892 | 0.829 | 0.658 |
| 3 | No KL | 0.770 | 0.851 | 0.860 | 0.849 | 0.851 | 0.709 |
| 4 | Ours (Sobel + KL + TF) | 0.776 | 0.856 | 0.929 | 0.804 | 0.856 | 0.711 |
| Exp | Parameters | IoU | Dice | Precision | Recall | F-score | Kappa |
|---|---|---|---|---|---|---|---|
| 1 | (4,4,4,4,4) | 0.735 | 0.820 | 0.937 | 0.752 | 0.820 | 0.639 |
| 2 | (4,4,8,8,8) | 0.732 | 0.817 | 0.953 | 0.744 | 0.817 | 0.634 |
| 3 | (4,4,4,8,8) | 0.714 | 0.800 | 0.938 | 0.728 | 0.800 | 0.599 |
| 4 | (4,8,8,8,8) | 0.728 | 0.813 | 0.890 | 0.762 | 0.813 | 0.627 |
| 5 | Ours (4,4,4,6,8) | 0.776 | 0.856 | 0.929 | 0.804 | 0.856 | 0.711 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Yang, J.; Ye, Z.; Lin, J.; Chen, D.; Du, L.; Li, S. MM-IRSTD: Conv Self-Attention-Based Multi-Modal Small and Dim Target Detection in Infrared Dual-Band Images. Remote Sens. 2024, 16, 3937. https://doi.org/10.3390/rs16213937