CAA-YOLO: Combined-Attention-Augmented YOLO for Infrared Ocean Ships Detection
Figure 1. Modern detection structures with input, backbone, neck, and head. (a) Input: data preprocessing modules, such as data augmentation; (b) Backbone: feature extraction module; (c) Neck: multiscale feature fusion module; (d) Head: object classification and localization module.
Figure 2. The framework of the CAA-YOLO.
Figure 3. Structure with the high-resolution detection layer P2. Red lines represent the newly added P2 layer.
Figure 4. Illustration of the triplet attention, which has three branches. The first branch computes attention weights across channel dimension C and spatial dimension W; similarly, the second branch uses channel dimension C and spatial dimension H; the final branch captures spatial dependencies (H and W). Finally, the weights are aggregated by simple averaging.
Figure 5. Illustration of the backbone with triplet attention. (a) Structure of the backbone; (b) C3 module with residual structure; (c) detailed structure of the residual blocks in YOLOv5; (d) proposed Bottleneck_TA module with triplet attention. * stands for repeating the module.
Figure 6. Feature network design: (a) FPN introduces a top-down pathway to fuse multiscale features from levels 3 to 6 (P3–P6); (b) PANet adds an additional bottom-up pathway on top of FPN; (c) BiFPN with cross-scale connections; (d) our CAA-YOLO with more context information.
Figure 7. Illustration of the CAA module. (a) High-level feature maps use GC attention; (b) low-level feature maps use CBAM attention; (c) top-down feature information F_td.
Figure 8. Data set distribution: the gray bars show the number of images containing each type of ship; the blue bars show the number of instances of each type of ship.
Figure 9. Mosaic data augmentation method. Labels 0–6 denote Liner, Bulk carrier, Warship, Sailboat, Canoe, Container ship, and Fishing boat, respectively.
Figure 10. Detection results for small targets with different methods. GT denotes the ground-truth box, YOLOv5 the original baseline network, and YOLOv5+P2 the addition of the high-resolution feature layer P2 to YOLOv5; (a,b) show two groups of detection results.
Figure 11. Infrared ship detection results of different methods. GT denotes the ground-truth box, YOLOv5 the original baseline network, and YOLOv5+P2 the addition of the high-resolution feature layer P2 to YOLOv5; (a–f) show six groups of detection results obtained with the different models.
Figure 12. Infrared ship detection results of different methods. GT denotes the ground-truth box, YOLOv5 the original baseline network, YOLOv5+P2 the addition of the high-resolution feature layer P2, and YOLOv5+P2+CAA the further addition of the CAA feature fusion module; (a–f) show six groups of detection results obtained with the different models.
Figure 13. Infrared ship detection results of different methods. GT denotes the ground-truth box; YOLOv5 is the original baseline network; YOLOv5+P2 adds the high-resolution feature layer P2; YOLOv5+P2+TA adds P2 and triplet attention in the backbone; YOLOv5+P2+CAA adds P2 and the CAA feature fusion module; CAA-YOLO adds all of the above modules. (a–d) show four groups of detection results obtained with the different models.
Figure 14. Confusion matrices. (a) YOLOv5; (b) CAA-YOLO.
Figure 15. PR curves. (a) YOLOv5; (b) CAA-YOLO.
Figure 16. Infrared ship detection results of different methods. GT denotes the ground-truth box; rows two to six show the detection results of Faster R-CNN, SSD, RetinaNet, EfficientDet-D3, and CAA-YOLO, respectively. (a–d) show four groups of detection results.
Abstract
1. Introduction
- To address the challenges of infrared ocean ship scenes, this paper proposes CAA-YOLO, an infrared ocean ship detector built on YOLOv5. By introducing attention modules in the feature extraction and feature fusion stages to exploit more shallow information, we improve the detection of small and weak targets. Compared with several state-of-the-art algorithms, the proposed method achieves better detection results.
- To retain more shallow detail and location information, we add the high-resolution feature layer P2, which improves the detection accuracy of small objects.
- To suppress background noise and let the network learn on its own which feature channels are relevant and effective, we introduce a TA module into the backbone network.
- To capture long-range contextual information for small objects, we design a novel feature fusion method that uses a combined attention mechanism to strengthen feature fusion and suppress the noise introduced by the shallow feature layers.
2. Related Work
3. Methodology
3.1. Network Architecture
3.2. High-Resolution Feature Layer P2
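The section body is not reproduced in this extract, so the following is only a minimal PyTorch sketch of the P2 idea from the contributions and Figure 3: tap the stride-4 backbone feature and fuse it with the upsampled P3 neck feature to form an extra high-resolution detection level. The module name, channel widths, and fusion layers are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class P2Level(nn.Module):
    """Sketch of an extra stride-4 pyramid level (hypothetical names/sizes)."""

    def __init__(self, p3_ch=256, c2_ch=128):
        super().__init__()
        self.reduce = nn.Conv2d(p3_ch, c2_ch, 1)               # align channels with C2
        self.up = nn.Upsample(scale_factor=2, mode="nearest")  # stride 8 -> stride 4
        self.fuse = nn.Sequential(                             # refine the fused map
            nn.Conv2d(2 * c2_ch, c2_ch, 3, padding=1),
            nn.BatchNorm2d(c2_ch),
            nn.SiLU(),
        )

    def forward(self, p3, c2):
        # p3: (B, p3_ch, H/8, W/8) neck feature; c2: (B, c2_ch, H/4, W/4) backbone feature
        x = torch.cat([self.up(self.reduce(p3)), c2], dim=1)
        return self.fuse(x)  # P2 feature for the new high-resolution detection head
```

A detection head attached to this P2 map would then use the smallest anchors (the D2 row of the anchor table in Section 4).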
3.3. Enhanced Backbone
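Figures 4 and 5 describe the Bottleneck_TA module, which inserts triplet attention (Misra et al.) into the YOLOv5 residual block. Below is a compact, self-contained PyTorch sketch of the three-branch attention as described in the Figure 4 caption; it follows the published triplet-attention design, but the 7 × 7 kernel and other details are assumptions rather than the paper's exact settings.

```python
import torch
import torch.nn as nn

class ZPool(nn.Module):
    """Concatenate max- and mean-pooled features along the channel axis."""
    def forward(self, x):
        return torch.cat([x.max(dim=1, keepdim=True)[0],
                          x.mean(dim=1, keepdim=True)], dim=1)

class AttentionGate(nn.Module):
    """Z-pool -> 7x7 conv -> BN -> sigmoid, applied as a multiplicative gate."""
    def __init__(self, k=7):
        super().__init__()
        self.pool = ZPool()
        self.conv = nn.Conv2d(2, 1, k, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(1)

    def forward(self, x):
        return x * torch.sigmoid(self.bn(self.conv(self.pool(x))))

class TripletAttention(nn.Module):
    """Three-branch attention over (C, W), (C, H), and (H, W), averaged."""
    def __init__(self):
        super().__init__()
        self.cw = AttentionGate()  # branch 1: rotate so attention covers (C, W)
        self.ch = AttentionGate()  # branch 2: rotate so attention covers (C, H)
        self.hw = AttentionGate()  # branch 3: plain spatial attention over (H, W)

    def forward(self, x):  # x: (B, C, H, W)
        b1 = self.cw(x.permute(0, 2, 1, 3)).permute(0, 2, 1, 3)  # (C, W) interaction
        b2 = self.ch(x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)  # (C, H) interaction
        b3 = self.hw(x)                                          # (H, W) interaction
        return (b1 + b2 + b3) / 3.0  # simple averaging, as in the Figure 4 caption
```

A Bottleneck_TA would then apply `TripletAttention` to the output of the standard bottleneck convolutions before the residual addition shown in Figure 5d.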
3.4. Feature Fusion
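Per the Figure 7 caption, the CAA module applies GC attention to high-level feature maps and CBAM attention to low-level feature maps before fusing them with the top-down information F_td. The sketch below wires up simplified versions of both attention blocks; the fusion layout, equal channel counts, and reduction ratios are our assumptions for illustration.

```python
import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    """CBAM channel attention (shared MLP over avg- and max-pooled vectors)."""
    def __init__(self, ch, r=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(ch, ch // r, 1), nn.ReLU(), nn.Conv2d(ch // r, ch, 1))

    def forward(self, x):
        w = self.mlp(x.mean((2, 3), keepdim=True)) + self.mlp(x.amax((2, 3), keepdim=True))
        return x * torch.sigmoid(w)

class SpatialGate(nn.Module):
    """CBAM spatial attention (7x7 conv over pooled channel statistics)."""
    def __init__(self, k=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, k, padding=k // 2)

    def forward(self, x):
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(s))

class GCBlock(nn.Module):
    """Simplified global context (GC) block for long-range dependencies."""
    def __init__(self, ch, r=16):
        super().__init__()
        self.mask = nn.Conv2d(ch, 1, 1)  # context attention map
        self.transform = nn.Sequential(
            nn.Conv2d(ch, ch // r, 1), nn.LayerNorm([ch // r, 1, 1]),
            nn.ReLU(), nn.Conv2d(ch // r, ch, 1))

    def forward(self, x):
        b, c, h, w = x.shape
        attn = torch.softmax(self.mask(x).view(b, 1, h * w), dim=-1)  # (B, 1, HW)
        ctx = torch.bmm(x.view(b, c, h * w), attn.transpose(1, 2))    # (B, C, 1)
        return x + self.transform(ctx.view(b, c, 1, 1))

class CAAFusion(nn.Module):
    """Fuse a deep (GC-refined) and a shallow (CBAM-refined) feature map."""
    def __init__(self, ch):
        super().__init__()
        self.gc = GCBlock(ch)                                      # high-level branch
        self.cbam = nn.Sequential(ChannelGate(ch), SpatialGate())  # low-level branch
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.out = nn.Conv2d(2 * ch, ch, 3, padding=1)

    def forward(self, deep, shallow):
        # deep: (B, ch, H, W); shallow: (B, ch, 2H, 2W) from an earlier stage
        return self.out(torch.cat([self.up(self.gc(deep)), self.cbam(shallow)], dim=1))
```

In the full CAA-YOLO neck the fused map would also receive the top-down information F_td shown in Figure 7c; that path is omitted here for brevity.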
4. Experiments
4.1. Data Set
4.2. Experimental Settings
4.2.1. Implementation Details
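Training uses mosaic data augmentation (Figure 9), which stitches four images around a random center so that each batch mixes scales and contexts. A rough NumPy illustration of a 2 × 2 mosaic follows; the random scaling and bounding-box remapping of the real YOLOv5 pipeline are omitted, and all names are hypothetical.

```python
import random
import numpy as np

def mosaic4(images, size=640):
    """Minimal 2x2 mosaic sketch: paste four images around a random center.

    `images` is a list of four HxWx3 uint8 arrays; label remapping is
    omitted for brevity. This illustrates the idea, not YOLOv5's exact code.
    """
    canvas = np.full((2 * size, 2 * size, 3), 114, dtype=np.uint8)  # gray fill
    xc = random.randint(size // 2, 3 * size // 2)  # random mosaic center
    yc = random.randint(size // 2, 3 * size // 2)
    corners = [(0, 0, xc, yc), (xc, 0, 2 * size, yc),
               (0, yc, xc, 2 * size), (xc, yc, 2 * size, 2 * size)]
    for img, (x1, y1, x2, y2) in zip(images, corners):
        h, w = y2 - y1, x2 - x1
        crop = img[:h, :w]  # nearest-corner crop of each source image
        canvas[y1:y1 + crop.shape[0], x1:x1 + crop.shape[1]] = crop
    return canvas
```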
4.2.2. Evaluation Metrics
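The AP/AR breakdowns reported below (AP50, AP75, APs/APm/APl and their AR counterparts) follow the COCO evaluation protocol. Assuming the ground truth and detections are exported in COCO JSON format (the file names here are placeholders), they can be computed with pycocotools:

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Hypothetical file paths; the paper's split/annotation files are not given here.
coco_gt = COCO("annotations/instances_val.json")  # ground-truth boxes
coco_dt = coco_gt.loadRes("detections_val.json")  # model predictions

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP, AP50, AP75, APs/APm/APl and the AR variants
```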
4.2.3. Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Li, S.; Li, Y.; Li, Y.; Li, M.; Xu, X. YOLO-FIRI: Improved YOLOv5 for Infrared Image Object Detection. IEEE Access 2021, 9, 141861–141875.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
- Nayan, A.A.; Saha, J.; Mozumder, A.N.; Mahmud, K.R.; Al Azad, A.K. Real Time Detection of Small Objects Detection and Recognition Using Vision Augmentation Algorithm. arXiv 2020, arXiv:2003.07442.
- Chen, C.; Liu, M.Y.; Tuzel, O.; Xiao, J. R-CNN for small object detection. In Asian Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016; pp. 214–230.
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–18 June 2020; pp. 10781–10790.
- Lim, J.S.; Astrid, M.; Yoon, H.J.; Lee, S.I. Small object detection using context and attention. In Proceedings of the 2021 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Jeju Island, Korea, 13–16 April 2021; pp. 181–186.
- Zhang, Y.; Guo, L.; Wang, Z.; Yu, Y.; Liu, X.; Xu, F. Intelligent ship detection in remote sensing images based on multi-layer convolutional feature fusion. Remote Sens. 2020, 12, 3316.
- Kim, M.; Jeong, J.; Kim, S. ECAP-YOLO: Efficient Channel Attention Pyramid YOLO for Small Object Detection in Aerial Image. Remote Sens. 2021, 13, 4851.
- Shao, J.; Yang, Q.; Luo, C.; Li, R.; Zhou, Y.; Zhang, F. Vessel Detection From Nighttime Remote Sensing Imagery Based on Deep Learning. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 12536–12544.
- Bi, F.; Hou, J.; Chen, L.; Yang, Z.; Wang, Y. Ship detection for optical remote sensing images based on visual attention enhanced network. Sensors 2019, 19, 2271.
- Cui, Z.; Wang, X.; Liu, N.; Cao, Z.; Yang, J. Ship detection in large-scale SAR images via spatial shuffle-group enhance attention. IEEE Trans. Geosci. Remote Sens. 2020, 59, 379–391.
- Chen, L.; Shi, W.; Deng, D. Improved YOLOv3 based on attention mechanism for fast and accurate ship detection in optical remote sensing images. Remote Sens. 2021, 13, 660.
- Dewi, C.; Chen, R.C.; Jiang, X.; Yu, H. Deep convolutional neural network for enhancing traffic sign recognition developed on Yolo V4. Multimed. Tools Appl. 2022, 1–25.
- Liu, S.; Huang, D.; Wang, Y. Receptive field block net for accurate and fast object detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 385–400.
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
- Zhou, P.; Ni, B.; Geng, C.; Hu, J.; Xu, Y. Scale-transferrable object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 528–537.
- Jiang, J.; Fu, X.; Qin, R.; Wang, X.; Ma, Z. High-speed lightweight ship detection algorithm based on YOLO-v4 for three-channels RGB SAR image. Remote Sens. 2021, 13, 1909.
- Hu, J.; Zhi, X.; Shi, T.; Zhang, W.; Cui, Y.; Zhao, S. PAG-YOLO: A portable attention-guided YOLO network for small ship detection. Remote Sens. 2021, 13, 3059.
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 658–666.
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000.
- Kisantal, M.; Wojna, Z.; Murawski, J.; Naruniec, J.; Cho, K. Augmentation for small object detection. arXiv 2019, arXiv:1902.07296.
- Chen, C.; Zhang, Y.; Lv, Q.; Wei, S.; Wang, X.; Sun, X.; Dong, J. RRNet: A hybrid detector for object detection in drone-captured images. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Long Beach, CA, USA, 16–17 June 2019.
- Gadekallu, T.R.; Srivastava, G.; Liyanage, M.; Iyapparaja, M.; Chowdhary, C.L.; Koppu, S.; Maddikunta, P.K.R. Hand gesture recognition based on a Harris hawks optimized convolution neural network. Comput. Electr. Eng. 2022, 100, 107836.
- Loshchilov, I.; Hutter, F. SGDR: Stochastic gradient descent with warm restarts. arXiv 2016, arXiv:1608.03983.
- Liangkui, L.; Shaoyou, W.; Zhongxing, T. Using deep learning to detect small targets in infrared oversampling images. J. Syst. Eng. Electron. 2018, 29, 947–952.
- Li, M.; Zhang, T.; Cui, W. Research of infrared small pedestrian target detection based on YOLOv3. Infrared Technology 2020, 42, 176–181.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141.
- Li, Y.; Li, S.; Du, H.; Chen, L.; Zhang, D.; Li, Y. YOLO-ACN: Focusing on small target and occluded object detection. IEEE Access 2020, 8, 227288–227303.
- Sun, M.; Zhang, H.; Huang, Z.; Luo, Y.; Li, Y. Road infrared target detection with I-YOLO. IET Image Process. 2022, 16, 92–101.
- Dai, X.; Yuan, X.; Wei, X. TIRNet: Object detection in thermal infrared images for autonomous driving. Appl. Intell. 2021, 51, 1244–1261.
- Du, S.; Zhang, B.; Zhang, P.; Xiang, P.; Xue, H. FA-YOLO: An Improved YOLO Model for Infrared Occlusion Object Detection under Confusing Background. Wirel. Commun. Mob. Comput. 2021, 2021.
- Dai, Y.; Wu, Y.; Zhou, F.; Barnard, K. Asymmetric contextual modulation for infrared small target detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2021; pp. 950–959.
- Zhang, J.; Jin, Y.; Xu, J.; Xu, X.; Zhang, Y. MDU-Net: Multi-scale densely connected U-Net for biomedical image segmentation. arXiv 2018, arXiv:1812.00352.
- Dolz, J.; Ben Ayed, I.; Desrosiers, C. Dense multi-path U-Net for ischemic stroke lesion segmentation in multiple image modalities. In International MICCAI Brainlesion Workshop; Springer: Berlin/Heidelberg, Germany, 2018; pp. 271–282.
- Huang, H.; Lin, L.; Tong, R.; Hu, H.; Zhang, Q.; Iwamoto, Y.; Han, X.; Chen, Y.W.; Wu, J. UNet 3+: A full-scale connected UNet for medical image segmentation. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 1055–1059.
- Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. UNet++: Redesigning skip connections to exploit multiscale features in image segmentation. IEEE Trans. Med. Imaging 2019, 39, 1856–1867.
- Li, B.; Xiao, C.; Wang, L.; Wang, Y.; Lin, Z.; Li, M.; An, W.; Guo, Y. Dense nested attention network for infrared small target detection. arXiv 2021, arXiv:2106.00487.
- Cao, Y.; Zhou, T.; Zhu, X.; Su, Y. Every feature counts: An improved one-stage detector in thermal imagery. In Proceedings of the 2019 IEEE 5th International Conference on Computer and Communications (ICCC), Chengdu, China, 6–9 December 2019; pp. 1965–1969.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
- Misra, D.; Nalamada, T.; Arasanipalai, A.U.; Hou, Q. Rotate to attend: Convolutional triplet attention module. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2021; pp. 3139–3148.
- Shrivastava, A.; Gupta, A. Contextual priming and feedback for Faster R-CNN. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016; pp. 330–348.
- Cai, Z.; Fan, Q.; Feris, R.S.; Vasconcelos, N. A unified multi-scale deep convolutional neural network for fast object detection. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016; pp. 354–370.
- Sermanet, P.; Kavukcuoglu, K.; Chintala, S.; LeCun, Y. Pedestrian detection with unsupervised multi-stage feature learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 3626–3633.
- Cao, Y.; Xu, J.; Lin, S.; Wei, F.; Hu, H. GCNet: Non-local networks meet squeeze-excitation networks and beyond. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Long Beach, CA, USA, 16–17 June 2019.
- Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7794–7803.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
| Strategy | Detection Algorithm | Description of Methods |
|---|---|---|
| Backbone | Lin et al. [32] | A 7-layer CNN was designed to automatically extract small-target features and suppress clutter in an end-to-end manner. |
| | SE-YOLO [33] | An SE block was introduced into YOLOv3 to achieve higher accuracy and a lower false alarm rate in a small-pedestrian detection task. |
| | YOLO-ACN [35] | An attention mechanism was introduced in the channel and spatial dimensions of each residual block of YOLOv3 to focus on small targets. |
| | I-YOLO [36] | A dilated-residual U-Net was introduced to reduce the noise of infrared road images. |
| | FA-YOLO [38] | A dilated CBAM module was added to the CSPDarknet53 backbone of YOLOv4. |
| | TIRNet [37] | A residual branch was added during training to force the network to learn robust and discriminative features. |
| Fusion strategy | ACM-U-Net [39] | An asymmetric contextual modulation module was proposed for detecting infrared small targets, based on FPN and U-Net [46]. |
| | DNANet [44] | A DNIM module was designed to achieve progressive interaction between high-level and low-level features on an infrared small-target dataset. |
| | ThermalDet [45] | A DFB block and a CEM module were designed to directly fuse features from all levels, based on RefineDet. |
| High-resolution detection layer | YOLO-FIRI [1] | Multiscale detection was added to improve small-object detection accuracy. |
| Ship Type | Area ≤ 10 × 10 pix | Area ≤ 16 × 16 pix | Area ≤ 32 × 32 pix |
|---|---|---|---|
| Liner | 0 | 29 | 258 |
| Bulk carrier | 0 | 17 | 175 |
| Warship | 23 | 139 | 625 |
| Sailboat | 3 | 76 | 560 |
| Canoe | 87 | 423 | 1239 |
| Container ship | 0 | 0 | 44 |
| Fishing boat | 229 | 1460 | 5765 |
| Ship Type | Max Area (pix) | Min Area (pix) | Mean Area (pix) | Variance | S/M/L Count |
|---|---|---|---|---|---|
| Liner | 306,600 | 36 | 21,817 | 1,330,938,169 | 602/266/741 |
| Bulk carrier | 408,960 | 40 | 36,785 | 2,703,590,022 | 164/562/1460 |
| Warship | 308,016 | 195 | 16,758 | 672,915,660 | 278/1191/1389 |
| Sailboat | 384,678 | 20 | 19,207 | 1,197,080,051 | 1800/2044/2488 |
| Canoe | 279,000 | 6 | 7567 | 421,559,886 | 2020/2354/1059 |
| Container ship | 122,808 | 216 | 14,565 | 210,081,332 | 115/284/364 |
| Fishing boat | 408,321 | 4 | 2278 | 190,665,384 | 8249/1020/888 |
Aspect ratio statistics (the extracted header read "Area (pix)", but the value range only makes sense as aspect ratios):

| Ship Type | Max | Min | Mean | Variance |
|---|---|---|---|---|
| Liner | 6.73 | 0.23 | 1.76 | 0.72 |
| Bulk carrier | 14.7 | 0.19 | 2.43 | 1.57 |
| Warship | 4.28 | 0.24 | 2.16 | 0.56 |
| Sailboat | 3.82 | 0.05 | 0.43 | 0.04 |
| Canoe | 7.48 | 0.18 | 1.59 | 0.53 |
| Container ship | 5.62 | 0.35 | 2.87 | 1.47 |
| Fishing boat | 15.5 | 0.11 | 2.06 | 0.79 |
| Detection Layer | Anchors (w, h) |
|---|---|
| D2 | (10, 5), (15, 22), (21, 7) |
| D3 | (28, 14), (37, 57), (53, 23) |
| D4 | (49, 130), (99, 39), (188, 54) |
| D5 | (106, 202), (189, 273), (300, 97) |
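For reference, the anchors above could be written as a configuration dictionary as follows; the stride assignment (4/8/16/32 for D2–D5) is our assumption based on the added P2 layer and is not stated in the table.

```python
# Hypothetical YOLOv5-style anchor configuration matching the table above:
# three (w, h) anchors per detection layer, ordered from the highest-resolution
# head (D2) to the lowest (D5).
ANCHORS = {
    "D2": [(10, 5), (15, 22), (21, 7)],         # assumed stride 4, smallest targets
    "D3": [(28, 14), (37, 57), (53, 23)],       # assumed stride 8
    "D4": [(49, 130), (99, 39), (188, 54)],     # assumed stride 16
    "D5": [(106, 202), (189, 273), (300, 97)],  # assumed stride 32, largest targets
}
```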
| Model | AP (%) | AP50 (%) | AP75 (%) | APs (%) | APm (%) | APl (%) |
|---|---|---|---|---|---|---|
| YOLOv5 | 73.79 | 91.61 | 81.83 | 44.96 | 74.07 | 85.54 |
| YOLOv5+P2 | 74.30 | 93.37 | 83.46 | 45.94 | 75.59 | 83.96 |
| YOLOv5+P2+TA | 74.37 | 93.83 | 83.87 | 46.12 | 75.28 | 85.56 |
| YOLOv5+P2+CAA | 74.79 | 93.84 | 84.14 | 48.35 | 75.63 | 85.83 |
| CAA-YOLO | 75.35 | 94.25 | 83.98 | 50.59 | 76.16 | 87.31 |
| Model | AR (%) | AR50 (%) | AR75 (%) | ARs (%) | ARm (%) | ARl (%) |
|---|---|---|---|---|---|---|
| YOLOv5 | 55.45 | 76.06 | 76.92 | 50.06 | 77.84 | 87.26 |
| YOLOv5+P2 | 55.76 | 77.29 | 78.38 | 51.72 | 79.95 | 88.32 |
| YOLOv5+P2+TA | 55.93 | 77.47 | 78.84 | 56.73 | 79.96 | 88.33 |
| YOLOv5+P2+CAA | 55.97 | 77.81 | 78.84 | 56.73 | 79.96 | 88.33 |
| CAA-YOLO | 56.98 | 78.22 | 79.31 | 59.07 | 80.33 | 89.78 |
| Model | mAP@0.5:0.95 (%) | GFLOPs | Params (MB) |
|---|---|---|---|
| YOLOv5 | 79.4 | 108.0 | 92.9 |
| YOLOv5+P2 | 79.8 | 127.5 | 95.5 |
| YOLOv5+P2+TA | 80.1 | 129.2 | 95.8 |
| YOLOv5+P2+CAA | 81.5 | 130.1 | 98.0 |
| CAA-YOLO | 82.8 | 131.9 | 98.3 |
| Model | Framework | mAP@0.5:0.95 (%) | mAP@0.5 (%) | Params (MB) | FPS |
|---|---|---|---|---|---|
| Faster R-CNN | ResNet50+FPN | 59.74 | 86.97 | 166 | 20 |
| SSD | ResNet50+FPN | 48.21 | 74.47 | 55.8 | 100 |
| RetinaNet | ResNet50+FPN | 46.82 | 79.19 | 146 | 22 |
| EfficientDet-D3 | EfficientNet+BiFPN | 46.17 | 80.65 | 48.5 | 14 |
| YOLOv5 | CSPDarknet+PAN | 79.40 | 90.09 | 92.9 | 53 |
| CAA-YOLO | CAA-YOLO | 82.8 | 94.81 | 98.3 | 42 |