HTDet: A Hybrid Transformer-Based Approach for Underwater Small Object Detection
Figure 1. Feeble and small objects in an underwater image. The image is severely degraded, showing low contrast, non-uniform illumination, and blurring: (a) the detection results; (b) a feeble and small object that is hard to detect, with an SNR of −3.7 dB; (c) an object with an SNR of −2.7 dB.
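As context for the SNR values quoted above, the following is a minimal sketch of one common way to estimate a region-level SNR in decibels, assuming a grayscale image normalized to [0, 1] and a boolean object mask; it is an illustration, not the paper's exact definition.

```python
import numpy as np

def region_snr_db(image: np.ndarray, obj_mask: np.ndarray) -> float:
    """One common region-level SNR estimate (an assumption, not the
    paper's exact definition): object-background contrast relative to
    the background standard deviation, expressed in decibels."""
    signal = image[obj_mask].mean() - image[~obj_mask].mean()
    noise = image[~obj_mask].std()
    return 20.0 * float(np.log10(abs(signal) / noise))
```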
Figure 2. Overview schematic of our algorithm. Images are fed into the network, first through a series of enhancements, then through a hybrid backbone that extracts features and a fine-grained pyramid that enhances them. Finally, TTA is used to improve recognition accuracy.
Figure 3. Overview of our detector. The input is a three-channel color image, from which a lightweight backbone extracts multiscale features. A novel fine-grained FPN then produces features friendly to feeble and small objects. The hyperparameter *L* in the backbone is the number of times the transformer block is repeated in each MobileViT module, and *s* in each MBV2 module is its down-sampling stride relative to the module's input.
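A rough sketch of the stage layout implied by the figure is given below. The per-stage *L* values (2, 4, 3) follow the MobileViT defaults and the strides are illustrative assumptions, not the authors' exact configuration.

```python
# Illustrative stage schedule for the hybrid backbone (values assumed):
# `s` is the MBV2 down-sampling stride, `L` the number of transformer
# blocks repeated inside the MobileViT (MVIT) module of that stage.
backbone_stages = [
    {"blocks": "MBV2",        "s": 2},           # 1/2 resolution
    {"blocks": "MBV2",        "s": 2},           # 1/4
    {"blocks": "MBV2 + MVIT", "s": 2, "L": 2},   # 1/8  -> C3
    {"blocks": "MBV2 + MVIT", "s": 2, "L": 4},   # 1/16 -> C4
    {"blocks": "MBV2 + MVIT", "s": 2, "L": 3},   # 1/32 -> C5
]
```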
Figure 4. MobileViT block. Compared with a pure CNN or a pure transformer, the MobileViT block combines the benefits of both and can therefore capture meaningful global context features.
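A minimal PyTorch sketch of a MobileViT-style block follows: a local convolutional representation, global self-attention over unfolded patch sequences, and a convolutional fusion with the input. The hyperparameters (`d`, `L`, patch size, heads) are illustrative assumptions, and the feature map's height and width are assumed divisible by the patch size.

```python
import torch
import torch.nn as nn

class MobileViTBlock(nn.Module):
    """Sketch of a MobileViT-style block: local conv features, global
    self-attention over non-overlapping patches, then conv fusion.
    Hyperparameters are illustrative, not the authors' exact settings."""
    def __init__(self, c, d=64, L=2, patch=2, heads=4):
        super().__init__()
        self.patch = patch
        self.local = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1, bias=False), nn.BatchNorm2d(c), nn.SiLU(),
            nn.Conv2d(c, d, 1, bias=False),
        )
        layer = nn.TransformerEncoderLayer(d, heads, dim_feedforward=2 * d,
                                           batch_first=True, norm_first=True)
        self.global_rep = nn.TransformerEncoder(layer, num_layers=L)
        self.proj = nn.Conv2d(d, c, 1, bias=False)
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * c, c, 3, padding=1, bias=False), nn.BatchNorm2d(c), nn.SiLU(),
        )

    def forward(self, x):
        B, C, H, W = x.shape
        p = self.patch                                   # H, W divisible by p
        y = self.local(x)                                # (B, d, H, W)
        # unfold: pixels at the same intra-patch position attend across patches
        y = y.reshape(B, -1, H // p, p, W // p, p)
        y = y.permute(0, 3, 5, 2, 4, 1).reshape(B * p * p, (H // p) * (W // p), -1)
        y = self.global_rep(y)
        d = y.shape[-1]
        # fold the patch sequences back into a feature map
        y = y.reshape(B, p, p, H // p, W // p, d)
        y = y.permute(0, 5, 3, 1, 4, 2).reshape(B, d, H, W)
        return self.fuse(torch.cat([x, self.proj(y)], dim=1))
```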
Figure 5. Fine-grained FPN. Compared with a standard FPN module, the fine-grained FPN builds direct connections from low-level to high-level features, preventing the loss of context information.
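Under one plausible reading of the figure, the fine-grained FPN augments the standard top-down pathway with a direct shortcut from the lowest-level feature to every pyramid level. The sketch below implements that reading; the channel widths and additive fusion are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class FineGrainedFPN(nn.Module):
    """Sketch of a fine-grained FPN: a standard top-down pathway plus a
    direct low-level shortcut into every output level, so fine details of
    feeble/small objects are not diluted by repeated fusion. The exact
    wiring is an assumption based on Figure 5."""
    def __init__(self, in_channels=(96, 128, 160), out_c=128):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_c, 1) for c in in_channels)
        self.shortcut = nn.ModuleList(
            nn.Conv2d(in_channels[0], out_c, 1) for _ in in_channels)
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_c, out_c, 3, padding=1) for _ in in_channels)

    def forward(self, feats):                      # feats = (C3, C4, C5)
        lats = [l(f) for l, f in zip(self.lateral, feats)]
        # standard top-down pathway
        for i in range(len(lats) - 2, -1, -1):
            lats[i] = lats[i] + F.interpolate(lats[i + 1], size=lats[i].shape[-2:])
        # fine-grained shortcuts: inject C3 detail into every level
        outs = []
        for i, lat in enumerate(lats):
            low = F.interpolate(self.shortcut[i](feats[0]), size=lat.shape[-2:])
            outs.append(self.smooth[i](lat + low))
        return outs                                # (P3, P4, P5)
```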
Figure 6. Degraded underwater images in URPC 2018. Images taken underwater usually exhibit severe degradation because of the light-window and light-scattering effects.
Figure 7. Trade-off between performance and parameter count for each model. The *x*-axis is the number of parameters and the *y*-axis is the model's APm; the closer a model lies to the upper-left corner, the better it performs. Our model, marked with a pentagram, offers the best balance of parameter count and performance among the compared models.
Figure 8. Error analysis on URPC 2018. Decomposing our detector's error on the URPC 2018 dataset shows that the dominant contribution is the BG error, meaning the detector struggles to find objects in severely degraded images. To address this, our work introduces a novel fine-grained FPN module that enhances the feature representation of feeble and small objects.
Figure 9. Precision–recall curves for each of the four categories in URPC 2018. The area under a curve indicates the model's performance in that category; the larger the area, the better.
Figure 10. Confusion matrix of the classification branch. All values are normalized, and color depth represents magnitude. Because we do not detect the background, the diagonal entry for the background is zero.
Figure 11. Robustness analysis of our model under different noise disturbances. In (a), triangles denote low-frequency noise and circles denote high-frequency noise; the plot shows our model's performance trend as the noise level increases. In (b–f), the red bars are our model and the blue bars the baseline, with the noise severity increasing from 1 in (b) to 5 in (f).
Figure 12. Detection results of our model under three types of noise at five severity levels. *S* denotes the noise severity; each column shows results under the same noise type, and each row shows results at the same severity. Note that S = 0 is the detection result without added noise.
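Noise types and severity levels of this kind are standardized in the common-corruptions benchmark [61]. As an illustration, the sketch below adds severity-scaled Gaussian noise to an image normalized to [0, 1]; the sigma schedule is an assumption, not the benchmark's exact values.

```python
import numpy as np

def gaussian_noise(image: np.ndarray, severity: int = 1) -> np.ndarray:
    """Add zero-mean Gaussian noise whose sigma grows with severity 1-5
    (sigma schedule is illustrative, not the benchmark's exact values)."""
    sigma = [0.04, 0.06, 0.08, 0.09, 0.10][severity - 1]
    noisy = image + np.random.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0.0, 1.0)
```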
Figure 13. Detection results for four types of blur that may be caused by motion, at different severity levels. *S* has the same meaning as in Figure 12; in contrast to that figure, four kinds of blur are used here.
Figure 14. Detection results of our proposed model on the URPC 2018 dataset. Blue boxes mark starfish and red boxes mark echinus. Almost all objects are localized with a bounding box, even those that are hard to pick out; the detector achieves good qualitative results.
Abstract
1. Introduction
- A novel transformer-based hybrid detector for underwater feeble and small object detection is proposed, which extracts global and local context information efficiently and effectively;
- To tackle the vanishing-signal problem of feeble and small objects, a fine-grained feature pyramid network (FG-FPN) is designed to cumulatively fuse low-level and high-level features;
- To further enhance the detector's accuracy, we use a memory-free TTA approach suitable for real-time detection, as sketched below.
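As a concrete illustration of memory-free TTA, the sketch below runs a detector on an image and its horizontal flip, maps the flipped boxes back, and merges the two sets with class-aware NMS. The detector interface and the choice of flipping as the augmentation are assumptions, not the paper's exact procedure.

```python
import torch
from torchvision.ops import nms

def flip_tta(detector, image, iou_thr=0.5):
    """Memory-free TTA sketch: detect on the image and its horizontal
    flip, undo the flip on the boxes, and merge with NMS. `detector` is
    assumed to return (boxes[N,4] in xyxy, scores[N], labels[N]) for a
    CHW image tensor."""
    W = image.shape[-1]
    boxes1, scores1, labels1 = detector(image)
    boxes2, scores2, labels2 = detector(torch.flip(image, dims=[-1]))
    # undo the horizontal flip on the box x-coordinates
    boxes2 = boxes2.clone()
    boxes2[:, [0, 2]] = W - boxes2[:, [2, 0]]
    boxes = torch.cat([boxes1, boxes2])
    scores = torch.cat([scores1, scores2])
    labels = torch.cat([labels1, labels2])
    # class-aware NMS via a per-class coordinate offset
    offsets = labels.to(boxes) * (W + 1)
    keep = nms(boxes + offsets[:, None], scores, iou_thr)
    return boxes[keep], scores[keep], labels[keep]
```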
2. Related Work
2.1. General Purpose Object Detection
2.2. Underwater Object Detection
2.3. Lightweight Detectors
3. Method
3.1. Overall Architecture
3.2. Fine-Grained Feature Pyramid Network
3.3. Loss Function
3.4. Test Time Augmentation
4. Experiments
4.1. Dataset
4.2. Evaluation Metric
4.3. Implementation Details
4.4. Comparison with Other Models
Method | #Param | AP | AP50 | AP75 | APs | APm | APl | Schedule
---|---|---|---|---|---|---|---|---
Two-stage methods: | | | | | | | |
Faster R-CNN [13] | 33.6 M | 35.8 | 69.8 | 33.4 | 16.4 | 36.5 | 51.4 | 2×
Grid R-CNN [50] | 64.3 M | 36.4 | 69.9 | 34.1 | 15.3 | 37.4 | 51.2 | 2×
Dynamic R-CNN [51] | 41.5 M | 35.8 | 66.9 | 35.2 | 13.3 | 37.2 | 51.4 | 2×
Cascade R-CNN [52] | 68.9 M | 37.0 | 69.2 | 35.6 | 16.0 | 37.9 | 52.0 | 2×
Libra R-CNN [53] | 41.4 M | 36.0 | 68.8 | 33.7 | 16.4 | 36.7 | 51.2 | 2×
Sparse R-CNN [54] | 106.1 M | 30.9 | 61.2 | 27.8 | 16.2 | 30.8 | 46.2 | 2×
One-stage methods: | | | | | | | |
YOLOv3 [21] | 61.5 M | 37.8 | 72.1 | 35.0 | 19.4 | 38.4 | 50.5 | 237e
RetinaNet [16] | 36.2 M | 32.2 | 65.3 | 28.2 | 9.2 | 32.6 | 48.0 | 2×
FCOS [55] | 32.0 M | 34.7 | 69.7 | 29.8 | 13.9 | 35.6 | 47.7 | 2×
ATSS [19] | 32.1 M | 29.3 | 62.1 | 22.9 | 13.5 | 30.9 | 36.0 | 2×
AutoAssign [20] | 35.9 M | 35.5 | 71.8 | 30.1 | 15.5 | 36.1 | 49.6 | 2×
SSD300 [15] | 24.2 M | 32.4 | 64.7 | 27.1 | 16.0 | 33.7 | 42.5 | 2×
Ours | 7.7 M | 38.5 | 76.3 | 32.7 | 22.8 | 39.3 | 49.0 | 2×
4.5. Ablation Study and Analysis
4.6. URPC 2018 Error Analysis
- Localization (Loc): the classification is correct, and 0.1 ≤ IoU < 0.5;
- Other (Oth): the class is wrong, and IoU ≥ 0.1;
- Background (BG): IoU < 0.1 for all objects;
- False Negative (FN): IoU ≥ 0.5, but the classification is wrong.
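The taxonomy above can be condensed into a small helper; the IoU thresholds mirror the standard COCO-style error analysis and should be read as assumptions.

```python
def classify_detection(iou: float, correct_class: bool) -> str:
    """Map a detection's best ground-truth IoU and class correctness to
    an error type (thresholds assumed from COCO-style error analysis)."""
    if iou >= 0.5:
        return "TP" if correct_class else "FN"    # well localized
    if iou >= 0.1:
        return "Loc" if correct_class else "Oth"  # poorly localized
    return "BG"                                   # fires on background
```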
4.7. Detection Results Analysis for Each Category
4.8. Classification Analysis for Each Category
4.9. Analysis on Robustness of the Detector
4.10. Visualization of the Detection Results
5. Discussion
5.1. Lightweight Object Detection
5.2. Underwater Feeble and Small Object Detection
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
SNR | Signal-to-Noise Ratio
mAP | Mean Average Precision
AP | Average Precision
CNN | Convolutional Neural Network
MBV2 | MobileNetV2 Block
MVIT | MobileViT Block
AUV | Autonomous Underwater Vehicle
FPN | Feature Pyramid Network
FG-FPN | Fine-Grained Feature Pyramid Network
References
1. Moniruzzaman, M.; Islam, S.M.S.; Bennamoun, M.; Lavery, P. Deep learning on underwater marine object detection: A survey. In Proceedings of the Advanced Concepts for Intelligent Vision Systems: 18th International Conference, ACIVS 2017, Antwerp, Belgium, 18–21 September 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 150–160.
2. Fayaz, S.; Parah, S.A.; Qureshi, G. Underwater object detection: Architectures and algorithms–A comprehensive review. Multimed. Tools Appl. 2022, 81, 20871–20916.
3. Er, M.J.; Jie, C.; Zhang, Y.; Gao, W. Research challenges, recent advances and benchmark datasets in deep-learning-based underwater marine object detection: A review. TechRxiv 2022.
4. Moniruzzaman, M.; Islam, S.M.S.; Lavery, P.; Bennamoun, M. Faster R-CNN based deep learning for seagrass detection from underwater digital images. In Proceedings of the 2019 Digital Image Computing: Techniques and Applications (DICTA), Perth, Australia, 2–4 December 2019; pp. 1–7.
5. Tian, M.; Li, X.; Kong, S.; Wu, L.; Yu, J. A modified YOLOv4 detection method for a vision-based underwater garbage cleaning robot. Front. Inf. Technol. Electron. Eng. 2022, 23, 1217–1228.
6. Wang, Y.; Tang, C.; Cai, M.; Yin, J.; Wang, S.; Cheng, L.; Wang, R.; Tan, M. Real-time underwater onboard vision sensing system for robotic gripping. IEEE Trans. Instrum. Meas. 2020, 70, 5002611.
7. Zhang, W.; Dong, L.; Zhang, T.; Xu, W. Enhancing underwater image via color correction and bi-interval contrast enhancement. Signal Process. Image Commun. 2021, 90, 116030.
8. Han, M.; Lyu, Z.; Qiu, T.; Xu, M. A review on intelligence dehazing and color restoration for underwater images. IEEE Trans. Syst. Man Cybern. Syst. 2018, 50, 1820–1832.
9. Wang, N.; Zheng, B.; Zheng, H.; Yu, Z. Feeble object detection of underwater images through LSR with delay loop. Opt. Express 2017, 25, 22490–22498.
10. Song, Y.; He, B.; Liu, P. Real-time object detection for AUVs using self-cascaded convolutional neural networks. IEEE J. Ocean. Eng. 2019, 46, 56–67.
11. Zhang, M.; Xu, S.; Song, W.; He, Q.; Wei, Q. Lightweight underwater object detection based on YOLO v4 and multi-scale attentional feature fusion. Remote Sens. 2021, 13, 4706.
12. Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 60.
13. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the 29th Annual Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Volume 28.
14. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788.
15. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
16. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
17. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 213–229.
18. Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable transformers for end-to-end object detection. arXiv 2020, arXiv:2010.04159.
19. Zhang, S.; Chi, C.; Yao, Y.; Lei, Z.; Li, S.Z. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9759–9768.
20. Zhu, B.; Wang, J.; Jiang, Z.; Zong, F.; Liu, S.; Li, Z.; Sun, J. AutoAssign: Differentiable label assignment for dense object detection. arXiv 2020, arXiv:2007.03496.
21. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
22. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
23. Peng, F.; Miao, Z.; Li, F.; Li, Z. S-FPN: A shortcut feature pyramid network for sea cucumber detection in underwater images. Expert Syst. Appl. 2021, 182, 115306.
24. Zong, C.; Wang, H.; Wan, Z. An improved 3D point cloud instance segmentation method for overhead catenary height detection. Comput. Electr. Eng. 2022, 98, 107685.
25. Yang, M.; Wang, H.; Hu, K.; Yin, G.; Wei, Z. IA-Net: An Inception–Attention-Module-Based Network for Classifying Underwater Images From Others. IEEE J. Ocean. Eng. 2022, 47, 704–717.
26. Liao, L.; Du, L.; Guo, Y. Semi-supervised SAR target detection based on an improved faster R-CNN. Remote Sens. 2021, 14, 143.
27. Zhou, G.; Li, W.; Zhou, X.; Tan, Y.; Lin, G.; Li, X.; Deng, R. An innovative echo detection system with STM32 gated and PMT adjustable gain for airborne LiDAR. Int. J. Remote Sens. 2021, 42, 9187–9211.
28. Zhou, G.; Zhou, X.; Song, Y.; Xie, D.; Wang, L.; Yan, G.; Hu, M.; Liu, B.; Shang, W.; Gong, C.; et al. Design of supercontinuum laser hyperspectral light detection and ranging (LiDAR) (SCLaHS LiDAR). Int. J. Remote Sens. 2021, 42, 3731–3755.
29. Wu, X.; Hong, D.; Tian, J.; Chanussot, J.; Li, W.; Tao, R. ORSIm detector: A novel object detection framework in optical remote sensing imagery using spatial-frequency channel features. IEEE Trans. Geosci. Remote Sens. 2019, 57, 5146–5158.
30. Wu, X.; Hong, D.; Chanussot, J. UIU-Net: U-Net in U-Net for infrared small object detection. IEEE Trans. Image Process. 2022, 32, 364–376.
31. Zhou, G.; Li, C.; Zhang, D.; Liu, D.; Zhou, X.; Zhan, J. Overview of underwater transmission characteristics of oceanic LiDAR. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 8144–8159.
32. Liu, L.; Zhang, S.; Zhang, L.; Pan, G.; Yu, J. Multi-UUV Maneuvering Counter-Game for Dynamic Target Scenario Based on Fractional-Order Recurrent Neural Network. IEEE Trans. Cybern. 2022, 1–14.
33. Xie, B.; Li, S.; Lv, F.; Liu, C.H.; Wang, G.; Wu, D. A collaborative alignment framework of transferable knowledge extraction for unsupervised domain adaptation. IEEE Trans. Knowl. Data Eng. 2022, Early Access.
34. Zhao, Z.; Liu, Y.; Sun, X.; Liu, J.; Yang, X.; Zhou, C. Composited FishNet: Fish Detection and Species Recognition From Low-Quality Underwater Videos. IEEE Trans. Image Process. 2021, 30, 4719–4734.
35. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
36. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
37. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
38. Yeh, C.H.; Lin, C.H.; Kang, L.W.; Huang, C.H.; Lin, M.H.; Chang, C.Y.; Wang, C.C. Lightweight deep neural network for joint learning of underwater object detection and color conversion. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 6129–6143.
39. Tan, C.; DanDan, C.; Huang, H.; Yang, Q.; Huang, X. A Lightweight Underwater Object Detection Model: FL-YOLOV3-TINY. In Proceedings of the 2021 IEEE 12th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver, BC, Canada, 27–30 October 2021; pp. 0127–0133.
40. Mehta, S.; Rastegari, M. MobileViT: Light-weight, general-purpose, and mobile-friendly vision transformer. arXiv 2021, arXiv:2110.02178.
41. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
42. Tong, K.; Wu, Y.; Zhou, F. Recent advances in small object detection based on deep learning: A review. Image Vis. Comput. 2020, 97, 103910.
43. Liu, Y.; Sun, P.; Wergeles, N.; Shang, Y. A survey and performance evaluation of deep learning methods for small object detection. Expert Syst. Appl. 2021, 172, 114602.
44. Sun, W.; Dai, L.; Zhang, X.; Chang, P.; He, X. RSOD: Real-time small object detection algorithm in UAV-based traffic monitoring. Appl. Intell. 2022, 52, 8448–8463.
45. Qi, G.; Zhang, Y.; Wang, K.; Mazur, N.; Liu, Y.; Malaviya, D. Small Object Detection Method Based on Adaptive Spatial Parallel Convolution and Fast Multi-Scale Fusion. Remote Sens. 2022, 14, 420.
46. Chen, Q.; Wang, Y.; Yang, T.; Zhang, X.; Cheng, J.; Sun, J. You only look one-level feature. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 13039–13048.
47. Lin, W.H.; Zhong, J.X.; Liu, S.; Li, T.; Li, G. RoIMix: Proposal-fusion among multiple images for underwater object detection. In Proceedings of the ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 2588–2592.
48. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755.
49. Chen, K.; Wang, J.; Pang, J.; Cao, Y.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Xu, J.; et al. MMDetection: Open MMLab detection toolbox and benchmark. arXiv 2019, arXiv:1906.07155.
50. Lu, X.; Li, B.; Yue, Y.; Li, Q.; Yan, J. Grid R-CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7363–7372.
51. Zhang, H.; Chang, H.; Ma, B.; Wang, N.; Chen, X. Dynamic R-CNN: Towards high quality object detection via dynamic training. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 260–275.
52. Cai, Z.; Vasconcelos, N. Cascade R-CNN: High quality object detection and instance segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 1483–1498.
53. Pang, J.; Chen, K.; Shi, J.; Feng, H.; Ouyang, W.; Lin, D. Libra R-CNN: Towards balanced learning for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 821–830.
54. Sun, P.; Zhang, R.; Jiang, Y.; Kong, T.; Xu, C.; Zhan, W.; Tomizuka, M.; Li, L.; Yuan, Z.; Wang, C.; et al. Sparse R-CNN: End-to-end object detection with learnable proposals. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 14454–14463.
55. Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully convolutional one-stage object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9627–9636.
56. Jin, H.S.; Cho, H.; Jiafeng, H.; Lee, J.H.; Kim, M.J.; Jeong, S.K.; Ji, D.H.; Joo, K.; Jung, D.; Choi, H.S. Hovering control of UUV through underwater object detection based on deep learning. Ocean Eng. 2022, 253, 111321.
57. Álvarez-Tuñón, O.; Jardón, A.; Balaguer, C. Generation and processing of simulated underwater images for infrastructure visual inspection with UUVs. Sensors 2019, 19, 5497.
58. Watson, S.; Duecker, D.A.; Groves, K. Localisation of unmanned underwater vehicles (UUVs) in complex and confined environments: A review. Sensors 2020, 20, 6203.
59. Yang, M.; Hu, J.; Li, C.; Rohde, G.; Du, Y.; Hu, K. An in-depth survey of underwater image enhancement and restoration. IEEE Access 2019, 7, 123638–123657.
60. Anwar, S.; Li, C. Diving deeper into underwater image enhancement: A survey. Signal Process. Image Commun. 2020, 89, 115978.
61. Hendrycks, D.; Dietterich, T.G. Benchmarking neural network robustness to common corruptions and surface variations. arXiv 2018, arXiv:1807.01697.
Method | #Param | AP | AP50 | AP75 | APs | APm | APl
---|---|---|---|---|---|---|---
Baseline | 36.2 M | 32.2 | 65.3 | 28.2 | 9.2 | 32.6 | 48.0
Baseline + MobileViT | 6.94 M | 37.3 | 74.6 | 32.3 | 21.9 | 37.9 | 48.1
Baseline + MobileViT + FG-FPN | 7.68 M | 38.1 | 75.4 | 32.9 | 22.3 | 38.6 | 49.1
Baseline + MobileViT + FG-FPN + TTA | 7.68 M | 38.5 | 76.3 | 32.7 | 22.8 | 39.3 | 49.0