A Dynamic Interference Detection Method of Underwater Scenes Based on Deep Learning and Attention Mechanism
Figure 1. YOLOv8 network framework. The arrows indicate sequential execution: each module runs after the previous one completes. The upper-right corner of the diagram shows the detailed components of each module: purple for the Detect module, light green for the Conv module, light gray for the SPPF module, blue for the C2f module, and yellow for the Bottleneck module within C2f.
Figure 2. PConv (Partial Convolution) structure.
Figure 3. Improved Bottleneck structure.
Figure 4. SE-net structure.
Figure 5. ReLU function curve.
Figure 6. PReLU function curve.
Figure 7. MPDIoU parameter diagram.
Figure 8. Dynamic target detection dataset.
Figure 9. Original YOLOv8 network training results.
Figure 10. Improved YOLOv8 network training results.
Figure 11. The original images.
Figure 12. Detection results using the unmodified YOLOv8 algorithm.
Figure 13. Detection results using the proposed modified YOLOv8 algorithm.
Abstract
1. Introduction
- An improved YOLOv8 algorithm for target detection in underwater scenes is proposed. The approach enhances the feature extraction layer of the YOLOv8 network and optimizes the convolutional topology of the Bottleneck module, reducing computational cost while improving detection accuracy.
- The improved algorithm incorporates an enhanced SE attention mechanism to strengthen the network's feature extraction capability, and replaces the CIoU confidence-box loss function with the MPDIoU loss, which significantly accelerates model convergence.
- The proposed algorithm was evaluated through ablation and comparison experiments on a purpose-built underwater dynamic target detection dataset; the experimental results confirm the effectiveness of the method.
2. Improved YOLOv8 Underwater Dynamic Object Detection Method
2.1. YOLOv8-Based Object Detection Framework
2.2. Improved YOLOv8 Underwater Dynamic Object Detection Method
2.2.1. Bottleneck Convolutional Network Improvements
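The body of this subsection is not reproduced here, so the following is a minimal PyTorch sketch of a PConv-based Bottleneck, assuming the FasterNet-style design of PConv (Chen et al., CVPR 2023) in which a regular 3×3 convolution is applied to only a fraction of the input channels while the remaining channels pass through untouched. The class names, the 1/4 split ratio, and the pointwise mixing layer are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of partial convolution (PConv) and a PConv-based Bottleneck.
# Assumptions: FasterNet-style 1/4 channel split; a 1x1 pointwise conv
# mixes channels afterward, as in the improved Bottleneck of Figure 3.
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution: convolve the first 1/n_div of the channels,
    pass the remaining channels through unchanged."""
    def __init__(self, channels: int, n_div: int = 4):
        super().__init__()
        self.conv_ch = channels // n_div          # channels that are convolved
        self.untouched_ch = channels - self.conv_ch
        self.conv = nn.Conv2d(self.conv_ch, self.conv_ch,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.split(x, [self.conv_ch, self.untouched_ch], dim=1)
        return torch.cat((self.conv(x1), x2), dim=1)

class PConvBottleneck(nn.Module):
    """YOLOv8-style Bottleneck with its 3x3 conv replaced by PConv,
    keeping the residual structure while cutting FLOPs and parameters."""
    def __init__(self, channels: int, shortcut: bool = True):
        super().__init__()
        self.pconv = PConv(channels)
        self.pw = nn.Sequential(                  # pointwise conv mixes channels
            nn.Conv2d(channels, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.SiLU(inplace=True),
        )
        self.shortcut = shortcut

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.pw(self.pconv(x))
        return x + out if self.shortcut else out

# Quick shape check: the block preserves spatial size and channel count.
# y = PConvBottleneck(64)(torch.randn(1, 64, 80, 80))  # -> (1, 64, 80, 80)
```

Because only a quarter of the channels pass through the 3×3 kernel, the block's parameter count and FLOPs drop substantially, which is consistent with the reduction from 3.01 M to 2.12 M parameters reported in the ablation table.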
2.2.2. Feature Extraction Layer Improvements
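As a companion to the SE-net structure of Figure 4 and the ReLU/PReLU curves of Figures 5 and 6, here is a minimal PyTorch sketch of a squeeze-and-excitation block whose ReLU activation is swapped for PReLU. Reading the ablation table's "P-SE" as SE-with-PReLU is an assumption based on those figures; the reduction ratio r = 16 is the common SE default, also assumed here.

```python
# Sketch of an SE channel-attention block using PReLU in the excitation
# MLP instead of ReLU. Class name "PSE" and r = 16 are assumptions.
import torch
import torch.nn as nn

class PSE(nn.Module):
    """Squeeze-and-Excitation attention with a learnable PReLU activation."""
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        hidden = max(channels // r, 1)
        self.pool = nn.AdaptiveAvgPool2d(1)       # squeeze: global spatial average
        self.fc = nn.Sequential(                  # excitation: two-layer MLP
            nn.Linear(channels, hidden, bias=False),
            nn.PReLU(),                           # learnable negative slope
            nn.Linear(hidden, channels, bias=False),
            nn.Sigmoid(),                         # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                              # reweight channels
```

Unlike ReLU, which zeroes all negative pre-activations, PReLU keeps a small learnable slope for negative inputs, so channel statistics are not discarded as aggressively during excitation.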
2.2.3. Loss Function Improvements
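The MPDIoU loss follows the definition of Ma and Xu (2023): the IoU term is penalized by the squared distances between the predicted and ground-truth top-left and bottom-right corners (the quantities illustrated in Figure 7), normalized by the squared input-image dimensions. A minimal PyTorch sketch, assuming boxes in (x1, y1, x2, y2) format, is given below; the function name and signature are illustrative.

```python
# Sketch of the MPDIoU bounding-box regression loss:
# L = 1 - (IoU - d1^2/(w^2 + h^2) - d2^2/(w^2 + h^2)),
# where d1, d2 are the top-left and bottom-right corner distances
# and (w, h) is the input image size.
import torch

def mpdiou_loss(pred: torch.Tensor, target: torch.Tensor,
                img_w: float, img_h: float, eps: float = 1e-7) -> torch.Tensor:
    # Intersection area of each predicted/ground-truth box pair
    inter_w = (torch.min(pred[:, 2], target[:, 2]) -
               torch.max(pred[:, 0], target[:, 0])).clamp(min=0)
    inter_h = (torch.min(pred[:, 3], target[:, 3]) -
               torch.max(pred[:, 1], target[:, 1])).clamp(min=0)
    inter = inter_w * inter_h

    # Union area and IoU
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Squared distances between corner pairs, normalized by image size
    d1 = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
    d2 = (pred[:, 2] - target[:, 2]) ** 2 + (pred[:, 3] - target[:, 3]) ** 2
    norm = img_w ** 2 + img_h ** 2

    mpdiou = iou - d1 / norm - d2 / norm
    return (1.0 - mpdiou).mean()                  # average loss over the batch
```

Unlike CIoU, every factor reduces to the four corner coordinates plus the image size, which simplifies the computation and is consistent with the faster convergence reported in the Introduction.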
3. Experimental Results and Analysis
3.1. Experimental Setup
3.2. Dataset Creation
3.3. Experimental Results and Analysis
3.3.1. Ablation Experiments
3.3.2. Analysis of the Training Process
3.3.3. Comparative Experiments of Different Models
3.3.4. Detection Results and Analysis
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
Environment | Configuration
---|---
Software environment | PyCharm 2020 + Anaconda3 + Python 3.8 + PyTorch 1.18.0
Hardware environment | Dell desktop computer; operating system: Windows 11; processor: AMD Ryzen 7 5800H; graphics card: NVIDIA GeForce RTX 4060
Network | mAP@0.5 (%) | Params/M | Time (ms)
---|---|---|---
YOLOv8n | 93.5 | 3.01 | 1.0
YOLOv8n + CBAM | 93.8 | 3.08 | 1.9
YOLOv8n + ECA | 93.5 | 3.01 | 1.2
YOLOv8n + SE | 94.0 | 3.01 | 1.3
YOLOv8n + PConv | 94.1 | 2.12 | 2.5
YOLOv8n + PConv + SE | 94.3 | 2.12 | 1.5
YOLOv8n + PConv + P-SE | 94.7 | 2.12 | 1.5
YOLOv8n + PConv + P-SE + MPDIoU | 95.1 | 2.12 | 2.2
Evaluation Criteria | Proposed Algorithm | YOLOv5 | SSD | YOLOv4 | Faster R-CNN
---|---|---|---|---|---
mAP@0.5 (%) | 95.1 | 91 | 89 | 85 | 87
Time (ms) | 2.2 | 2.1 | 2.3 | 2.8 | 2.5