Open Access Article

Smart Monitoring Method for Land-Based Sources of Marine Outfalls Based on an Improved YOLOv8 Model

Shicheng Zhao, Haolan Zhou * and Haiyan Yang

Department of Hydraulic Engineering, College of Water Conservancy and Civil Engineering, South China Agricultural University, Guangzhou 510642, China

* Author to whom correspondence should be addressed.

Water 2024, 16(22), 3285; https://doi.org/10.3390/w16223285

Submission received: 9 September 2024 / Revised: 11 November 2024 / Accepted: 12 November 2024 / Published: 15 November 2024

(This article belongs to the Section Oceans and Coastal Zones)


Figure 1. Zhanjiang city outlets point map. (a) “gully”, (b) “weir”, (c) “pipe”, (d) “culvert”, (e) “gully”, (f) “weir”, (g) “pipe”, (h) “culvert”.

Figure 2. YOLOv8 model structure.

Figure 3. MSDA mechanism structure. The red points represent the key positions of the convolutional kernel, the yellow area shows the dilation of the kernel at r = 1, the blue area shows the dilation at r = 2, and the green area shows the dilation at r = 3.

Figure 4. C2f module structure.

Figure 5. DSConv selectable receptive fields. The blue line represents the continuous shift of the convolutional kernel in the horizontal direction, while the red line represents the continuous shift of the convolutional kernel in the vertical direction.

Figure 6. Inner-MPDIoU diagram.

Figure 7. (a) Anchor box category number statistics, (b) Anchor box position statistics. The color of each anchor box in (b) corresponds to the same category as in (a).

Figure 8. (a) Normalized confusion matrix for the YOLOv8 model, (b) normalized confusion matrix for the YOLOv8+MSDA model.

Figure 9. (a) YOLOv8 model’s predicted results, (b) our model’s predicted results.

Figure 10. (a) P–R curve of the improved model, (b) P–R curve of the improved model after transfer learning.

Figure 11. Model training process.


Abstract

Land-based sources of marine outfalls are a major source of marine pollution. The monitoring of land-based sources of marine outfalls is an important means for marine environmental protection and governance. Traditional on-site manual monitoring methods are inefficient, expensive, and constrained by geographic conditions. Satellite remote sensing spectral analysis methods can only identify pollutant plumes and are affected by discharge timing and cloud/fog interference. Therefore, we propose a smart monitoring method for land-based sources of marine outfalls based on an improved YOLOv8 model, using unmanned aerial vehicles (UAVs). This method can accurately identify and classify marine outfalls, offering high practical application value. Inspired by the sparse sampling method in compressed sensing, we incorporated a multi-scale dilated attention mechanism into the model and integrated dynamic snake convolutions into the C2f module. This approach enhanced the model’s detection capability for occluded and complex-feature targets while constraining the increase in computational load. Additionally, we proposed a new loss calculation method by combining Inner-IoU (Intersection over Union) and MPDIoU (IoU with Minimum Points Distance), which further improved the model’s regression speed and its ability to predict multi-scale targets. The final experimental results show that the improved model achieved an mAP50 (mean Average Precision at 50) of 87.0%, representing a 3.4% increase from the original model, effectively enabling the smart monitoring of land-based marine discharge outlets.

Keywords: YOLOv8; UAVs; monitoring land-based sources of marine outfall; object detection

1. Introduction

With rapid socioeconomic development and increasing human activity, the marine environment has continued to deteriorate. The degradation of marine ecosystems and the aggravation of eutrophication [1,2] can no longer be ignored when considering sustainable socioeconomic development [3,4], and protecting the marine environment has become an important task for achieving it [5]. The sources of marine pollution can be divided into three categories: land-based, air-based, and sea-based sources [6,7]. Land-based sources have the greatest impact, accounting for approximately 80% of total marine pollution [8,9]. They mainly include industrial wastewater [10,11], domestic sewage [12], and agricultural wastewater [13,14], all of which eventually discharge into the marine environment [15]. These wastewaters contain large quantities of pollutants [16], including oily substances [17], heavy metals [18,19], and microplastics [20]. Wu Gang et al. [21] found 704 chemicals, including 42 toxic substances, in wastewater samples from the Yangtze River Delta region. These toxic substances can degrade water quality, which in turn affects the ecological status of fish communities [22]. To manage and protect the marine environment efficiently, all three types of pollution source must be monitored effectively [23]: marine environmental monitoring for sea-based sources [24], atmospheric monitoring for air-based sources [25], and coastal monitoring for land-based sources [26]. Monitoring and managing land-based sources is particularly important, especially sewage outlets, which are the primary direct pollution sources along the coast.

Traditional monitoring of land-based sewage outlets into the sea relies mainly on manual inspection or on satellite remote sensing-based spectral analysis. Manual inspections are time-consuming and labor-intensive, and field inspections are difficult in complex environments such as swamps and shrubland. Satellite remote sensing-based spectral analysis monitors discharge behavior through the different reflectances of CDOM (colored dissolved organic matter) and TSM (total suspended matter) in sewage plumes and normal seawater [27,28,29]; however, this method only monitors the effluent plume. Because the plume spreads along the coast during high tide, the source of the plume cannot be located accurately. The method is also limited by cloud cover, and it struggles to detect outfalls that are not discharging sewage when the satellite image is acquired [30]. Neither existing method meets the current demands of marine environmental protection. The rapid development of UAV technology and deep learning-based object detection now offers a more efficient and accurate monitoring solution to these problems.

UAV technology is currently widely used in agriculture [31], transportation [32], disaster response [33], search and rescue [34], and other fields. The flexibility of UAVs across application scenarios makes them well suited to pairing with object detection algorithms [35]. Widely used object detection algorithms fall into two categories: one-stage and two-stage algorithms. One-stage algorithms include YOLO (You Only Look Once) [36] and SSD (Single Shot MultiBox Detector) [37]. They predict the category and location of the target directly from the image, offering high detection speed and good accuracy. Two-stage algorithms include R-CNN (Region-based Convolutional Neural Networks) [38] and Faster R-CNN [39]. They first generate candidate regions and then classify and refine these regions, which is usually more accurate but slower than one-stage detection. One-stage algorithms therefore have a clear advantage in real-time detection and monitoring applications.

Given the widespread application of UAVs and object detection algorithms, many scholars have studied the detection of sewage outlets. Huang Yaohuan et al. [40] proposed an improved deep learning method (GDCNN-outfalls), based on the Faster R-CNN framework, to detect river sewage outlets in UAV imagery. In validation, the F1 score improved by 0.03 and the precision (P) increased by 2.1% compared with the original model. They subsequently proposed an improvement strategy for GDCNN-outfalls [41]: a deep learning model combining Faster R-CNN and YOLOv4 that further increases detection speed while taking accuracy, GPU (Graphics Processing Unit) performance, and application needs into account. Huang Qingsong et al. [42] proposed AFENet (Attention-guided Feature Enhancement Network) for low-altitude UAV-based sewage outlet detection, achieving a detection accuracy of 75.6%. Yu Mingxin et al. [43] applied a lightweight model combining a CNN (Convolutional Neural Network) and a Transformer to detect sewage outlets in UAV aerial images, achieving a detection accuracy of 81.5%. Xu Haoran et al. [44] developed a real-time UAV-based sewage outlet detection system based on the YOLOv5 model. This recent research shows that sewage outlet detection algorithms based on drone imagery have improved significantly in accuracy and efficiency, with advances in the YOLO series providing strong support for the field. Within the YOLO series, YOLOv8 employs a multi-scale feature fusion strategy that detects targets at various scales, which strengthens its ability to identify sewage outlets of different sizes. However, YOLOv8 still has room for improvement in accurately locating the ROI (Region of Interest) of partially obscured outlets and of elongated outlets with curved features. A common optimization approach is to add global attention mechanisms and expand the receptive field, but this significantly increases the number of model parameters and the computational demand, adding to the computational burden on drones. In practice, the gain in model performance must be balanced against the increase in computational cost.

Compressed sensing is a key technology for reducing computational costs. It is fundamentally based on reconstructing signals through sparsity: provided the signal is sparse, the original signal can be reconstructed from samples taken at a rate much lower than the Nyquist–Shannon sampling theorem requires [45]. This approach significantly reduces computational demands. Wu Yirui et al. proposed compressing the YOLO model using compressed sensing, reducing computational costs by pruning redundant parameters [46]. Cihan Bayındır applied compressed sensing to reconstruct time series data of von Kármán vortex coefficients with sparse characteristics [47]. Priyanka et al. used compressed sensing to compress medical images while preserving the ROI, thereby reducing redundant parameters in the images [48]. Inspired by the concept of sparse sampling in compressed sensing, we designed an efficient sparse feature extraction scheme, optimizing YOLOv8 with a multi-scale dilated attention mechanism and dynamic snake convolutions. This method uses dilated sparse convolution kernels to capture essential information from images while discarding less critical details, improving detection performance while keeping computational costs in check.

The improvements made to the YOLOv8 model in this paper are as follows: First, we introduced the MSDA (multi-scale dilated attention) mechanism, enabling the model to focus more on the sewer outlet itself and reduce interference from the surrounding environment. Second, we integrated dynamic snake convolutions into the C2f (CSP Bottleneck with 2 Convolutions) module, which is adaptive and can adjust itself according to the shape of the sewage outlets to more accurately identify elongated and curved sewage outlets. Third, we designed a new IoU calculation method, called Inner-MPDIoU, combining Inner-IoU and MPDIoU, which can more accurately calculate the bounding box loss during model prediction, making the model regression faster and the detection results more accurate. Experimental results show that our model is very effective in achieving intelligent UAV monitoring of sewage outlets. The following sections of this paper will detail the improved YOLOv8 algorithm and the actual test results and analysis.

2. Materials and Methods

2.1. Study Area and Data Set

The study area of this paper is located on the southern coast of Zhanjiang City. Zhanjiang is a prefecture-level city in the southwestern part of Guangdong Province, situated at the southernmost tip of mainland China, between longitudes 109°40′ and 110°58′ E and between latitudes 20°13′ and 21°57′ N. The area mainly consists of peninsulas and islands, with five counties and four districts under Zhanjiang’s jurisdiction, all facing the marine environment. The total coastline length is 2023.6 km. Due to its developed industrial and aquaculture sectors, there are numerous sewage outlets [49]. Therefore, we selected 718 coastal sewage outlets in Zhanjiang as the data collection targets, with the locations of these outlets shown in Figure 1.

We collected 2190 images of sewage outlets using a DJI Phantom 4 drone and a handheld camera. To ensure model robustness in complex environments, we adjusted lighting and contrast, added Gaussian noise, and applied motion blur to these images. These operations simulate conditions encountered during field detection, such as variations in ambient light, rain and fog, and camera blur from wind-induced drone movement. This expanded the data set to 4380 images. We annotated the expanded data set using the LabelImg tool, with labels in four categories: gully, weir, pipe, and culvert. From the annotated data set, we selected 380 images as the test set and split the remaining 4000 images into training and validation sets at a ratio of 8:2 (3200 and 800 images, respectively). In addition to this preprocessing, we applied mosaic data augmentation during training to enhance the model’s generalization ability.
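The preprocessing and split described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names, the parameter values (`alpha`, `beta`, `sigma`, `length`), the random seed, and the use of a tiny grayscale image stored as nested lists are all assumptions made for the example.

```python
import random


def adjust_brightness_contrast(img, alpha=1.2, beta=10.0):
    """Scale contrast by alpha, shift brightness by beta, clip to [0, 255]."""
    return [[min(255.0, max(0.0, alpha * p + beta)) for p in row] for row in img]


def add_gaussian_noise(img, sigma=8.0, rng=None):
    """Add zero-mean Gaussian noise with standard deviation sigma, then clip."""
    rng = rng or random.Random(0)
    return [[min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in img]


def horizontal_motion_blur(img, length=3):
    """Average each pixel with its horizontal neighbours (fewer at the edges)."""
    half = length // 2
    blurred_img = []
    for row in img:
        blurred_row = []
        for x in range(len(row)):
            window = row[max(0, x - half): x + half + 1]
            blurred_row.append(sum(window) / len(window))
        blurred_img.append(blurred_row)
    return blurred_img


def split_dataset(items, n_test=380, train_fraction=0.8, seed=0):
    """Hold out n_test items, then split the rest into training/validation."""
    items = list(items)
    random.Random(seed).shuffle(items)
    test, rest = items[:n_test], items[n_test:]
    n_train = round(len(rest) * train_fraction)
    return rest[:n_train], rest[n_train:], test


# Hypothetical 2x3 grayscale image, used only to exercise the functions.
image = [[100.0, 120.0, 140.0], [90.0, 200.0, 250.0]]
augmented = horizontal_motion_blur(add_gaussian_noise(adjust_brightness_contrast(image)))

# 4380 image indices stand in for the annotated data set.
train, val, test = split_dataset(range(4380))
print(len(train), len(val), len(test))  # 3200 800 380
```

In practice an image library would perform these operations on full-colour arrays; the sketch only shows the order of the steps and the split arithmetic.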

2.2. Methods and Model Establishment

2.2.1. YOLOv8 Model

YOLOv8 is a state-of-the-art (SOTA) model proposed by Ultralytics, based on the structure of the YOLOv5 model. The model architecture is shown in Figure 2 and is composed of three parts: Backbone, Neck, and Head.

The Backbone is responsible for feature extraction from images. Compared to the Backbone of YOLOv5, the main change is the replacement of the C3 (CSP Bottleneck with 3 convolutions) module with the C2f module. The CSP (cross stage partial) module establishes cross-stage connections between different levels of feature maps, promoting the flow of information and feature fusion, thereby improving the network’s representation ability and receptive field. The C2f module, while inheriting this characteristic, increases gradient flow, enhancing the model’s convergence speed and performance.

The Neck of the YOLOv8 model, an intermediate layer, is used to fuse features from the Backbone, further improving the model’s performance. This network employs the PANet (Path Aggregation Network) structure, which aggregates and fuses multi-scale features effectively through top-down and bottom-up paths with lateral connections. This captures context information at different scales and retains detailed features, so the model performs better when handling multi-scale objects and boundary details.

The Head of the YOLOv8 model is the final layer, responsible for predicting the bounding boxes and classes of targets. It uses a multi-scale prediction strategy, performing target predictions on feature maps at different levels, enabling precise detection of objects of various sizes.

Based on the YOLOv8 model, this study focuses on the coastal land-based sources sewage outlets monitoring scenario. It proposes improvements in three aspects: attention mechanism, convolution module, and loss function. The following sections will detail these three improvement methods.

2.2.2. Multi-Scale Dilated Attention

The YOLOv8 model uses an anchor-free approach to determine the ROI. This method divides the image into a grid and predicts the target center point, height, and width for each grid cell, but it does not specifically focus on the grid cells containing the ROI. As a result, in complex outdoor environments where the sewage outlet is partially obscured by vegetation, the model may fail to focus effectively on the outlet itself. A global attention mechanism addresses this issue well: it assigns association weights between any two positions in the image, establishing long-range contextual dependencies that allow the model to prioritize the ROI with higher association weights. However, the computational complexity of global attention grows quadratically with the number of positions in the feature map, resulting in a substantial computational burden. To enable the model to focus on the ROI while limiting computational costs, we incorporated an MSDA (Multi-Scale Dilated Attention) mechanism into the model’s feature extraction module [50]. The computational load of MSDA is significantly lower than that of global attention. As shown in Figure 3, MSDA uses a sliding window and dilated convolution. The window limits the number of keys involved in each calculation, and dilated convolution expands the receptive field without increasing the number of convolutional blocks, which greatly reduces the computational load on long sequences. MSDA employs a multi-head mechanism, dividing the feature map into n heads along the channel dimension and using a different dilation rate r in each head to aggregate semantic information at different scales.
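As a rough illustration of the saving, consider a feature map with $N = H \times W$ positions. The feature-map size and window size used below are assumed for the example and are not taken from the experiments. Global attention scores every query against every key, whereas sliding-window dilated attention scores each query against at most $w^2$ sampled keys:

$$\underbrace{N^{2}}_{\text{global attention}} \quad \text{vs.} \quad \underbrace{N\,w^{2}}_{\text{sliding window}}$$

For an assumed $80 \times 80$ feature map ($N = 6400$) and $w = 3$, global attention evaluates $6400^{2} = 40{,}960{,}000$ query-key pairs, while the windowed form evaluates at most $6400 \times 9 = 57{,}600$, a reduction by a factor of $N / w^{2} \approx 711$. The dilation rate changes which keys are sampled, not how many, so enlarging the receptive field does not raise this count.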

The formula for SWDA (Sliding Window Dilated Attention) is as follows:

$$X = \mathrm{SWDA}(Q, K, V, r) \tag{1}$$

where $Q$, $K$, and $V$ represent the query, key, and value matrices, respectively, with each row of the matrix representing a feature vector and $r$ representing the dilation rate.

For the query at position $(i, j)$, SWDA calculates the key and value sparsely selected in a sliding window of size $w \times w$ centered at $(i, j)$ with the dilation rate $r$. The output value $x_{ij}$ of $X$ at position $(i, j)$ is as follows:

$$x_{ij} = \mathrm{Attention}(q_{ij}, K_{r}, V_{r}) = \mathrm{Softmax}\left(\frac{q_{ij} K_{r}^{T}}{\sqrt{d_{k}}}\right) V_{r}, \quad 1 \leq i \leq W, \; 1 \leq j \leq H \tag{2}$$

where $H$ and $W$ represent the height and width of the feature map, respectively, $K_{r}$ and $V_{r}$ are the sampled values from $K$ and $V$, respectively, and $d_{k}$ is the dimension of the head attention. The final MSDA calculation result is as follows:

$$h_{i} = \mathrm{SWDA}(Q_{i}, K_{i}, V_{i}, r_{i}), \quad 1 \leq i \leq n \tag{3}$$

$$X_{MSDA} = \mathrm{Linear}(\mathrm{Concat}[h_{1}, \dots, h_{n}]) \tag{4}$$

where $h_{i}$ represents the calculation results of multiple head attentions. In Equation (4), $\mathrm{Concat}$ represents the concatenation of result vectors, and $\mathrm{Linear}$ represents the linear transformation.
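Equations (1) to (4) can be traced with a small pure-Python sketch. It is an illustration under stated assumptions rather than the implementation used in the paper: feature maps are nested lists of shape H x W x d, window positions that fall outside the map are simply skipped (zero padding would be another option), and the final linear projection of Equation (4) is left out, so the heads are only concatenated. The names `swda`, `msda`, and `feat` are chosen for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def swda(Q, K, V, r, w=3):
    """Sliding-window dilated attention (Eq. 2) on H x W grids of vectors."""
    H, W = len(Q), len(Q[0])
    d_k, d_v = len(Q[0][0]), len(V[0][0])
    half = w // 2
    out = [[None] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            # Sparse w x w window around (i, j) with dilation rate r;
            # positions outside the feature map are skipped (assumption).
            coords = [(i + r * di, j + r * dj)
                      for di in range(-half, half + 1)
                      for dj in range(-half, half + 1)
                      if 0 <= i + r * di < H and 0 <= j + r * dj < W]
            q = Q[i][j]
            scores = [sum(a * b for a, b in zip(q, K[y][x])) / math.sqrt(d_k)
                      for y, x in coords]
            weights = softmax(scores)
            out[i][j] = [sum(wt * V[y][x][c] for wt, (y, x) in zip(weights, coords))
                         for c in range(d_v)]
    return out


def msda(Q, K, V, rates, w=3):
    """Split channels into len(rates) heads (Eq. 3) and concatenate them."""
    n = len(rates)
    d = len(Q[0][0]) // n

    def head(T, h):
        return [[vec[h * d:(h + 1) * d] for vec in row] for row in T]

    heads = [swda(head(Q, h), head(K, h), head(V, h), r, w)
             for h, r in enumerate(rates)]
    H, W = len(Q), len(Q[0])
    # The Linear projection of Eq. (4) is omitted here (treated as identity).
    return [[sum((heads[h][i][j] for h in range(n)), []) for j in range(W)]
            for i in range(H)]


# Hypothetical 5 x 5 feature map with 6 channels, split into 3 heads.
feat = [[[math.sin(i + 2 * j + c) for c in range(6)] for j in range(5)]
        for i in range(5)]
X = msda(feat, feat, feat, rates=[1, 2, 3])
print(len(X), len(X[0]), len(X[0][0]))  # 5 5 6
```

Each query attends to at most nine keys regardless of the dilation rate, which is where the saving over global attention comes from.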

2.2.3. Dynamic Snake Convolution

In the morphology of sewer outlets, types like pipe and gully exhibit curved, elongated continuous features. These shapes often occupy a large square ROI in the image, while the actual features within the ROI take up a relatively small portion. This results in a significant amount of redundant computation, even with an increased receptive field. To address this, it is necessary to use deformable convolutions that adjust the receptive field according to target features, enabling the convolution to better adapt to the target’s characteristics. We therefore introduced a flexible dynamic snake convolution into the C2f module [51]. The structure of the improved C2f module is shown in Figure 4.

General deformable convolutions offer more flexibility, but completely free learning of the deformation often causes the receptive field to drift away from the target, especially when dealing with curved and elongated features. DSConv (Dynamic Snake Convolution) instead uses an iterative strategy in the design of the convolution kernel. It adds a continuity constraint: each offset is accumulated on top of the previous one, so the kernel can shift freely while remaining continuous. This approach focuses better on the core features of tubular structures.

As shown in Figure 5, DSConv linearizes a standard convolution kernel $K$ of size $3 \times 3$ along both the x-axis and y-axis. Taking the x-axis as an example, the linearized convolution kernel’s central grid coordinates are $K_{i} = (x_{i}, y_{i})$. The coordinates of the other grids are $K_{i \pm c} = (x_{i \pm c}, y_{i \pm c})$, where $c = 0, 1, 2, 3, 4$ indicates the distance from the central grid. In the convolution kernel $K$, the selection of each grid position $K_{i \pm c}$ is an accumulation process. Starting from the central position $K_{i}$, the position of each grid away from the center depends on the previous grid position: compared to $K_{i}$, $K_{i \pm c}$ adds an offset $\Delta = \{\delta \mid \delta \in [-1, 1]\}$. The specific offset formulas for the x-axis and y-axis are as follows:

$$K_{i \pm c} = \begin{cases} (x_{i + c}, y_{i + c}) = \left(x_{i} + c,\; y_{i} + \sum_{i}^{i + c} \Delta y\right), \\ (x_{i - c}, y_{i - c}) = \left(x_{i} - c,\; y_{i} + \sum_{i - c}^{i} \Delta y\right) \end{cases} \tag{5}$$

$$K_{j \pm c} = \begin{cases} (x_{j + c}, y_{j + c}) = \left(x_{j} + \sum_{j}^{j + c} \Delta x,\; y_{j} + c\right), \\ (x_{j - c}, y_{j - c}) = \left(x_{j} + \sum_{j - c}^{j} \Delta x,\; y_{j} - c\right) \end{cases} \tag{6}$$

where $K_{i \pm c}$ represents the offset coordinates in the x-axis direction and $K_{j \pm c}$ represents the offset coordinates in the y-axis direction.
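The accumulation in Equation (5) can be made concrete with a short sketch. It assumes that the central grid carries no offset and that the per-grid offsets are already known; in DSConv they are learned, and feature values at the resulting fractional coordinates are obtained by interpolation, which is not shown. The function name and the example offsets are hypothetical.

```python
def snake_coords_x(center, dy_pos, dy_neg):
    """Grid coordinates of a kernel linearized along the x-axis (Eq. 5).

    center : (x_i, y_i), coordinates of the central grid
    dy_pos : vertical offsets for grids i+1, i+2, ..., each in [-1, 1]
    dy_neg : vertical offsets for grids i-1, i-2, ..., each in [-1, 1]
    """
    x_i, y_i = center
    for d in list(dy_pos) + list(dy_neg):
        if not -1.0 <= d <= 1.0:
            raise ValueError("each offset must lie in [-1, 1]")

    coords = {0: (x_i, y_i)}
    acc = 0.0
    for c, d in enumerate(dy_pos, start=1):
        acc += d  # offsets accumulate outward from the centre
        coords[c] = (x_i + c, y_i + acc)
    acc = 0.0
    for c, d in enumerate(dy_neg, start=1):
        acc += d
        coords[-c] = (x_i - c, y_i + acc)
    # Return the grids ordered from i-4 to i+4.
    return [coords[k] for k in sorted(coords)]


# Hypothetical offsets for c = 1..4 on each side of the centre.
coords = snake_coords_x((0, 0), [0.5, 0.5, -0.25, 1.0], [-0.5, 0.0, 0.25, 0.5])
for point in coords:
    print(point)
```

Because every offset is bounded by 1 in magnitude and added to the previous grid's position, neighbouring grids never differ by more than one unit vertically, which is the continuity constraint that keeps the kernel on a curved, elongated target.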

2.2.4. Inner-MPDIoU

In the sewage outlet detection task, BBR (bounding box regression) usually uses IoU both as an evaluation metric and as part of the loss function, helping the model adjust the anchor box position. However, when the anchor box does not overlap with the target box, $\mathrm{IoU} = 0$, which fails to reflect how close the two boxes are. Since IoU does not fully consider the geometric characteristics of bounding boxes, we designed the Inner-MPDIoU loss calculation method, which integrates Inner-IoU [52] and MPDIoU [53]. As shown in Figure 6, the target box indicates the real location of the target and the anchor box indicates its predicted location. To measure the loss more precisely, MPDIoU minimizes the distances between the top-left points and between the bottom-right points of the anchor box and the target box as an auxiliary condition for loss calculation. Meanwhile, Inner-IoU uses the scale factor $\mathrm{ratio}$ to generate auxiliary boxes, namely the InnerTarget Box and the InnerAnchor Box, with a smaller scale factor used for high-IoU cases and a larger scale factor for low-IoU cases. The IoU of the auxiliary boxes changes in line with that of the actual boxes during regression, and the IoU gradient of the auxiliary boxes is larger, which accelerates convergence.

The calculation formula for Inner-MPDIoU is as follows:

$$b_{l}^{gt} = x_{c}^{gt} - \frac{w^{gt} \times \mathrm{ratio}}{2}, \quad b_{r}^{gt} = x_{c}^{gt} + \frac{w^{gt} \times \mathrm{ratio}}{2} \tag{7}$$

$$b_{t}^{gt} = y_{c}^{gt} - \frac{h^{gt} \times \mathrm{ratio}}{2}, \quad b_{b}^{gt} = y_{c}^{gt} + \frac{h^{gt} \times \mathrm{ratio}}{2} \tag{8}$$

$$b_{l} = x_{c} - \frac{w^{prd} \times \mathrm{ratio}}{2}, \quad b_{r} = x_{c} + \frac{w^{prd} \times \mathrm{ratio}}{2} \tag{9}$$

$$b_{t} = y_{c} - \frac{h^{prd} \times \mathrm{ratio}}{2}, \quad b_{b} = y_{c} + \frac{h^{prd} \times \mathrm{ratio}}{2} \tag{10}$$

where $(x_{c}^{gt}, y_{c}^{gt})$ represents the center coordinates of the target box, $(b_{l}^{gt}, b_{t}^{gt})$ represents the top-left coordinates of the InnerTarget Box, $(b_{r}^{gt}, b_{b}^{gt})$ represents the bottom-right coordinates of the InnerTarget Box, $(x_{c}, y_{c})$ represents the center coordinates of the anchor box, $(b_{l}, b_{t})$ represents the top-left coordinates of the InnerAnchor Box, $(b_{r}, b_{b})$ represents the bottom-right coordinates of the InnerAnchor Box, and $\mathrm{ratio}$ represents the scale factor.

$$\mathrm{inter} = \left(\min(b_{r}^{gt}, b_{r}) - \max(b_{l}^{gt}, b_{l})\right) \times \left(\min(b_{b}^{gt}, b_{b}) - \max(b_{t}^{gt}, b_{t})\right) \tag{11}$$

$$\mathrm{union} = (w^{gt} \times h^{gt}) \times \mathrm{ratio}^{2} + (w^{prd} \times h^{prd}) \times \mathrm{ratio}^{2} - \mathrm{inter} \tag{12}$$

$$\mathrm{IoU}^{inner} = \frac{\mathrm{inter}}{\mathrm{union}} \tag{13}$$

$$L_{Inner\text{-}IoU} = 1 - \mathrm{IoU}^{inner} \tag{14}$$

where $\mathrm{inter}$ represents the overlap area of the InnerTarget Box and the InnerAnchor Box, $\mathrm{union}$ represents the union area of the InnerTarget Box and the InnerAnchor Box, and $L_{Inner\text{-}IoU}$ represents the inner bounding box regression loss.

$$d_{1}^{2} = (x_{1}^{prd} - x_{1}^{gt})^{2} + (y_{1}^{prd} - y_{1}^{gt})^{2} \tag{15}$$

$$d_{2}^{2} = (x_{2}^{prd} - x_{2}^{gt})^{2} + (y_{2}^{prd} - y_{2}^{gt})^{2} \tag{16}$$

where $(x_{1}^{prd}, y_{1}^{prd})$ represents the top-left coordinates of the anchor box, $(x_{1}^{gt}, y_{1}^{gt})$ represents the top-left coordinates of the target box, and $d_{1}$ represents the distance between these two points. Similarly, $(x_{2}^{prd}, y_{2}^{prd})$ represents the bottom-right coordinates of the anchor box, $(x_{2}^{gt}, y_{2}^{gt})$ represents the bottom-right coordinates of the target box, and $d_{2}$ represents the distance between these two points.

$$\mathrm{IoU}^{MPD} = \mathrm{IoU} - \frac{d_{1}^{2}}{w^{2} + h^{2}} - \frac{d_{2}^{2}}{w^{2} + h^{2}} \tag{17}$$

$$L_{MPDIoU} = 1 - \mathrm{IoU}^{MPD} \tag{18}$$

$$L_{Inner\text{-}MPDIoU} = 1 - \mathrm{IoU}^{inner} + \frac{d_{1}^{2}}{w^{2} + h^{2}} + \frac{d_{2}^{2}}{w^{2} + h^{2}} \tag{19}$$

where $w$ and $h$ represent the width and height of the image, $L_{MPDIoU}$ represents the MPD bounding box regression loss, and $L_{Inner\text{-}MPDIoU}$ represents the final Inner-MPD bounding box regression loss.
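Equations (7) to (19) combine into a single scalar loss, which the following sketch evaluates for one pair of boxes. It is an illustrative reading of the formulas, not the training code: boxes are assumed to be given as corner coordinates (x1, y1, x2, y2) with positive area, and negative overlap extents are clamped to zero so that disjoint boxes yield an intersection of zero, a guard that Equation (11) does not state explicitly. The function names and example boxes are hypothetical.

```python
def inner_box(box, ratio):
    """Auxiliary box with the same centre and sides scaled by ratio (Eqs. 7-10)."""
    x1, y1, x2, y2 = box
    xc, yc = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = x2 - x1, y2 - y1
    return (xc - w * ratio / 2.0, yc - h * ratio / 2.0,
            xc + w * ratio / 2.0, yc + h * ratio / 2.0)


def inner_mpdiou_loss(pred, target, img_w, img_h, ratio=1.0):
    """Inner-MPDIoU loss (Eq. 19) for boxes given as (x1, y1, x2, y2)."""
    bl, bt, br, bb = inner_box(pred, ratio)
    gl, gtop, gr, gb = inner_box(target, ratio)

    # Eq. (11), with negative extents clamped to zero (assumption).
    inter = (max(0.0, min(gr, br) - max(gl, bl))
             * max(0.0, min(gb, bb) - max(gtop, bt)))

    # Eq. (12): areas of the original boxes scaled by ratio squared.
    area_prd = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_gt = (target[2] - target[0]) * (target[3] - target[1])
    union = area_gt * ratio ** 2 + area_prd * ratio ** 2 - inter
    iou_inner = inter / union  # Eq. (13)

    # Eqs. (15)-(16): squared corner distances between the original boxes.
    d1_sq = (pred[0] - target[0]) ** 2 + (pred[1] - target[1]) ** 2
    d2_sq = (pred[2] - target[2]) ** 2 + (pred[3] - target[3]) ** 2
    diag_sq = img_w ** 2 + img_h ** 2

    return 1.0 - iou_inner + d1_sq / diag_sq + d2_sq / diag_sq


# Hypothetical boxes in a 100 x 100 image.
same = inner_mpdiou_loss((10, 10, 50, 50), (10, 10, 50, 50), 100, 100, ratio=0.75)
near = inner_mpdiou_loss((0, 0, 10, 10), (20, 20, 30, 30), 100, 100)
far = inner_mpdiou_loss((0, 0, 10, 10), (40, 40, 50, 50), 100, 100)
print(round(same, 4), round(near, 4), round(far, 4))  # 0.0 1.08 1.32
```

The two disjoint cases show the point made in the text: plain IoU is 0 for both, yet the corner-distance terms still distinguish the nearer prediction from the farther one.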

3. Results Analysis and Discussion

3.1. Parameter Selection

The experimental environment for this study is configured as follows: the operating system is Windows 10 64-bit, the CPU (Central Processing Unit) is an AMD (Advanced Micro Devices) Ryzen 5 PRO 4650G, the GPU is an NVIDIA GeForce RTX 3060 with 12 GB of VRAM (Video Random Access Memory), the Python version is Python 3.8.18, and the PyTorch version is PyTorch 2.1.0. The hyperparameters set for the experiment are shown in Table 1.

In Table 1, epoch represents the number of training epochs, lr0 indicates the initial learning rate, lrf is the final learning rate factor (the final learning rate is lr0 × lrf), patience denotes the number of epochs without improvement after which training stops early, batch specifies the training batch size, and optimizer refers to the optimization algorithm.

3.2. Evaluation Metrics

The evaluation metrics used in this experiment include P (Precision), R (Recall), mAP (mean Average Precision), Params (Parameters), ONNX (Open Neural Network Exchange) file size, GFLOPs (giga floating-point operations), and FPS (Frames Per Second). The relevant formulas are as follows:

$$P = \frac{TP}{TP + FP} \tag{20}$$

$$R = \frac{TP}{TP + FN} \tag{21}$$

$$AP = \int_{0}^{1} P(R)\, dR \tag{22}$$

$$mAP = \frac{1}{n} \sum_{i = 1}^{n} AP(i) \tag{23}$$

where TP (True Positive) represents the number of instances correctly predicted as positive, FP (False Positive) represents the number of instances incorrectly predicted as positive, and FN (False Negative) represents the number of positive instances incorrectly predicted as negative. AP (Average Precision) is the area under the PR (Precision–Recall) curve for each category, and mAP is the average of AP across all $n$ categories.
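Equations (20) to (23) translate directly into code. The sketch below is a simplified illustration: AP is approximated with the trapezoidal rule over a handful of points, whereas detection toolkits usually interpolate the precision curve before integrating, and all counts and per-class AP values in the example are invented rather than taken from the experiments.

```python
def precision_recall(tp, fp, fn):
    """Precision (Eq. 20) and recall (Eq. 21) from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(recalls, precisions):
    """Area under the P-R curve (Eq. 22), trapezoidal approximation."""
    points = sorted(zip(recalls, precisions))
    if points[0][0] > 0.0:
        # Extend the curve horizontally back to recall = 0.
        points.insert(0, (0.0, points[0][1]))
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values (Eq. 23)."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Invented numbers, for illustration only.
p, r = precision_recall(tp=80, fp=20, fn=20)
ap = average_precision([0.0, 0.5, 1.0], [1.0, 1.0, 0.5])
m_ap = mean_average_precision({"gully": 0.5, "weir": 0.75, "pipe": 1.0, "culvert": 0.25})
print(p, r, ap, m_ap)  # 0.8 0.8 0.875 0.625
```

For mAP50, a prediction counts as a true positive when its IoU with a ground-truth box of the same class is at least 0.5; the matching step that produces the counts is outside the scope of this sketch.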

The ONNX file is used for deploying the model on embedded devices; it contains the model’s parameters and architecture, so its size reflects the size of the model. GFLOPs indicates the number of floating-point operations, in billions, required during a single forward pass, serving as a measure of the model’s computational complexity. This helps assess the model’s computational demands on different hardware platforms, so that the chosen model matches the available computational resources.
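To give a sense of how such an operation count arises, the sketch below estimates the cost of a single convolution layer. It rests on common conventions rather than on anything specified in this paper: bias and activation are ignored, and one multiply-accumulate is counted as two floating-point operations (some tools count it as one, which halves the figure). The layer dimensions are hypothetical.

```python
def conv2d_flops(c_in, c_out, kernel, h_out, w_out):
    """Approximate FLOPs of one 2D convolution (bias and activation ignored).

    Each output value needs kernel * kernel * c_in multiply-accumulates,
    and one multiply-accumulate is counted here as 2 operations.
    """
    macs = kernel * kernel * c_in * c_out * h_out * w_out
    return 2 * macs


# Hypothetical layer: 3x3 convolution, 3 -> 16 channels, stride 2 on a
# 640 x 640 input, so the output feature map is 320 x 320.
flops = conv2d_flops(c_in=3, c_out=16, kernel=3, h_out=320, w_out=320)
print(flops / 1e9)  # about 0.088 GFLOPs
```

Summing this quantity over every layer gives the per-forward-pass total that profiling tools report for the whole network.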

3.3. Baseline Model

YOLOv8 offers multiple versions (n, s, m, l, x) that adjust model depth and width to control the number of layers and channels per layer, with a set maximum number of channels, to meet varying computational resource needs. We trained all five versions of YOLOv8, with the experimental results shown in Table 2. YOLOv8x, YOLOv8l, and YOLOv8m require larger model sizes and higher floating-point operation counts, making them suitable for deployment in high-performance environments equipped with chips like GPUs, TPUs (Tensor Processing Units), or NPUs (Neural Processing Units). However, these versions are still not lightweight enough for embedded devices with limited resources. In contrast, the lightweight design and lower floating-point operation requirements of YOLOv8n and YOLOv8s make them more suitable for devices with restricted computational capabilities. Experimental results show that YOLOv8s achieves a 2.6% improvement in mAP50 compared to YOLOv8n, but its model size increases by 267% and its floating-point operation requirement by 242%, resulting in a substantial increase in resource consumption without a proportionate gain in performance. Therefore, we ultimately selected YOLOv8n as our baseline model to better balance performance with computational resource demands.

3.4. Ablation Experiment

To show the effect of each improvement on the model, we conducted ablation experiments. Taking YOLOv8n as the baseline, we applied the three optimization methods individually: replacing the convolution module with DSConv, changing the loss function to Inner-MPDIoU, and adding the MSDA mechanism. Each variant was compared with the original model and the final model; the experimental results are shown in Table 3.

As shown in Table 3, all three improvements enhanced the average detection accuracy of YOLOv8 to varying degrees. DSConv extracted features effectively for the pipe and gully classes, which have curved and elongated shapes: their mAP50 increased by 1% and 0.9%, respectively, with an overall increase of 1.5%.

Figure 7a,b together show that the four types of sewage outlets cluster at different scales. The pipe class (red) is a special case: because it includes both buried underground pipes and exposed above-ground pipes, and its instances are numerous, it covers almost all scales. Relative to the baseline model, Inner-MPDIoU further optimized the anchor box loss calculation and enhanced the model’s ability to predict multi-scale targets, improving the detection of common sewage outlets. The mAP50 of the pipe, gully, and culvert classes increased by 2.3%, 0.6%, and 2.7%, respectively, with an overall increase of 1.4%.

The normalized confusion matrix of the original YOLOv8 model in Figure 8a shows that, because sewage outlets are often blocked by bushes, the original model is prone to confusing them with the background.

Comparison with the normalized confusion matrix of YOLOv8+MSDA in Figure 8b shows that the MSDA mechanism strengthened the model’s ability to detect occluded targets. Relative to the original YOLOv8 network, the mAP50 of the three easily occluded outlet types (pipe, culvert, and weir) increased by 1.2%, 2.8%, and 1.3%, respectively, and their confusion coefficients decreased by 0.05, 0.04, and 0.02, respectively.

Figure 9a,b show the comparison of the results obtained after prediction by the original YOLOv8 model and our improved final model. It can be seen that our model not only improved the confidence of the detection results but also reduced the probability of false detection.

3.5. Comparison Experiments

Because UAVs have limited onboard computing power, we selected several lightweight state-of-the-art (SOTA) models for the comparative experiments: several generations of the YOLO series with good detection performance and the currently popular object detection model RT-DETR (Real-Time DEtection TRansformer). We also compared training the new model from scratch with training it by transfer learning.

As shown in Table 4, we compared the models in terms of Precision, Recall, mAP50, number of model parameters (Params), and detection speed (FPS). We first compared the RT-DETR detection model with the YOLO series models. RT-DETR reached a Precision of 92.5% and a Recall of 78.1%, both higher than those of the YOLO series baselines, and its mAP50 of 84.1% was second only to that of YOLOv3n (84.2%). However, RT-DETR had the largest number of Params, more than five times that of the YOLOv8 model, and its FPS was the lowest, far behind the other models. The YOLOv8 model reached a Precision of 89.2%, a Recall of 76.1%, and an mAP50 of 83.6%. All three metrics were competitive, only YOLOv5 had fewer Params, and YOLOv8 achieved the fastest detection speed, 322 FPS.
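
The rankings described above can be read off Table 4 programmatically. The sketch below is only a convenience check on the baseline rows; the names `TABLE4`, `best`, and the column indices are illustrative assumptions.

```python
# Baseline rows transcribed from Table 4:
# (Precision %, Recall %, mAP50 %, Params, FPS).
TABLE4 = {
    "RT-DETR": (92.5, 78.1, 84.1, 15_492_984, 116),
    "YOLOv3n": (90.8, 76.7, 84.2, 4_054_580, 238),
    "YOLOv5n": (86.6, 75.7, 82.3, 2_503_724, 286),
    "YOLOv6n": (84.9, 75.8, 81.1, 4_234_140, 270),
    "YOLOv8n": (89.2, 76.1, 83.6, 3_006_428, 322),
}
P, R, MAP50, PARAMS, FPS = range(5)  # column indices into each row


def best(table, column, largest=True):
    """Return the model name with the largest (or smallest) value in a column."""
    pick = max if largest else min
    return pick(table, key=lambda name: table[name][column])


# How many times more parameters RT-DETR has than YOLOv8n.
param_ratio = TABLE4["RT-DETR"][PARAMS] / TABLE4["YOLOv8n"][PARAMS]

if __name__ == "__main__":
    print("fastest:", best(TABLE4, FPS))
    print("fewest params:", best(TABLE4, PARAMS, largest=False))
    print("highest mAP50:", best(TABLE4, MAP50))
    print("RT-DETR / YOLOv8n params:", round(param_ratio, 2))
```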

Compared with the YOLOv8 model, our model increased Precision, Recall, and mAP50 by 2.4, 1.2, and 2.1 percentage points, respectively, but its Params grew from 3,006,428 to 3,881,640 and its FPS dropped from 322 to 172. To address this issue, we adopted transfer learning, which presets the parameters of certain model layers and thereby reduces the need for extensive parameter adjustments.
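
One common way to preset layer parameters when the architecture has been modified is to copy only those pretrained weights whose name and shape both match the new model, and to leave the remaining layers at their fresh initialization. The sketch below illustrates that idea with plain Python objects; it is an assumption about how such a transfer could be done, not the authors' procedure, and the layer names, shapes, and helper names (`select_transferable`, `tensor`) are hypothetical.

```python
from types import SimpleNamespace


def select_transferable(pretrained, target):
    """Keep pretrained entries whose name and shape both match the target model.

    Both arguments map parameter names to objects exposing a ``shape``
    attribute (for example, tensors in a framework state dict).
    """
    return {
        name: weight
        for name, weight in pretrained.items()
        if name in target and weight.shape == target[name].shape
    }


def tensor(*shape):
    """Hypothetical stand-in for a real tensor; only the shape matters here."""
    return SimpleNamespace(shape=shape)


# Toy state dicts with made-up layer names and shapes.
pretrained = {
    "backbone.conv0.weight": tensor(16, 3, 3, 3),
    "backbone.conv1.weight": tensor(32, 16, 3, 3),
    "head.cls.weight": tensor(80, 64, 1, 1),  # e.g. 80 classes in a source task
}
target = {
    "backbone.conv0.weight": tensor(16, 3, 3, 3),
    "backbone.conv1.weight": tensor(32, 16, 3, 3),
    "backbone.msda.qkv.weight": tensor(96, 32, 1, 1),  # new layer, no match
    "head.cls.weight": tensor(4, 64, 1, 1),  # 4 outlet classes
}

reused = select_transferable(pretrained, target)  # weights that can be preset
fresh = sorted(set(target) - set(reused))         # layers trained from scratch

if __name__ == "__main__":
    print("reused:", sorted(reused))
    print("fresh:", fresh)
```

In this toy case the two unchanged backbone layers are reused, while the added attention layer and the resized classification head start from scratch.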

After transfer learning, our model kept the same Params as the model trained from scratch, while its FPS increased by 29.1% (from 172 to 222). The Recall reached 80.6% and the mAP50 reached 87.0%, both the best among the compared models. Comparing Figure 10a,b, the new model trained with transfer learning improved the mAP50 of three types of sewage outlets (pipe, gully, and culvert) by 3.3%, 0.5%, and 1.3%, respectively, whereas that of weir decreased by 0.1%. Overall, this resulted in a combined improvement of 1.1%.
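
The relative speed-up and the absolute metric gains follow from the Table 4 rows for YOLOv8n and the two variants of our model. The snippet below is a small arithmetic check under that reading of the table; the helper names `point_gain` and `relative_gain` are illustrative.

```python
# (Precision %, Recall %, mAP50 %, Params, FPS), transcribed from Table 4.
YOLOV8N = (89.2, 76.1, 83.6, 3_006_428, 322)
OURS = (91.6, 77.3, 85.7, 3_881_640, 172)
OURS_TRANSFER = (90.4, 80.6, 87.0, 3_881_640, 222)


def point_gain(new, old):
    """Absolute difference between two percentages, in percentage points."""
    return round(new - old, 1)


def relative_gain(new, old):
    """Relative change of `new` with respect to `old`, in percent."""
    return round((new - old) / old * 100, 1)


# Speed-up of transfer learning over training from scratch (FPS is column 4).
fps_gain = relative_gain(OURS_TRANSFER[4], OURS[4])

# Precision, Recall, and mAP50 gains of the transfer model over YOLOv8n.
vs_baseline = tuple(point_gain(n, o) for n, o in zip(OURS_TRANSFER[:3], YOLOV8N[:3]))

# Remaining FPS gap between the transfer model and YOLOv8n.
fps_vs_baseline = relative_gain(OURS_TRANSFER[4], YOLOV8N[4])

if __name__ == "__main__":
    print(fps_gain)         # 29.1
    print(vs_baseline)      # (1.2, 4.5, 3.4)
    print(fps_vs_baseline)  # -31.1
```

Keeping absolute gains (percentage points) separate from relative gains (percent) avoids mixing the two kinds of figures when reading the comparison.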

The training process in Figure 11 shows that, compared with the YOLOv8 benchmark model, the new transfer learning model improves mAP50 by 3.4 percentage points. It also converges faster in the early stage of training, which greatly improves training efficiency.

4. Conclusions

This paper proposes an improved model based on YOLOv8 for the intelligent monitoring of land-based sources of marine sewage outlets using UAVs. Inspired by sparse sampling in compressed sensing, the new model optimizes performance while constraining the growth of computational cost through three enhancements: introducing an MSDA mechanism, integrating flexible dynamic snake convolution with a wider receptive field into the C2f module, and proposing a novel IoU loss calculation method. The experimental results indicate that, relative to the original YOLOv8 model, the new model improves Precision by 1.2, Recall by 4.5, and mAP50 by 3.4 percentage points, outperforming the original and other SOTA models in detection accuracy. The model is also better at detecting obscured or elongated outfalls and multi-scale outfall targets, converges faster during training, and reaches a detection speed that meets the requirements for UAV monitoring of land-based sources of marine outfalls. Future research could focus on multispectral images of outfalls in order to explore the application value of multispectral remote sensing in the monitoring of land-based sources of marine outfalls.

Author Contributions

Methodology, S.Z.; Data curation, S.Z.; Writing—original draft, S.Z.; Writing—review & editing, H.Z. and H.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by the National Natural Science Foundation of China (grant no. 42271011).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Zhanjiang city outlets point map. (a) “gully”, (b) “weir”, (c) “pipe”, (d) “culvert”, (e) “gully”, (f) “weir”, (g) “pipe”, (h) “culvert”.

Figure 2. YOLOv8 model structure.

Figure 3. MSDA mechanism structure. The red points represent the key positions of the convolutional kernel, the yellow area shows the dilation of the kernel at r = 1, the blue area shows the dilation at r = 2, and the green area shows the dilation at r = 3.

Figure 4. C2f module structure.

Figure 5. DSConv selectable receptive fields. The blue line represents the continuous shift of the convolutional kernel in the horizontal direction, while the red line represents the continuous shift of the convolutional kernel in the vertical direction.

Figure 6. Inner-MPDIoU diagram.

Figure 7. (a) Anchor box category number statistics, (b) Anchor box position statistics. The colors of the anchor boxes in (b) denote the same categories as in (a).

Figure 8. (a) Normalized confusion matrix for the YOLOv8 model, (b) normalized confusion matrix for the YOLOv8+MSDA model.

Figure 9. (a) YOLOv8 model’s predicted results, (b) our model’s predicted results.

Figure 10. (a) P–R curve of the improved model, (b) P–R curve of the improved model after transfer learning.

Figure 11. Model training process.

Table 1. Hyperparameter settings.

Hyperparameter Options	Setting
epoch	200
lr0	0.01
lrf	0.01
patience	20
batch	4
optimizer	SGD (Stochastic Gradient Descent)
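
The hyperparameter names in Table 1 match the training arguments of the Ultralytics YOLO package, so the settings could plausibly be passed as shown in the sketch below. This assumes the Ultralytics Python API was used; the dataset file name `outfalls.yaml` and the helper `train` are hypothetical and not taken from the paper.

```python
# Settings transcribed from Table 1, using Ultralytics argument names
# ("epoch" in the table corresponds to the `epochs` argument).
TRAIN_ARGS = {
    "epochs": 200,       # maximum number of training epochs
    "lr0": 0.01,         # initial learning rate
    "lrf": 0.01,         # final learning rate as a fraction of lr0
    "patience": 20,      # early stopping after 20 epochs without improvement
    "batch": 4,          # images per batch
    "optimizer": "SGD",  # stochastic gradient descent
}


def train(data_yaml, weights="yolov8n.yaml"):
    """Train a YOLOv8n model with the Table 1 settings.

    `weights` may be a model definition (training from scratch) or a
    pretrained checkpoint such as "yolov8n.pt". Requires the third-party
    `ultralytics` package, imported lazily so this file loads without it.
    """
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(data=data_yaml, **TRAIN_ARGS)


if __name__ == "__main__":
    # Hypothetical dataset description file; replace with a real one.
    # train("outfalls.yaml")
    print(TRAIN_ARGS)
```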

Table 2. Comparative experimental results of different versions of models.

Model	mAP50/%	Params	ONNX/MB	GFLOPS
YOLOv8n	83.6	3,157,200	11.6	8.9
YOLOv8s	86.2	11,166,560	42.6	28.8
YOLOv8m	87.6	25,902,640	98.7	79.3
YOLOv8l	87.7	43,691,520	166	165.7
YOLOv8x	88.1	68,229,648	260	258.5

Table 3. Ablation experiment results.

Model	mAP50 (Pipe)/%	mAP50 (Gully)/%	mAP50 (Culvert)/%	mAP50 (Weir)/%	mAP50 (All)/%
YOLOv8n	78	84.5	83.3	88.8	83.6
+DSConv	79	85.4	86.6	89.5	85.1
+Inner-MPDIoU	80.3	85.1	86	88.8	85
+MSDA	79.2	83.6	86.1	90.1	84.8
+ALL	79.6	85.3	87.7	90.3	85.7

Table 4. Lightweight model comparison experiments.

Model	P/%	R/%	mAP50/%	Params	FPS
RT-DETR	92.5	78.1	84.1	15,492,984	116
YOLOv3n	90.8	76.7	84.2	4,054,580	238
YOLOv5n	86.6	75.7	82.3	2,503,724	286
YOLOv6n	84.9	75.8	81.1	4,234,140	270
YOLOv8n	89.2	76.1	83.6	3,006,428	322
Ours	91.6	77.3	85.7	3,881,640	172
Ours(transfer)	90.4	80.6	87.0	3,881,640	222

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
