1. Introduction
Oil spills are typically defined as the discharge of liquid petroleum hydrocarbons into the environment as a result of outside effects, and they mainly occur in aquatic environments, on ice, or on land during various stages of the oil lifecycle, including exploration, production, transportation, refining, storage, and distribution [
1,
2]. Marine oil spills represent a significant threat to marine ecosystems, posing severe risks to the environment, fisheries, wildlife, and various socio-economic interests. Effective and timely monitoring of marine oil spills is crucial for mitigating these adverse impacts. In 2023, the global maritime community recorded one major oil spill exceeding 700 tons and nine medium spills ranging between 7 and 700 tons. Statistical records indicate that the major incident, which involved heavy fuel oil, occurred in Asia in February. The other incidents involved spills of very low-sulfur fuel oil, crude oil, and gasoline. These incidents highlight the persistent risk posed by oil spills, maintaining the decade average at approximately 6.8 spills per year involving over 7 tons of oil. The total volume of oil lost to the environment from tanker spills in 2023 was estimated to be around 2000 tons, underscoring the need for improving spill response and monitoring technologies [
3].
Oil products entering the marine environment can have a wide array of long-term environmental impacts, influenced by their chemical and physical properties, as well as their concentration and other environmental factors. Key properties of oil, such as surface tension, specific gravity, and viscosity, alongside variables related to the timing and location of spills, the volume of oil released, and atmospheric conditions, significantly affect the behavior of oil in water. When different types of oil are spilled on land or water, numerous physical, chemical, and biological degradation processes are initiated. Once released, crude oil spreads to form a thin film on the water surface, known as an oil slick.
Marine oil spill detection and monitoring have traditionally relied on satellite imagery, aerial surveillance, and manual reporting. However, these methods can be limited by factors such as weather conditions, coverage area, and response time. Recent advancements in deep learning and computer vision offer promising alternatives for enhancing the detection accuracy and timeliness of oil spill monitoring systems.
Previous studies have demonstrated that synthetic aperture radar (SAR), hyperspectral remote sensing, and multispectral remote sensing technologies are effective in detecting oil slicks [
4,
5,
6]. SAR technology, in particular, is capable of distinguishing oil-contaminated seawater from uncontaminated seawater due to the differences in their backscattering coefficients. However, SAR images are prone to false alarms, as they cannot effectively differentiate oil slicks from other phenomena such as planktonic algae, shallow sea areas, and current trajectories. Moreover, the detection performance of SAR is significantly influenced by wind speed, which can affect the accuracy and reliability of the observations. In recent years, the advancement of polarimetric synthetic aperture radar (PolSAR) technology has significantly enhanced the capabilities of remote sensing. PolSAR images are acquired by transmitting and receiving electromagnetic (EM) waves across multiple polarizations, providing more detailed information about the target surface. This multi-polarization approach improves the ability to characterize and distinguish different surface materials, making PolSAR methods a valuable tool in environmental monitoring, including oil spill detection. Consequently, PolSAR oil spill detection has become a popular research topic, and many studies have examined the effectiveness of PolSAR methods in detecting oil spills on the sea surface [
7,
8]. Despite the advances in applying PolSAR methods to oil spill detection, the automatic extraction and selection of features for object detection and classification remain significant challenges. These complexities arise from the intricate nature of PolSAR data, which requires sophisticated algorithms to accurately interpret the varied polarimetric signatures. In general, the accuracy of classification is highly dependent on the quality of the extracted features. Several researchers have contributed significantly to feature extraction-based oil spill detection. For example, Wenguang et al. [
9] proposed integrating four commonly used polarimetric features to develop a novel feature specifically designed for extracting oil spill information from PolSAR images. Similarly, Song et al. [
10] analyzed multiple responsiveness to small targets at sea. Skrunes et al. [
11] conducted an extensive analysis of various polarimetric features, including geometric intensity and the real part of the co-polarization cross product, to enhance oil spill detection through feature combinations. However, the process of manual feature extraction is both complex and time-consuming, often demanding substantial effort and extensive domain knowledge.
Recently, deep learning has experienced rapid development and has been extensively applied across various domains of computer vision. Its robust feature extraction capabilities allow deep learning techniques to directly derive higher-level features from raw data, demonstrating greater efficiency compared to traditional machine learning methods and enabling it to automatically extract more descriptive and representative features. This enhances performance, making deep learning a powerful tool for improving the accuracy and efficiency of object detection and classification tasks [
12]. Deng et al. [
13] proposed an ocean oil spill detection model utilizing the AlexNet architecture. Their approach involved cropping SAR remote sensing images into smaller segments and applying small block classification instead of semantic segmentation. Similarly, Yang et al. [
14] explored the application of the YOLO-v4 algorithm for marine oil spill detection, particularly under challenging conditions such as shadows and insufficient lighting. However, the deep learning methods used in these studies require the pre-setting of anchors, as they are based on anchor-driven algorithms. The marine environment’s complexity and variability pose significant challenges for detecting oil spills, as the shapes and sizes of oil slicks can vary dramatically. This variability makes it difficult to determine suitable anchor sizes, potentially leading to suboptimal detection performance. Moreover, ref. [
14] notes that although deep learning approaches can effectively mine the rich features of PolSAR images, they cannot avoid issues related to the effective integration of multi-layer features.
Therefore, to enhance the detection of oil spills in marine environments, we developed a custom dataset composed of high-resolution images distinct from conventional SAR imagery, sourced from various internet repositories. Each image in this dataset was meticulously labeled using a semantic segmentation approach, ensuring precise delineation of oil spills. We then fine-tuned the YOLOv8 segmentation model with the dataset and trained the model to achieve optimal detection performance.
This paper makes three significant contributions to the field of marine oil spill detection:
Creation of a custom oil spill dataset.
Construction of a deep YOLO-v8-based SOTA oil spill detection model.
Optimization of oil spill detection with integration of K-means and Truncated Linear Stretching methods.
This research provides a detailed account of the dataset preparation, model training, and evaluation processes, offering valuable insights into the application of segmentation-based oil spill detection to ocean environment monitoring and disaster management. We compiled a comprehensive dataset of high-resolution oil spill images, which yields better detection performance than the SAR image-based oil spill detection of [
13]. This dataset was systematically labeled using semantic segmentation techniques, which ensures high accuracy in the representation of oil spill regions. We fine-tuned the YOLOv8 segmentation model on this custom dataset, with extensive training and validation to ensure the model’s robustness and accuracy in diverse marine environments. We further enhanced the detection capabilities of our model by incorporating advanced image processing techniques. These optimizations mainly focus on improving the model’s performance by reducing the effect of noisy input data, such as different lighting scenarios and varying oil spill shapes and sizes, ensuring high and reliable detection accuracy in real-world applications.
The remainder of this paper is structured as follows:
In
Section 2, we provide a comprehensive review of related research works, detailing the various methods, datasets, and algorithms previously employed for detecting oil spills in marine environments. This section critically examines the strengths and limitations of existing approaches, highlighting the advancements in technology and methodologies relevant to our work. In
Section 3, we elaborate on the contributions of our research. This includes a detailed account of our data collection process, emphasizing the creation of a high-quality, custom oil spill dataset from various internet sources. We also describe the labeling process using semantic segmentation and provide an in-depth explanation of the fine-tuning and training procedures for the YOLOv8 segmentation model.
Section 4 presents the experimental results and analysis of our study. We conduct a comparative analysis to evaluate the performance of our model against SOTA methods. This section also includes quantitative metrics and visual examples to illustrate the effectiveness and robustness of our approach under different conditions. Finally, in
Section 5, we conclude our research by summarizing the key findings and contributions with potential directions for further improvements and expansions of oil spill monitoring systems.
2. Related Works
Most research works rely on two types of data: SAR and optical images. Over the past few decades, remotely sensed data have been extensively used to detect and monitor oil spills. Optical images are utilized less frequently in oil spill studies compared to microwave images, primarily due to their dependency on weather conditions and daylight availability. The spectral characteristics of oil spills can vary significantly depending on various factors, including the physical properties of the oil, film thickness, weather and illumination conditions, and the optical properties of the water column. This variability poses challenges for the consistent and reliable detection of oil spills using optical imaging techniques, making microwave imagery a more robust choice for such applications under a broader range of environmental conditions [
15]. The use of multispectral data for oil spill detection is increasingly prevalent, with satellite datasets of varied resolutions being employed in numerous studies. Researchers have extensively utilized the Moderate Resolution Imaging Spectroradiometer (MODIS) [
16,
17,
18,
19], Landsat [
20,
21,
22], KOMPSAT-2 [
23], and Gaofen-1 [
22,
24] satellite datasets. Based on these commonly used datasets, various machine learning (ML) models have been developed to detect oil spills and differentiate between oil slicks and lookalikes. These models leverage optical and SAR images to provide efficient monitoring solutions aimed at mitigating the impact of oil spills. In this review, ML methods for oil spill detection are categorized into traditional ML techniques and deep learning models. Traditional ML classification models widely used for oil detection from optical and SAR images include artificial neural networks (ANNs), support vector machines (SVMs), k-nearest neighbors (KNNs), random forest, and several other models [
25,
26,
27].
Many studies have focused on the detection of marine oil spills using SAR images. Yu et al. [
28] developed an approach that integrates region generation with edge detection and threshold segmentation methods. They introduced an adaptive mechanism based on the Otsu method to enhance detection accuracy. The proposed algorithm was validated using remote sensing images captured over the Bohai Sea and Dalian Bay, demonstrating the reliability and effectiveness of their approach. Zhang et al. [
29] developed a method to map oil spills in the Gulf of Mexico by calculating the conformity coefficient from fully polarimetric SAR data. This approach was specifically tailored to operate under conditions of low and medium wind speeds. Their results demonstrated a high level of effectiveness in detecting oil spills, indicating that the conformity coefficient derived from polarimetric data can significantly enhance the accuracy of oil spill mapping. Similarly, Frate et al. [
30] pioneered the application of the multi-layer perceptron (MLP) neural network method to SAR images for the recognition and extraction of oil spills. Their approach yielded promising results, demonstrating the efficacy of neural networks in this domain. Chen et al. [
31] investigated SAR target recognition using convolutional neural networks (CNNs). They proposed a novel all-convolutional network structure, named A-ConvNets, designed to effectively mitigate the overfitting issues commonly encountered in neural network training. This approach enhances the robustness and generalization capability of the model, making it more reliable for accurately recognizing targets in SAR imagery.
Feature extraction is a crucial stage in the oil spill detection process, enabling the extraction and utilization of a distinctive set of features to distinguish oil spills from natural phenomena such as algae blooms, biogenic slicks, currents, and areas with low wind. The incorporation of features possessing robust discriminatory power significantly enhances the accuracy of classifying oil spills. Marine oil spill detection methodologies can be broadly categorized into traditional approaches and deep learning approaches [
32].
The general framework for oil spill detection encompasses four primary steps: preprocessing of remotely sensed data, image segmentation for dark-spot identification, extraction of discriminative features, and classification of image pixels or objects to differentiate between oil spills and lookalikes. ML models are designed to solve complex classification problems through recursive and iterative analysis of candidate solutions derived from the given training samples and features, without requiring explicit programming for the task.
Traditional methods typically involve manually designed feature extraction and classification algorithms to identify marine oil spills. Li et al. [
33] employed an SVM to determine effective observation locations for marine waves and applied the fuzzy c-means (FCM) algorithm to identify marine waves and oil films. This combination of SVM and FCM algorithms enables the differentiation of oil spills from wave patterns, improving the accuracy of oil spill detection in complex marine environments. Moreover, ref. [
33] proposed another method that integrated SVM with histogram of oriented gradients (HOG) features to enhance the accuracy of identifying low-altitude, offshore oil spills. This approach facilitates all-weather observation capabilities in offshore waters, leveraging the robustness of HOG features to capture essential gradient information that distinguishes oil spills from other maritime phenomena. Xu et al. [
34] utilized the Otsu algorithm for oil spill observation. This method operates under the assumption that the images under analysis contain only background and foreground elements, allowing for image segmentation through threshold computation. This approach facilitates the separation of oil regions from the surrounding water by selecting the threshold that minimizes intra-class variance. These traditional methods, with their reliance on well-established algorithms and manually designed features, provide a foundation for marine oil spill detection. However, they also present limitations in terms of adaptability and scalability, which are increasingly being addressed by the emergence of advanced deep learning-based approaches. Notable algorithms include artificial neural networks (ANN) [
35,
36,
37,
38,
39,
40], SVM [
41], KNN [
42,
43], and random forest (RF) [
44], which have been applied successfully to a wide array of remote sensing tasks. SVM, for instance, is a nonparametric supervised ML technique rooted in the principle of structural risk minimization from statistical learning theory.
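The Otsu thresholding step mentioned above can be sketched in a few lines. This is a minimal NumPy illustration, not the implementation used in the cited works; the function name `otsu_threshold` and the assumption of an 8-bit grayscale input are ours. It searches for the threshold that maximizes between-class variance, which is equivalent to minimizing intra-class variance.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return the 8-bit threshold t that best splits pixels into <= t and > t.

    Assumes `gray` is a uint8 array (hypothetical input format).
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)

    w0 = np.cumsum(prob)               # weight of the class with values <= t
    w1 = 1.0 - w0                      # weight of the class with values > t
    cum_mean = np.cumsum(prob * levels)
    total_mean = cum_mean[-1]

    # Between-class variance for every candidate t; maximizing it is
    # equivalent to minimizing the intra-class variance.
    denom = w0 * w1
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (total_mean * w0 - cum_mean) ** 2 / denom
    between[denom == 0] = 0.0          # thresholds that leave one class empty
    return int(np.argmax(between))
```

A binary mask then follows from a single comparison, for example `gray > otsu_threshold(gray)`; whether the oil corresponds to the darker or the brighter class depends on the sensor and scene.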
These classification algorithms operate by leveraging diverse statistical, geometric, texture-based, contextual, and polarimetric features extracted from remotely sensed data. The recursive learning processes inherent in these models allow them to adapt and improve their accuracy over time, making them robust tools for effectively monitoring and identifying oil spills in various marine environments.
Eduardo et al. [
45] evaluated the detection performance of the fluorescence index (FI) and the rotation-absorption index (RAI), which emphasize the fluorescence properties of oil slicks. Their methodologies were developed and tested on hyperspectral optical imagery. Utilizing imagery from the 2010 Deepwater Horizon spill, they demonstrated that optical imagery can effectively differentiate oil from radar lookalikes in low wind conditions. The proposed approach helped to reduce false positives in radar imagery and to map oil spill extent and thickness accurately. In short, the study demonstrated that both FI and RAI could be effectively applied to map oil slicks in Moderate Resolution Imaging Spectroradiometer (MODIS) images, which indicates that infrared imaging is particularly effective for oil slick detection. In 2016, Dubucq et al. [
46] demonstrated that near-infrared (NIR) and shortwave infrared (SWIR) images obtained from multispectral data were highly effective for detecting oil slicks. The FI formula uses the reflectance values of the blue and red bands of the multispectral images. Similarly, the RAI formula uses the reflectance values of the blue, infrared, and i-th bands of the multispectral images. The SWIR spectral characteristic for the Landsat OLI image was calculated as the average of band 6 (1609 nm) and band 7 (2201 nm). For the Terra MODIS image, the SWIR spectral characteristic was determined by averaging band 6 (1640 nm) and band 7 (2130 nm) as shown in
Table 1. Recent studies have investigated the utilization of near-infrared (NIR) bands ranging from 750 to 1400 nm in sun-glittered satellite images for detecting oil spills. Adamo et al. [
47] observed that NIR bands from MODIS and Medium Resolution Imaging Spectrometer (MERIS) images exhibit enhanced performance in distinguishing between oil and non-oil classes compared to bands within the visible range. Pisano et al. [
48] employed MODIS NIR sun-glittered radiance imagery to successfully detect marine spills. Many researchers [
49,
50] have utilized absorption features present in the NIR region as indicators for estimating the thickness of oil spills. The application of NIR bands in these studies underscores their importance in improving the effectiveness and reliability of remote sensing techniques for oil detection and monitoring. Such advancements significantly contribute to enhancing environmental monitoring capabilities and informing timely response actions in marine ecosystems.
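The band arithmetic discussed in this paragraph can be illustrated with a short NumPy sketch. The FI formula itself is not reproduced in the text, so the normalized-difference form used here, (blue − red) / (blue + red), is an assumption based on how the index is commonly written, and the function names are hypothetical. The SWIR characteristic follows the text directly: the mean of the two SWIR bands.

```python
import numpy as np

def normalized_difference(a, b, eps=1e-12):
    """Generic (a - b) / (a + b) band ratio; eps guards against division by zero."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return (a - b) / (a + b + eps)

def fluorescence_index(blue, red):
    """FI from blue and red reflectance (assumed normalized-difference form)."""
    return normalized_difference(blue, red)

def swir_characteristic(band6, band7):
    """Average of the two SWIR bands, as described for Landsat OLI and Terra MODIS."""
    return (np.asarray(band6, dtype=np.float64) + np.asarray(band7, dtype=np.float64)) / 2.0
```

Both functions accept whole reflectance rasters as arrays, so the same call works for a single pixel or a full scene.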
2.1. Color Attributes for Object Detection
Object detection represents one of the most challenging tasks in the field of computer vision due to the considerable variability observed among images within the same object category. This variability is influenced by numerous factors, including variance in perspective, scale, and occlusion, which complicate the accurate identification and classification of objects. SOTA methodologies for object detection predominantly rely on intensity-based features, often excluding color information. The primary reason for this exclusion is the substantial variation in color that can arise from changes in illumination, compression artifacts, shadows, and highlights. Such variations introduce significant complexity in achieving robust color descriptions, thereby posing additional challenges to the object detection process. Conversely, in the realm of image classification, color information when combined with shape features has demonstrated substantial efficacy. Studies have shown that the integration of color and shape features can enhance classification performance significantly [
51,
52,
53,
54,
55,
56,
57]. A concept utilized in computer vision that parallels per-pixel classification in the remote sensing community is semantic segmentation. This technique performs pixel-level classification, assigning a specific category to each pixel within a remotely sensed image. Semantic segmentation facilitates the accurate classification of various elements, such as sea surface areas, ships, and oil spill zones. Additionally, it provides a comprehensive understanding of the entire image, enhancing the ability to interpret and analyze the data effectively. Unlike patch-based methods and object detection approaches, semantic segmentation excels in precisely delineating the boundaries and positions of the targets of interest. This precision makes semantic segmentation particularly suitable for processing remote sensing images, as it ensures that each pixel is accurately categorized, thereby improving the granularity and reliability of the classification results. This capability is crucial for applications such as environmental monitoring and disaster response, where detailed and accurate mapping of features is essential [
58]. The current SOTA techniques in object recognition rely on exhaustive search, but improving performance with more advanced features and classifiers requires a selective search strategy. Van de Sande et al. [
59] proposed the use of hierarchical segmentation as a selective search technique for object detection. By adapting segmentation for selective search, this method generates numerous approximate object locations instead of a few precise ones. This ensures objects are not missed and leverages appearance and nearby context for recognition. This class-independent approach covered 96.7% of objects in the Pascal VOC 2007 test set with only 1536 locations per image.
2.2. You Only Look Once (YOLO)
Yang et al. [
60] evaluated the performance of the YOLO-v4 algorithm for detecting marine oil spills, even in challenging conditions such as shadows and insufficient light. They constructed a specialized oil spill dataset to validate the algorithm’s effectiveness, primarily demonstrating its potential for this application. However, the YOLO-v4 algorithm, being an anchor-based method, requires predefined anchors for detection. The marine environment’s complexity and variability pose significant challenges, as oil spill shapes vary widely, making it difficult to identify suitable anchor sizes. This inherent limitation affects the adaptability and accuracy of the YOLO-v4 algorithm in real-time scenarios. Similarly, Zhang et al. [
61] presented an improved YOLOX-S model for marine oil spill detection, addressing the challenges of inconsistent SAR image contrast. The model incorporates a truncated linear stretch module to enhance image contrast and uses CSPDarknet and PANet networks for feature extraction to effectively identify oil spill areas in marine environments.
3. Proposed Method
In this study, we propose a new optical oil spill dataset and train an oil spill detection model by fine-tuning YOLO-v8. Moreover, we employ a combination of unsupervised machine learning techniques to enhance the accuracy of detecting marine oil spills. From the analysis, it can be seen that traditional methods often struggle with the diverse and challenging visual characteristics of oil spills, such as varying colors, textures, and contrast levels, particularly in SAR imagery. To address these challenges, we integrate SOTA algorithms aimed at enhancing image contrast and segmentation, thereby facilitating more precise identification and delineation of oil spill areas. Our approach incorporates K-means clustering for color segmentation, enabling the grouping of pixels with similar color characteristics. This segmentation method aids in distinguishing oil spill regions based on color similarity, thereby refining the detection process. Additionally, the Truncated Linear Stretching (TLS) algorithm is applied to further enhance image contrast, ensuring that critical details within the image, such as the colors of oil spill areas, are clearly highlighted.
3.1. K-Means Clustering for Color Segmentation
Color discrimination is important for detecting oil spills, because oil spills often exhibit color characteristics that make it difficult for algorithms to differentiate them from the surrounding water. By clustering pixels with similar color properties, we can effectively isolate regions in the image and enhance the discrimination between oil spill areas and background features. Therefore, K-means clustering is applied to group pixels with similar colors together and segment images based on color similarity. K-means clustering is a widely utilized unsupervised ML algorithm for partitioning a dataset into a predetermined number of clusters. In our case, the primary objective is to group similar pixel values together. As a centroid-based, or distance-based, algorithm, K-means associates each cluster with a centroid and assigns each point to the cluster whose centroid is nearest.
Moreover, oil spills can appear differently depending on factors such as lighting conditions, weather, and the continuous movement of water waves. Clustering adapts to these variations by flexibly adjusting the segmentation criteria to the specific color signatures that suit the fine-tuned YOLO-v8 oil spill detection model. The oil spill data collection, labeling, and model development are described later. Overall, this adaptability ensures robust performance across diverse imaging scenarios.
Consider clustering a dataset of $P$ points $x_1, x_2, \ldots, x_P$ into $K$ clusters, where $K$ represents the number of clusters. In this example we set $K = 3$, corresponding to the main Red, Green and Blue color channels as represented in oil spill model training. Each cluster has a corresponding centroid, denoted by $c_1, c_2, \ldots, c_K$, where $c_k$ is the centroid of the k-th cluster. The set of points belonging to each cluster is represented by:
$$S_k = \{\, x_p : x_p \text{ is assigned to cluster } k \,\}, \quad k = 1, \ldots, K$$
For a given point $x_p$, it is assigned to the cluster for which the distance to the centroid $c_k$ is minimized. In other words, point $x_p$ is assigned to cluster $k^*$ if:
$$k^* = \arg\min_{k \in \{1, \ldots, K\}} \lVert x_p - c_k \rVert_2$$
An image in the HSV color space is typically represented as a 3D array with dimensions corresponding to the height, width, and three color channels. To prepare the image for clustering, we need to reshape this 3D array into a 2D array where each row represents a single pixel and the columns represent the HSV values of that pixel, as can be seen in
Table 2. K-means clustering commonly expects input data to be of type “float32”; therefore, we convert the pixel values from original type to “float32”.
After clustering, the resulting cluster centers, which represent the dominant colors, are converted back to “uint8” type, which is also convenient for storage. The output of K-means clustering gives us a set of labels indicating the cluster each pixel belongs to. To reconstruct the segmented image, we use these labels to replace each pixel’s HSV values with the corresponding cluster center values and then reshape the 2D array back into the original 3D image shape.
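The reshape, cluster, and reconstruct steps described above can be sketched as follows. This is a minimal NumPy version of Lloyd's K-means written for illustration only; the function name, the fixed iteration count, and the initialization from distinct pixel colors are our assumptions, and the paper's own implementation may differ. The input is assumed to be an H x W x 3 “uint8” array that is already in the HSV color space and contains at least `k` distinct colors.

```python
import numpy as np

def kmeans_color_segment(image, k=3, iters=20, seed=0):
    """Segment an H x W x 3 uint8 image into k color clusters.

    Returns (segmented uint8 image, H x W label map, k x 3 uint8 centers).
    """
    h, w, c = image.shape
    # 3D image -> 2D array with one row per pixel, converted to float32.
    pixels = image.reshape(-1, c).astype(np.float32)

    # Initialize centers from k distinct pixel colors (illustrative choice).
    rng = np.random.default_rng(seed)
    unique_colors = np.unique(pixels, axis=0)
    centers = unique_colors[rng.choice(len(unique_colors), size=k, replace=False)].copy()

    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        # Builds an (N, k) distance matrix, so this suits small images only.
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = pixels[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)

    # Cluster centers (dominant colors) back to uint8, then rebuild the image.
    centers_u8 = np.clip(np.rint(centers), 0, 255).astype(np.uint8)
    segmented = centers_u8[labels].reshape(h, w, c)
    return segmented, labels.reshape(h, w), centers_u8
```

For full-resolution imagery, a library routine with better initialization and memory behaviour would normally replace this loop; the sketch only mirrors the data flow in the text.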
3.2. Truncated Linear Stretching (TLS)
TLS is applied to enhance the contrast of the image by stretching the intensity values within a specified range. This method modifies the pixel intensity values so that the full range of potential values is utilized more effectively. The “truncated” aspect refers to the adjustment of only a subset of the pixel values, typically within a specific range, while ignoring outliers that might otherwise skew the stretching process. By focusing on a specific range of pixel values, truncated linear stretching enhances the visibility of features within that range, making them easier to detect and analyze. The enhanced visibility aids in more accurate feature extraction and classification, improving the overall performance of the trained oil spill detection algorithm.
To define the range for stretching, we compute the lower percentile (LP) and upper percentile (UP) of the image intensity values $I$, where $P_q(I)$ denotes the q-th percentile:
$$LP = P_{2}(I), \quad UP = P_{98}(I)$$
With the above expressions, we aim to determine the intensity value thresholds that represent the lower and upper percentiles of the image’s intensity distribution. We calculated the 2nd percentile (LP) and the 98th percentile (UP). These percentiles are used to determine the range of intensity values that will be stretched to utilize the full dynamic range.
Applying TLS scales the pixel values between the lower and upper percentiles to utilize the full dynamic range of intensity values. The equation is as follows:
$$I' = \left(\mathrm{clip}(I, LP, UP) - LP\right) \times \frac{255}{UP - LP}$$
The “np.clip” function is applied to restrict the pixel values to the range [LP, UP].
After clipping the pixel values, we apply a linear transformation to scale them to the range [0, 255]. This involves subtracting LP from each pixel value and then multiplying the result by the factor “255/(UP-LP)”. This scaling adjusts the pixel values such that the intensity value LP maps to 0 and UP maps to 255.
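The percentile, clipping, and scaling steps above can be combined into one short function. This is a minimal sketch assuming NumPy and the 2nd and 98th percentiles named in the text; the function name and the handling of a flat image (UP equal to LP) are our own additions.

```python
import numpy as np

def truncated_linear_stretch(image, lower_pct=2.0, upper_pct=98.0):
    """Stretch intensities between two percentiles to the range [0, 255]."""
    img = np.asarray(image, dtype=np.float32)

    # Lower (LP) and upper (UP) percentile thresholds of the intensity distribution.
    lp, up = np.percentile(img, [lower_pct, upper_pct])
    if up <= lp:
        # Flat image: nothing to stretch (illustrative fallback).
        return np.zeros(img.shape, dtype=np.uint8)

    # Truncate outliers, then map LP -> 0 and UP -> 255 linearly.
    clipped = np.clip(img, lp, up)
    stretched = (clipped - lp) * (255.0 / (up - lp))
    return np.rint(stretched).astype(np.uint8)
```

Applied to a multi-channel image, this version computes one pair of percentiles over all channels; stretching each channel separately is an equally plausible reading of the text and only requires calling the function per channel.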
3.3. Oil Spill Detection with YOLO-v8
The YOLO-v8 model architecture (Jocher et al., 2023) [
62] was developed by Ultralytics as an evolved version of the YOLO-v5 model. YOLOv8 represents a significant advancement in the YOLO series, known for real-time object detection capabilities. The YOLO series, initiated by Joseph Redmon and later developed by various contributors, focuses on achieving high-speed and accurate object detection. As shown in
Figure 1, YOLO-v8 employs a decoupled head and an anchor-free model to separately address objectness, classification, and regression tasks. This design enhances the model’s overall accuracy by allowing focused attention on each task. The anchor-free approach directly predicts the object’s position and bounding boxes, offering greater flexibility in adapting to objects of varying sizes, scales, and shapes. The backbone network is responsible for extracting rich feature representations from input images. The YOLO-v8 architecture uses a modified version of CSPDarknet53, which includes Cross-Stage Partial (CSP) connections to improve gradient flow and reduce computation [
63].
Neck network aggregates features from different scales, enabling the detection of objects of various sizes.
Figure 1 is highlighting YOLO-v8 architecture. YOLO-v8 typically uses a PANnet (Path Aggregation Network) structure, which helps to create a more robust feature pyramid by incorporating features from different layers. In the output layer of YOLO-v8, the probability that a bounding box contains an object is represented by the objectness score, which is activated using the sigmoid function. Class probabilities, which indicate the likelihood of objects belonging to different classes, are determined using the softmax function. For loss function, Cloud (Complete Intersection per Union) and DFL (Distribution Focal Loss) algorithms utilized for box regression loss, and Binary Cross-Entropy (BCE) for classification low. CloU, developed by Zheng et al. [
64], is a metric that evaluates the similarity between two bounding boxes by considering their position, size, and aspect ratio. By optimizing the CIoU loss, YOLO-v8 strives to minimize the discrepancy between predicted and actual bounding boxes, resulting in more precise and tightly fitting detections. The DFL, proposed by Li et al. [
65,
66,
67], further refines the box regression process, enhancing the model’s performance. The calculation formula for CIoU is as follows:
$$\mathrm{CIoU} = \mathrm{IoU} - \frac{\rho^{2}(b, b^{gt})}{c^{2}} - \alpha v, \qquad L_{\mathrm{CIoU}} = 1 - \mathrm{CIoU}$$
where IoU represents the intersection over union of the predicted and ground truth boxes.
$\rho(b, b^{gt})$ is the Euclidean distance between the central points of the predicted box $b$ and the ground truth box $b^{gt}$.
$c$ is the diagonal length of the smallest enclosing box covering both the predicted and ground truth boxes.
$\alpha$ and $v$ are additional parameters to adjust the aspect ratio.
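As a minimal sketch of the CIoU calculation described above, the following Python function evaluates the standard formulation of Zheng et al. for two axis-aligned boxes. The function name `ciou`, the `(x1, y1, x2, y2)` box layout, and the small `eps` stabilizer are assumptions made for illustration; they are not taken from the YOLO-v8 code base.

```python
import math

def ciou(box_p, box_g, eps=1e-9):
    """Return (IoU, CIoU) for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g

    # Intersection over union of the two boxes.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    area_p = (px2 - px1) * (py2 - py1)
    area_g = (gx2 - gx1) * (gy2 - gy1)
    iou = inter / (area_p + area_g - inter + eps)

    # rho^2: squared Euclidean distance between the two box centers.
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 \
         + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2

    # c^2: squared diagonal of the smallest box enclosing both boxes.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # v measures the aspect-ratio mismatch; alpha is its trade-off weight.
    v = (4 / math.pi ** 2) * (
        math.atan((gx2 - gx1) / (gy2 - gy1 + eps))
        - math.atan((px2 - px1) / (py2 - py1 + eps))
    ) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return iou, iou - rho2 / c2 - alpha * v

# Identical boxes, then a box shifted diagonally by one unit.
iou_same, ciou_same = ciou((0, 0, 2, 2), (0, 0, 2, 2))
iou_shift, ciou_shift = ciou((0, 0, 2, 2), (1, 1, 3, 3))
ciou_loss_shift = 1 - ciou_shift  # the quantity minimized during training
```

For the shifted pair, the center-distance term lowers CIoU below plain IoU even though both boxes have the same aspect ratio, which is the behavior that motivates using CIoU for box regression.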
DFL is a loss function designed to address the challenges of class imbalance and sample difficulty imbalance in object detection tasks. Building upon the principles of focal loss, DFL introduces a distribution focal factor to better handle the imbalance between positive and negative samples. This adjustment enhances the robustness and detection capabilities of YOLO-v8 by assigning greater weight to more challenging instances, thus enabling the model to focus on difficult samples. The formula for calculating DFL can be expressed as follows:
$$\mathrm{DFL}(p_t) = -\alpha_t \,(1 - p_t)^{\gamma} \log(p_t)$$
where
$\alpha_t$ is a weight factor that balances the importance of positive and negative samples.
$p_t$ is the predicted probability for the target class.
$\gamma$ is the focusing parameter that adjusts the rate at which easy samples are down-weighted.
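The down-weighting effect of the focusing parameter can be illustrated with a short sketch. It assumes the focal-style form implied by the three quantities defined above, namely a weight factor, the target-class probability, and a focusing parameter. The function name `focal_form_loss` and the default values 0.25 and 2.0 are hypothetical choices for illustration, not settings reported in this work.

```python
import math

def focal_form_loss(p_t, alpha_t=0.25, gamma=2.0):
    """Per-sample loss: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p_t = min(max(p_t, 1e-12), 1.0)  # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_form_loss(0.9)  # confidently correct sample
hard = focal_form_loss(0.1)  # poorly predicted sample

# Ratio of hard to easy loss under plain cross-entropy, for comparison.
plain_ce_ratio = math.log(0.1) / math.log(0.9)
focal_ratio = hard / easy
```

Under these assumed settings, the hard sample outweighs the easy one by a far larger factor than under plain cross-entropy, which is the intended emphasis on difficult instances.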
By incorporating DFL, YOLO-v8 improves its detection ability in complex and imbalanced scenarios, prioritizing difficult-to-detect instances and thereby increasing overall detection accuracy and robustness. CIoU and DFL have shown significant improvements in performance for object detection tasks, particularly for smaller objects. CIoU enhances the precision of bounding box regression by considering not only the overlap between predicted and ground-truth boxes but also their center distance and aspect ratio, which results in more accurate and tightly fitting bounding boxes. DFL, in turn, addresses class imbalance and sample difficulty by assigning greater weight to challenging instances. Together, these loss functions contribute to the superior performance of YOLO-v8 in detecting small and intricate objects, such as marine oil spills, supporting high accuracy and reliability in complex and varied detection scenarios.
3.4. Data Preparation
The initial phase of this research involved the systematic collection of a comprehensive dataset of oil spill images. A targeted search was conducted to gather publicly available images depicting oil spills from various internet sources. We also extended the dataset with video footage: videos of oil spills were downloaded, and individual frames were extracted from them. This ensured that dynamic and varied perspectives of oil spills were included, enhancing the robustness of the dataset. Example images are shown in
Figure 2 and
Figure 3.
The oil spill dataset was curated to support the development and training of a robust detection model using the YOLOv8 framework. The dataset encompasses a diverse collection of images, ensuring comprehensive coverage of various scenarios and conditions.
These images include thick, viscous black oil films as well as thin, iridescent sheens, highlighting the different physical properties and appearances of oil spills. The thick films often appear as dense, dark patches on the water surface, typically resulting from heavy crude oil or prolonged exposure and accumulation. In contrast, the thin films create a rainbow-like effect due to the interference of light waves, commonly associated with lighter oil products or freshly spilled oil that spreads rapidly.
In oil spill detection, differentiating actual spills from lookalikes is critical for reducing false alarms. Lookalikes often share visual characteristics with oil spills, such as shape and texture. Therefore, to avoid false detection of lookalikes, we assembled a wide variety of oil spill images to teach the model the smoother texture of oil-covered surfaces. This helps the model learn the subtle differences between true oil spills and other phenomena.
Additionally, our dataset includes captures from various environmental contexts, such as open ocean spills, coastal areas, and harbors, each presenting unique challenges for annotation, detection, and classification. Varying weather conditions, sea states, and background clutter further add to the complexity. The dataset is divided into two subsets for model training, a training set and a validation set, as described in
Table 3.
The dataset consists of a total of 2419 images, which we separated into 1933 oil spill images for training and 484 images for validation. The combination of images encompasses different types of oil spills, varying in size, shape, and environmental conditions. This diversity is crucial for training a model that is capable of performing well under different real-world conditions.
Subsequently, the collected images required precise annotation to support supervised learning for oil spill detection. We used the LabelMe (
https://github.com/labelmeai/labelme, accessed on 10 May 2024) tool, an open-source image annotation tool available on GitHub, to label oil spill regions with the class name “Oil spill”, as can be seen in
Figure 4.
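LabelMe stores each annotation as a JSON file with pixel-coordinate polygons, whereas YOLO-style segmentation training typically expects one text line per object with a class index followed by normalized polygon coordinates. The sketch below shows one way such a conversion could be done. It is an assumed intermediate step rather than the procedure used in this work, and the function name `labelme_to_yolo_seg` and the class mapping are hypothetical.

```python
import json

def labelme_to_yolo_seg(labelme_json, class_ids):
    """Convert one LabelMe annotation (JSON string) to YOLO-style label lines."""
    ann = json.loads(labelme_json)
    w, h = ann["imageWidth"], ann["imageHeight"]
    lines = []
    for shape in ann["shapes"]:
        if shape.get("shape_type", "polygon") != "polygon":
            continue  # this sketch handles polygon shapes only
        coords = []
        for x, y in shape["points"]:
            coords += [x / w, y / h]  # normalize to the 0..1 range
        fields = [str(class_ids[shape["label"]])] + [f"{c:.6f}" for c in coords]
        lines.append(" ".join(fields))
    return lines

# Hypothetical annotation: one triangular "Oil spill" region in a 200x100 image.
example = json.dumps({
    "imageWidth": 200,
    "imageHeight": 100,
    "shapes": [{
        "label": "Oil spill",
        "shape_type": "polygon",
        "points": [[20, 10], [100, 10], [100, 50]],
    }],
})
yolo_lines = labelme_to_yolo_seg(example, {"Oil spill": 0})
```

Normalizing by image width and height keeps the labels valid if the images are later resized.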
Table 4 shows the experimental setup for this research work. The software environment is built on Ubuntu 22.04.3 LTS, a 64-bit operating system. CUDA 12.0 is used to leverage GPU acceleration for DL tasks, facilitating faster training and model optimization. The system runs on the Linux kernel, ensuring compatibility with the latest hardware drivers and software packages.
4. Experimental Results
Initially, the input image is loaded and resized to 50% of its original dimensions to optimize computational efficiency. Resizing images is a common preprocessing step, particularly when handling large datasets in DL models, as it reduces the computational burden while preserving critical image features. Next, the image is converted from the BGR (Blue, Green, Red) color space to the Lab (Luminance, a*, b*) color space. This conversion is significant because the Lab color space is designed to be more perceptually uniform, meaning that small changes in color or intensity are represented more consistently. This is particularly useful in detecting subtle features like oil spills. Thereafter, the image is converted to the HSV color space, which is often more effective for distinguishing between different regions based on color, especially in natural scenes. K-means clustering is then applied to segment the image into
k = 3 distinct regions, which correspond to different components of the scene. This segmentation is crucial in identifying and isolating oil spill regions from the surrounding environment.
Figure 5 shows examples of lookalike images in the ocean environment, and
Figure 6 illustrates how oil spills are labeled using the LabelMe tool.
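The resizing and clustering steps described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the pipeline used in this work: the 50% resize is approximated by keeping every second pixel, the color-space conversions are omitted, and K-means is written out by hand with a deterministic initialization. In practice, library routines such as OpenCV's `cv2.cvtColor` and `cv2.kmeans` would typically be used. The helper names and the toy image are hypothetical.

```python
def downsample_half(image):
    """Nearest-neighbour 50% resize: keep every second row and column."""
    return [row[::2] for row in image[::2]]

def kmeans_pixels(pixels, k=3, iters=10):
    """Cluster color tuples into k groups; returns (labels, centers)."""
    # Deterministic init: spread starting centers over the brightness range.
    ordered = sorted(set(pixels), key=sum)
    if len(ordered) < k:
        raise ValueError("need at least k distinct colors")
    centers = [tuple(float(c) for c in ordered[(2 * i + 1) * len(ordered) // (2 * k)])
               for i in range(k)]
    labels = [0] * len(pixels)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
                  for p in pixels]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return labels, centers

# Toy 4x6 image with three color regions (dark film, water, bright foam).
DARK, WATER, FOAM = (10, 10, 10), (40, 90, 160), (240, 240, 235)
image = [[DARK, DARK, WATER, WATER, FOAM, FOAM] for _ in range(4)]

small = downsample_half(image)
pixels = [p for row in small for p in row]
labels, centers = kmeans_pixels(pixels, k=3)
```

With k = 3, each pixel of the toy image is assigned to one of three clusters, mirroring the separation of a scene into distinct regions before detection.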
Moreover, applying TLS is a crucial step, especially for oil spill detection. By adjusting pixel intensity values to accentuate relevant features and mitigate the influence of outliers, TLS lays the groundwork for more effective detection and segmentation; for example, stretching the central 96% of the intensity distribution enhances feature visibility and reduces the influence of noise. These preprocessing techniques are essential steps before feeding images into our fine-tuned YOLO-v8 oil segmentation model.
Figure 7 presents the image processing results for a single image. The image differs markedly before and after processing. The final output is then ready to be passed to the YOLO pipeline.
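A stretch over the central 96% of the intensity distribution can be sketched as a percentile-based linear rescaling. The sketch assumes that the cut-offs sit at the 2nd and 98th percentiles, which is one reading of the "central 96%" statement; the function name `percentile_stretch` and the sample values are hypothetical.

```python
def percentile_stretch(values, low_pct=2.0, high_pct=98.0, out_max=255):
    """Clip to the [low_pct, high_pct] percentile range, then rescale to 0..out_max."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int(round(low_pct / 100 * (n - 1)))]
    hi = ordered[int(round(high_pct / 100 * (n - 1)))]
    if hi == lo:
        return [0 for _ in values]  # flat input: nothing to stretch
    return [round((min(max(v, lo), hi) - lo) / (hi - lo) * out_max) for v in values]

# Mid-range intensities plus two outliers (a dead pixel and a glare pixel).
values = list(range(100, 200)) + [0, 255]
stretched = percentile_stretch(values)
```

The two outliers are clipped to the ends of the output range, so they no longer compress the contrast available to the remaining intensities.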
Figure 7 highlights the transformation in the RGB color distribution of an image before and after processing. Initially, the RGB channels exhibit high values, with peaks ranging from 3500 to 4000 in the histogram. This indicates that the original image has intense color levels across all channels, potentially due to uneven lighting, glare, or other factors that can obscure the features of interest, such as oil spills. After processing, the intensity of the RGB channels significantly decreases, with peaks reduced to around 2000 (
Figure 8). This reduction reflects a more balanced and normalized color distribution, which is a key outcome of the applied image processing techniques. It indicates that the processing has reduced color saturation and intensity variations, enhancing the visibility of subtle features that are crucial for accurate detection and analysis. The decrease in RGB values suggests that the processing techniques, such as K-means and TLS, improved the image by reducing noise and unnecessary brightness. This normalization is vital for distinguishing between the oil spill and its surrounding environment, thereby improving the model’s ability to accurately identify and segment the spill (
Table 5).
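For readers who wish to reproduce this kind of before-and-after comparison, a per-channel histogram peak can be computed as sketched below. The peak height is the number of pixels sharing the most frequent intensity in a channel. The function name `channel_histogram_peaks` and the sample pixels are hypothetical and do not correspond to the images analyzed here.

```python
from collections import Counter

def channel_histogram_peaks(pixels):
    """Return, per channel, (most frequent intensity, number of pixels with it)."""
    peaks = {}
    for idx, name in enumerate(("R", "G", "B")):
        counts = Counter(p[idx] for p in pixels)
        peaks[name] = counts.most_common(1)[0]
    return peaks

# Hypothetical image flattened to a list of (R, G, B) tuples.
pixels = [(200, 180, 160)] * 5 + [(100, 180, 90)] * 3
peaks = channel_histogram_peaks(pixels)
```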
Evaluation Metrics
Performance metrics are essential tools in evaluating the efficacy of a proposed approach or model, especially in the context of specific issues, data characteristics, and analysis objectives. These metrics provide a quantitative basis for assessing how well a model performs by comparing its predictions to the actual outcomes. A model’s accuracy is commonly assessed through metrics computed from the numbers of correctly and incorrectly classified examples. Key metrics are as follows (
Table 6 and
Table 7):
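As a brief illustration of how such metrics follow from the counts of correct and incorrect predictions, the sketch below computes precision, recall, and the F1 score. These three are commonly reported for detection models and are assumed here for illustration; the counts are hypothetical and are not results from this study.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical counts: 80 correct detections, 10 false alarms, 20 missed spills.
precision, recall, f1 = detection_metrics(tp=80, fp=10, fn=20)
```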
Figure 9 presents illustrative examples of the model’s detection performance on oil spill images during the training phase. The detected oil spill areas are highlighted with red segmentation masks, providing a clear visual indication of the model’s ability to identify and localize these regions. The figure encompasses a variety of ocean environments, including images sourced from the internet as well as generated frames.
Figure 10 shows validation examples of the model’s oil spill detection produced during the training phase. These examples illustrate the model’s effectiveness in controlled environments and its capability to generalize learned features to real-world applications. They are therefore important for evaluating the model’s sensitivity and specificity.
The plots shown in
Figure 11 represent the distributions of, and relationships between, the bounding box variables: x and y coordinates, height, and width. The histograms show the individual distributions, while the scatter plots show correlations and clustering patterns between the variables.
Figure 12a depicts a normalized confusion matrix, which is commonly used to evaluate the performance of a classification model; we use it here to compare the true labels with the predicted labels for oil spill detection. The true positive cell shows that 92% of oil spill instances are correctly classified as oil spill, while the false negative cell shows that the remaining 8% are misclassified. The false positive cell is 0%, meaning that no background instances were incorrectly classified as oil spills, and the true negative cell shows that 100% of background instances were correctly classified. In terms of color intensity, darker colors indicate higher values.
Figure 12b shows a correlogram of the data distribution of oil spill instances in width and height, from 0.0 to 1.0. In CV applications, this approach is considered helpful for viewing and indexing recurring visual patterns or motifs across large images.
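To make the normalized confusion matrix in Figure 12a concrete, the sketch below normalizes raw counts by true class, so that each true class sums to 1. The counts are hypothetical and were chosen only so that the resulting fractions match the proportions quoted above; they are not the actual instance counts from this study.

```python
def normalize_by_true_class(counts):
    """counts[true_label][predicted_label] -> fractions within each true class."""
    normalized = {}
    for true_label, row in counts.items():
        total = sum(row.values())
        normalized[true_label] = {
            pred: (n / total if total else 0.0) for pred, n in row.items()
        }
    return normalized

# Hypothetical raw counts, indexed as counts[true][predicted].
counts = {
    "oil spill": {"oil spill": 46, "background": 4},
    "background": {"oil spill": 0, "background": 30},
}
norm = normalize_by_true_class(counts)
```

In this form, the oil spill diagonal entry equals the recall for that class, and the background diagonal entry equals the specificity.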
Figure 13 depicts line graphs of model training over 100 epochs. The training curves for the bounding box, segmentation, and classification losses are relatively smoother than the corresponding validation curves. In addition, the precision and recall metrics fluctuated noticeably during training (
Figure 14).
In
Table 8, we compare our approach with other similar methods for oil spill detection. Our method outperforms the others on all metrics, and the best results are highlighted in bold in the table.