Article

MWIRGas-YOLO: Gas Leakage Detection Based on Mid-Wave Infrared Imaging

Key Laboratory of Photoelectronic Imaging Technology and System, Ministry of Education of China, Beijing Institute of Technology, Beijing 100081, China
* Author to whom correspondence should be addressed.
Sensors 2024, 24(13), 4345; https://doi.org/10.3390/s24134345
Submission received: 3 June 2024 / Revised: 2 July 2024 / Accepted: 2 July 2024 / Published: 4 July 2024
(This article belongs to the Section Sensing and Imaging)

Abstract

The integration of visual algorithms with infrared imaging technology has become an effective tool for industrial gas leak detection. However, existing research has mostly focused on simple scenarios where a gas plume is clearly visible, with limited studies on detecting gas in complex scenes where target contours are blurred and contrast is low. This paper uses a cooled mid-wave infrared (MWIR) system to provide high-sensitivity, fast-response imaging and proposes the MWIRGas-YOLO network for detecting gas leaks in mid-wave infrared imaging. This network effectively detects low-contrast gas leakage and segments the gas plume within the scene. MWIRGas-YOLO utilizes the global attention mechanism (GAM) to focus fully on gas plume targets during feature fusion, adds a small target detection layer to enhance information on small-sized targets, and employs transfer learning of similar features from visible light smoke to provide the model with prior knowledge of infrared gas features. Using a cooled mid-wave infrared imager to collect gas leak images, the experimental results show that the proposed algorithm significantly improves performance over the original model. The segmentation mean average precision reached 96.1% (mAP50) and 47.6% (mAP50:95), outperforming other mainstream algorithms. This work can provide an effective reference for research on infrared imaging for gas leak detection.

1. Introduction

Industrial gas is widely used in modern society in fields such as steel production, commercial activities, transportation, power generation, and the chemical industry [1], making it a crucial component of the energy system and a significant driver of high-quality socio-economic development. However, gas leaks frequently occur during storage, use, and transportation, and can easily lead to fires and explosions [2], causing substantial loss of life and property. Therefore, achieving rapid and effective gas leak detection, along with accurate localization of the leak source and the affected area, is of paramount importance.
Traditional contact-based detection methods, such as acoustic [3], electrochemical [4], and semiconductor techniques [5], can accurately pinpoint leak locations and have low production costs. However, they are significantly affected by environmental interference signals and temperature variations [6,7]. Infrared imaging technology, as a typical non-contact detection method, offers a wide detection range and intuitive visualization, enabling remote dynamic monitoring [8,9]. Considering that industrial gas predominantly exhibits characteristic absorption spectra in the mid-wave and long-wave infrared bands, post-processing with image analysis algorithms allows for the identification of gas leak locations and diffusion trends, thereby enhancing the detection efficiency and saving substantial time and labor. This technology is widely used for industrial gas leak detection. However, the non-solid nature and variable characteristics of gas plumes [10] indicate that the visual effects of gas infrared images are highly influenced by environmental changes. This often results in a gas plume with blurred contours and low contrast against the background in infrared images, making it difficult to observe effectively.
Infrared imaging systems are categorized into active [11] and passive [12] systems based on the radiation source. Unlike active imaging systems, which require additional light sources to provide radiation, passive infrared imaging systems rely solely on ambient background radiation within the field of view. This makes them smaller, more cost-effective, and easier to implement in portable automated detection. Consequently, imaging technologies based on infrared focal plane arrays are more widely used [13]. As the core component of infrared imaging systems, infrared detectors are classified into thermal and photon detectors based on their energy conversion methods. Additionally, they are divided into cooled and uncooled detectors depending on their operating temperature and cooling requirements. Cooled mid-wave infrared detectors [14], which detect infrared radiation through the photoelectric effect generated by the interaction between incoming photons and sensitive materials, offer rapid response and high reliability. These detectors provide high sensitivity and fast response imaging, detecting minute temperature differences and radiation disparities, making them particularly advantageous for gas leak detection tasks.
With the advancement of deep learning, artificial intelligence has been widely applied in the field of computer vision, demonstrating strong vitality in areas such as autonomous driving [15], medical image processing [16], and equipment inspection [17]. Owing to their effective feature extraction capabilities, deep learning-based methods have become a significant direction in gas leak detection research, with models suitable for continuous gas detection. However, several challenges remain: (a) For computer vision tasks, large-scale, high-quality, and well-annotated datasets are crucial for model training, yet there is currently no unified dataset for comparing the performance of different algorithms. (b) Infrared image quality, affected by infrared detectors, significantly impacts the detection work. A low-concentration gas plume is easily confused with image noise, which reduces the detection efficiency. (c) A gas plume in infrared images often has blurred edges, low background contrast, and poor discernibility, resulting in limited usable spatial features and challenging gas feature extraction.
The You Only Look Once (YOLO) series of algorithms, a convolutional neural network-based object detection framework, has demonstrated excellent performance in terms of generalization and real-time capabilities for computer vision tasks. It is widely used in object detection research and can be considered a viable solution for gas leak detection using infrared imaging.
The main contents of this paper are as follows:
  • High-sensitivity, fast-response imaging was achieved using a cooled mid-wave infrared (MWIR) imager. A dataset with gas leak segmentation labels, MWIRGas-Seg, was collected and subjected to visual classification and small-object statistics.
  • For the task of gas leak detection in MWIR imaging, an algorithm based on YOLOv8-seg is proposed. This algorithm, named MWIRGas-YOLO, effectively detects and segments gas leaks within a given scene.
  • A global attention mechanism was introduced during the feature fusion stage to reduce image information dispersion, enhance gas plume localization, and improve the extraction of small target gas plume features. Transfer learning was applied using a visible light smoke dataset with similar characteristics to make the pre-trained model more adept at handling and extracting gas features.
  • Experimental validation confirms that MWIRGas-YOLO achieves more effective feature extraction and fusion of infrared gas plume targets, outperforming the original YOLOv8-seg and several typical image detection and segmentation algorithms. It is suitable for infrared gas image detection tasks.

2. Related Works

2.1. Gas Leakage Detection

Early gas leak detection algorithms treated the gas plume as a whole, utilizing traditional moving object detection methods such as frame differencing [18], optical flow [19], and background modeling [20,21]. For instance, Lu et al. [22] detected gas leak areas using a mixture of Gaussian models, and Shen et al. [23] investigated the application of three optical flow estimation algorithms to measure fluid flow velocity. Weng et al. [24] proposed a gas plume detection algorithm based on a scale-invariant feature transform and a support vector machine for image sequences filtered by frame differencing. Si et al. [25] developed a method for detecting sulfur hexafluoride (SF6) gas by combining gas absorption peaks with binocular filter differentiation to eliminate noise interference. Yan et al. [26] proposed a natural gas pipeline leakage detection method that converts data from multiple sensors into time–frequency images and employs multi-source, multi-modal feature fusion. However, due to the inherently high noise and dynamic background conditions of infrared images, traditional motion object detection methods struggle to adapt to practical gas leak detection tasks, limiting their real-world applicability.
In recent years, with the successful application of deep learning methods in video image processing and object detection, researchers have begun to explore deep learning algorithms for infrared imaging gas leak detection. In 2019, Wang et al. [27] applied CNNs to detect methane leaks, using a FLIR GF320 thermal camera to collect and label the first large-scale methane leak video dataset, GasVid. This dataset covers various leak sources and leak sizes, allowing for the testing of the detection efficiencies of different CNN architectures. The study also compared the fixed background subtraction, moving average background subtraction, and Gaussian mixture background methods. Two years later, the same team developed VideoGasNet [28] for gas leak detection in videos, extending the 2D CNN input of 2D image data to a 3D CNN by incorporating the temporal dimension. Their results demonstrated that the 3D CNN (VideoGasNet) had improved robustness, with leak/no-leak detection accuracy near 100%, and classification accuracy for large, medium, and small leak sizes reaching 78.2%. However, while the study achieved high accuracy in detecting the presence of gas leaks within a scene, it did not specify the exact location or quantify the leaks. Additionally, the dataset was collected with a single clear-sky background and no other scene disturbances, raising concerns about its generalizability.
Shi et al. [29] integrated the two-stage detection model Faster R-CNN with optical gas imaging (OGI) technology, utilizing model transfer methods to achieve real-time automatic hydrocarbon leak detection. Bin et al. [30] proposed a gas detection framework that integrates foreground information based on background subtraction methods and deep neural networks. Park et al. [31] employed U-Net as a fully convolutional network to accomplish semantic segmentation of infrared gas images. Wang et al. [32] used a Plug617 uncooled infrared detector to image SF6 gas and proposed a gas semantic segmentation method based on the DeeplabV3+ model. Bhatt et al. [33] introduced a spatio-temporal U-Net architecture to perform pixel-level gas mask extraction from input image frame sequences. Lin et al. [34] proposed the semantic segmentation model 2.5D-Unet, which enhances the network’s capability to represent the appearance and motion of leaking gas. Gu et al. [35] demonstrated that a two-dimensional Gaussian distribution could better cluster gas plume pixels, and that training on synthetic datasets can be effectively applied to real-world scenarios. Badawi et al. [36] demonstrated that utilizing raw data with deep learning methods allows algorithms to learn and discern features autonomously.
Significant progress has been made in infrared imaging for gas leak detection. However, current detection algorithms primarily focus on single scenarios with a clearly visible gas plume without accounting for real-world conditions where targets have blurred contours and low contrast. Moreover, there is limited research on the use of cooled infrared detectors for gas leak detection. Additionally, the application of classic image segmentation algorithms such as YOLACT [37], SOLO [38], SOLOv2 [39], QueryInst [40], and YOLO to infrared gas images is sparse. Few comparative analyses of various existing algorithms have been conducted on a common dataset.

2.2. YOLO Detection Models

YOLO, as a typical representative of single-stage detection models, is widely applied in real-time object detection. Redmon et al. first proposed YOLOv1 [41] in 2016 to address the high computational complexity of two-stage models, treating object detection as a regression problem in which the entire image serves as the network input and a neural network predicts bounding box positions and class labels. Subsequent advancements, including YOLOv2 [42] and YOLOv3 [43], incorporated superior backbone networks and feature pyramids, significantly enhancing detection performance.
In 2020, YOLOv4 [44] and YOLOv5 introduced more advanced detection strategies, such as data augmentation and the extension of the FPN structure into a combined FPN-PAN, improving detection efficiency while maintaining real-time capability. Continuing advancements led to YOLOX [45], YOLOv6 [46], and YOLOv7 [47], progressively enhancing detection performance in general object detection tasks. In 2023, researchers introduced YOLOv8, which not only improved on structure and loss functions but also integrated instance segmentation capabilities inspired by YOLACT. This enhancement delineates object contours in images, thereby diversifying the tasks for object detection. Consequently, applying the YOLOv8 model to gas detection and segmentation tasks is feasible.

3. Datasets

3.1. Data Acquisition

In this paper, a cooled mid-wave infrared imaging system was employed to investigate gas leak detection using the image acquisition process illustrated in Figure 1. The detection principle of infrared imaging for gas leakage involves two main components. The first is the infrared spectrum of the background radiation/reflection that reaches the front boundary of the gas plume. The second comprises the background infrared radiation partially absorbed by the gas plume, together with the radiation emitted by the plume itself, which reach the detector through the atmosphere. The presence of the gas plume causes the differential radiation remaining after gas absorption to exceed the sensitivity threshold of the detector, resulting in a detectable radiative difference in the relevant infrared absorption bands. This differential radiation is focused onto the infrared detector through optical lenses, producing a grayscale image that represents the radiative contrast of the target infrared scene.
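As a rough illustration of this three-layer model, the following Python sketch (our simplification, not code from the paper; the plume transmittance and temperatures are hypothetical, and the atmospheric path is ignored) estimates the radiative contrast between the gas path and the non-gas path at a single wavelength:

```python
import numpy as np

H, C, K = 6.626e-34, 2.998e8, 1.381e-23  # Planck, speed of light, Boltzmann (SI)

def planck(lam_um: float, temp_k: float) -> float:
    """Blackbody spectral radiance B(lambda, T) in W * sr^-1 * m^-2 * um^-1."""
    lam = lam_um * 1e-6
    return 2 * H * C**2 / lam**5 / np.expm1(H * C / (lam * K * temp_k)) * 1e-6

lam_um = 3.3      # methane absorption peak in the MWIR band
tau_gas = 0.8     # hypothetical plume transmittance in the absorption band
m_bg = planck(lam_um, 300.0)                                      # non-gas path
m_leak = tau_gas * m_bg + (1 - tau_gas) * planck(lam_um, 295.0)   # gas path
print(f"radiative contrast: {m_bg - m_leak:.4f} W/(sr*m^2*um)")
```

Whether this contrast is detectable depends on the sensitivity of the detector, which for the system used here is characterized by an NETD below 20 mK (Table 1).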
The specific parameters of the cooled mid-wave infrared imaging system are listed in Table 1. Methane and HFC gas were employed in these experiments. Methane, a typical industrial gas, exhibits a significant absorption peak at 3.3 μm [48], which falls within the MWIR range and matches the detection band of the detector used in the experiment. HFC gas is a portable refrigerant that is colorless, odorless, and highly safe. During its release, it undergoes noticeable vaporization and heat absorption, resulting in a radiative contrast within the infrared absorption bands.
The study of image segmentation requires a large number of data samples to train deep learning networks. However, there is currently no open-source dataset for infrared gas image segmentation. Therefore, to apply image segmentation to gas leak detection, this paper established an MWIRGas-Seg dataset for mid-wave infrared gas leak segmentation. This dataset includes multiple indoor and outdoor gas leak scenes with various leak sources, while also recording weather conditions, environmental temperatures, wind speeds, and detection distances.
The collected images encompassed various detection distances (7, 10, 20, 30, 33, 40, and 50 m), leak rates (0.5, 1, 2, 5, and >5 L/min), and 10 experimental scenarios, totaling 35 sets of image sequences. Representative frames of different leak categories are displayed in Figure 2. X-AnyLabeling 2.3.0 [49] was used to annotate 7187 sets of segmentation data based on pixel-level differences between the gas leak areas and background areas.

3.2. Dataset Statistics

Owing to the influence of the gas properties and environmental factors, the grayscale differences observed in the images do not directly correlate with the actual leak distance and volume. Therefore, this paper categorized infrared gas images into three classes—dense, thin, and indiscernible. Figure 3 illustrates the typical representatives of each class.
Chen et al. [50] defined small objects as those with a bounding box area to image area ratio between 0.08% and 0.58%. The pixel information of the mask in our dataset was statistically analyzed, as shown in Figure 4, to calculate the number of small and normal objects in the dataset, as presented in Table 2. The dataset comprised 62.28% normal objects and 37.72% small objects.
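To make these statistics reproducible in code, the sketch below (ours; the 65–475 pixel range follows Figure 4, and the example counts are hypothetical) classifies mask annotations by pixel count:

```python
import numpy as np

def count_by_size(mask_pixel_counts, lo: int = 65, hi: int = 475):
    """Split mask annotations into small and normal objects by pixel count,
    using the 65-475 pixel range adopted in Figure 4."""
    counts = np.asarray(mask_pixel_counts)
    small = int(((counts >= lo) & (counts <= hi)).sum())
    return small, counts.size - small

small, normal = count_by_size([120, 3000, 400, 87, 5200])  # hypothetical masks
print(f"small: {small}, normal: {normal}")
```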

4. Methods

4.1. YOLOv8-Seg Model

The YOLO series of methods, as one-stage object detection algorithms, utilize a backbone network to extract features, fuse multi-scale features, and then output target detection boxes through multiple detection heads. This approach achieves excellent detection accuracy and real-time performance. The YOLOv8 model integrates methods from YOLACT, adding a segmentation branch on top of the existing detection branches to combine detection boxes and segmentation results, thereby achieving instance segmentation of targets. The backbone enhances the feature extraction capabilities using a lightweight C2f module and incorporates an SPPF layer at the end to extract features with different receptive fields, making the network better suited to targets of different scales. The neck section employs a dual-stream Feature Pyramid Network (FPN) structure that aggregates feature information from top to bottom, fusing high- and low-level feature information via upsampling to compute prediction feature maps. A Path Aggregation Network (PAN) introduces lateral connections to enhance semantic feature information and fusion capabilities. The head section adopts an anchor-free decoupled head, which inputs position and category information from the feature map into the detection and classification branches to compute position and category, respectively. Predictions for small-, medium-, and large-scale targets are generated based on the fused P3, P4, and P5 feature maps.
This paper proposes the MWIRGas-YOLO model for mid-wave infrared imaging gas leak detection tasks, which incorporates a global attention mechanism and a small object detection layer based on the YOLOv8-seg model. The network structure is illustrated in Figure 5.

4.2. Global Attention Mechanism

In the context of infrared imaging gas leak detection, weak-concentration gas plume targets are often confused with background noise, posing challenges for detection and segmentation. To address this issue, adding the global attention mechanism can assign different weights to different parts of the input feature map, allowing the network to focus fully on gas plume targets while ignoring irrelevant background information, thus improving the detection accuracy.
The global attention mechanism (GAM) [51] consists of a channel attention submodule $M_c$ and a spatial attention submodule $M_s$. The channel attention submodule preserves information in three different dimensions and utilizes a two-layer Multilayer Perceptron (MLP) to amplify spatial information interaction across dimensions, thereby enhancing feature representation capability (as shown in Figure 6). The spatial attention submodule focuses on spatial information, using two convolutional layers for spatial information fusion and fully learning spatial features (as shown in Figure 7).
The entire process is illustrated in Figure 8. For a given input feature map $F_1$, the feature obtained after channel attention is denoted as $F_2$ and the final output feature is denoted as $F_3$; the intermediate state $F_2$ and output $F_3$ are defined as follows:

$F_2 = M_c(F_1) \otimes F_1$

$F_3 = M_s(F_2) \otimes F_2$

where $\otimes$ indicates element-wise multiplication.
In the YOLOv8 model, the backbone is responsible for extracting feature information from images, whereas the neck section focuses on better utilizing the extracted features for feature fusion. For gas leak detection tasks, noise in infrared images can significantly affect the feature extraction of gas plume targets. Adding an attention mechanism at the backbone feature extraction stage amplified the noise impact while reinforcing the gas plume target features. Therefore, this paper adds a global attention mechanism at the neck feature fusion stage to enhance the feature fusion capability of gas plume targets and improve the detection accuracy.
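For concreteness, below is a minimal PyTorch sketch of a GAM block as described above (our re-implementation following Liu et al. [51], not the authors' code; the reduction ratio of 4 and the 7 × 7 spatial kernels are assumptions taken from the GAM paper's defaults):

```python
import torch
import torch.nn as nn

class GAM(nn.Module):
    """Global attention mechanism: channel attention followed by spatial attention."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = channels // reduction
        # Channel submodule M_c: permute to (B, H, W, C) and run a two-layer MLP
        # across the channel dimension to model cross-dimension interactions.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
        )
        # Spatial submodule M_s: two 7x7 convolutions with channel reduction.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=7, padding=3),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=7, padding=3),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x = F1, shape (B, C, H, W)
        b, c, h, w = x.shape
        attn = self.channel_mlp(x.permute(0, 2, 3, 1).reshape(b, -1, c))
        attn = torch.sigmoid(attn.reshape(b, h, w, c).permute(0, 3, 1, 2))
        f2 = attn * x                              # F2 = M_c(F1) elementwise F1
        f3 = torch.sigmoid(self.spatial(f2)) * f2  # F3 = M_s(F2) elementwise F2
        return f3
```

A block like this would then be inserted after the fusion nodes of the neck, the placement that Table 4 indicates is most beneficial.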

4.3. Small Target Detection Layer

The original YOLOv8 model was equipped with three detection heads, enabling multi-scale object detection. The detection scales were P3/8, P4/16, and P5/32, which correspond to feature map sizes of 80 × 80, 40 × 40, and 20 × 20, respectively. These feature maps are responsible for detecting objects of sizes 8 × 8, 16 × 16, and 32 × 32 and larger. However, due to the presence of small targets within gas plume clusters, which are often influenced by environmental interference, deeper feature maps struggle to capture the features of small objects effectively. Consequently, the original model exhibited poor performance in detecting small targets.
This paper proposes the addition of a small object detection layer to the original network, specifically a P2/4 detection layer with a size of 160 × 160. This modification enhances the semantic information and feature representation capabilities of small objects by incorporating supplementary fusion feature layers and an additional detection step. The 80 × 80 scale layer from the fifth layer (P2) in the backbone was stacked with an upsampled feature layer in the neck. After processing with C2f and upsampling, this results in a deep semantic feature layer containing small object feature information. This layer is then further stacked with the shallow positional feature layer from the third layer in the backbone, creating a comprehensive 160 × 160 scale fusion feature layer that improves the expression of semantic and positional information for small objects. Finally, this enriched feature layer is sent through a C2f module to an additional decoupled head.
The enhancement in the head section allows the small object feature information to continue propagating along the downsampling path to the other three scale feature layers. This strengthens the feature fusion capability of the network, and the introduction of an additional decoupled head expands the detection range for gas plume targets. These improvements in the detection accuracy and range enable the network to more precisely identify small gas plume targets within the scene.
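The stride arithmetic behind the added P2/4 head is simple; the short sketch below (ours) prints the grid each detection scale produces for a 640 × 640 input, matching the 160 × 160, 80 × 80, 40 × 40, and 20 × 20 feature maps discussed above:

```python
IMG = 640  # training image size (imgsz in Table 3)

# Each head predicts on a grid of IMG/stride cells; smaller strides see finer detail.
for name, stride in [("P2", 4), ("P3", 8), ("P4", 16), ("P5", 32)]:
    grid = IMG // stride
    print(f"{name}/{stride}: {grid} x {grid} grid, nominal targets from ~{stride} px up")
```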

4.4. Transfer Learning

The objective of transfer learning is to apply knowledge or patterns learned from one domain or task to a different but related domain or problem [52]. Given that gas plume targets and smoke share similar uncertain contour characteristics, this paper employed a smoke dataset with similar features for transfer learning pre-training. This approach yields a prior model capable of effectively extracting features from gas plume targets. The smoke images used for this purpose were sourced from publicly available online datasets.
Testing the model, which was trained on a visible light smoke image dataset, directly on infrared gas images revealed that the smoke model could effectively identify some gas regions, as shown in Figure 9. This demonstrates that infrared gas plumes and smoke share similar features and confirms that the prior model has the capability to learn and extract features for gas plume targets.
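A two-stage training flow of this kind could be sketched with the Ultralytics API as follows (our illustration, not the authors' released code; the dataset YAML names and the yolov8s-seg scale are hypothetical, and the weight path assumes the default Ultralytics run directory):

```python
from ultralytics import YOLO

# Stage 1: pre-train a segmentation model on visible-light smoke images.
smoke_model = YOLO("yolov8s-seg.pt")
smoke_model.train(data="smoke.yaml", epochs=100, imgsz=640)

# Stage 2: fine-tune the smoke prior on the infrared gas dataset.
gas_model = YOLO("runs/segment/train/weights/best.pt")
gas_model.train(data="mwirgas_seg.yaml", epochs=100, imgsz=640)
```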

5. Results and Analysis

5.1. Experimental Configuration and Model Setup

The experiment was conducted on an Ubuntu 20.04.6 LTS operating system using a GeForce RTX 3090 GPU, Python 3.8, and CUDA 12.1. The deep learning framework employed was PyTorch 2.2.2. The training parameters are presented in Table 3. The dataset was divided into training and validation sets in an 8:2 ratio, with the training set comprising 5749 images and the validation set comprising 1438 images.
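For reference, the hyperparameters in Table 3 map onto standard Ultralytics training arguments; a sketch of the corresponding call (ours; the model scale and dataset YAML name are assumptions) is:

```python
from ultralytics import YOLO

model = YOLO("yolov8s-seg.pt")  # model scale assumed; the paper does not state it
model.train(
    data="mwirgas_seg.yaml",    # hypothetical dataset config for MWIRGas-Seg
    imgsz=640, batch=16, epochs=100,
    optimizer="SGD", lr0=0.01, lrf=0.01, momentum=0.937,
)
```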

5.2. Evaluation Metrics

To evaluate the performance of the algorithm in infrared imaging gas leak detection and objectively compare the detection effectiveness of different segmentation algorithms, this experiment employed precision, recall, and mean average precision (mAP) as performance metrics.
Precision is the proportion of predicted positive samples that are truly positive:

$P = \frac{TP}{TP + FP}$

where $TP$ and $FP$ represent the numbers of true positive and false positive samples among the predicted positive samples, respectively.
Recall is the proportion of actual positive samples that are correctly predicted as positive:

$R = \frac{TP}{TP + FN}$

where $FN$ is the number of positive samples incorrectly predicted as negative (false negatives).
$mAP$ is the average precision over all detection classes:

$mAP = \frac{1}{N} \sum_{n=1}^{N} AP_n$

where $N$ represents the total number of classes and $AP_n$ is the average precision of class $n$, obtained by calculating the area under the precision–recall curve. mAP@0.5 denotes the mean average precision when the IoU threshold is set to 0.5, and mAP@0.5:0.95 denotes the mean average precision averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05.
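The sketch below (ours, with hypothetical counts and AP values) computes these quantities and shows the IoU threshold grid over which mAP@0.5:0.95 is averaged:

```python
import numpy as np

def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from true/false positive and false negative counts."""
    return tp / (tp + fp), tp / (tp + fn)

p, r = precision_recall(tp=90, fp=10, fn=15)  # hypothetical counts
print(f"P = {p:.3f}, R = {r:.3f}")

# mAP@0.5:0.95 averages AP over ten IoU thresholds: 0.50, 0.55, ..., 0.95.
iou_thresholds = np.arange(0.50, 1.00, 0.05)
ap_per_threshold = np.linspace(0.96, 0.10, iou_thresholds.size)  # hypothetical APs
print(f"mAP@0.5:0.95 = {ap_per_threshold.mean():.3f}")
```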

5.3. Experimental Analysis of Global Attention Mechanism

To validate the effectiveness of incorporating the GAM in the neck section for the feature fusion of gas plume targets, a comparative experiment was conducted by embedding the GAM at different positions in the YOLOv8-seg model’s backbone and neck. The experimental results are listed in Table 4.
The table shows that embedding the GAM in the backbone feature extraction stage enhances gas plume feature extraction but also amplifies the interference affecting feature extraction, thereby decreasing the detection precision and mAP. Feature extraction before and after embedding the GAM is illustrated in Figure 10a. Introducing the GAM in the neck section focuses more on gas plume feature fusion, resulting in improved algorithm precision and mAP. Figure 10b compares the feature maps before and after embedding the GAM in the neck section.

5.4. Experimental Analysis of Small Target Detection Layer

Infrared imaging gas plumes often exhibit uneven concentrations, owing to the influence of environmental wind speeds, making it challenging to visually form a cohesive whole. The addition of a fourfold downsampling detection layer aims to enhance the ability of the model to effectively detect weakly concentrated small gas plumes. Figure 11 illustrates the detection comparison before and after the addition of the small object detection layer. It can be observed that the addition of fourfold downsampling corresponding to the small object detection layer effectively mitigates issues of missed detection caused by small gas plume sizes, thereby improving detection accuracy.

5.5. Experimental Analysis of Transfer Learning

The diffusion of infrared imaging gas plumes and visible light smoke adheres to the principles of fluid mechanics, with similar contour characteristics. This paper leveraged feature learning from visible light smoke images to obtain a prior model capable of learning gas plume features. Figure 12 presents the visualization results of training the infrared gas plume model from the original YOLOv8-seg model and from the smoke model as base models. Training the infrared gas plume model from the smoke model provides clear prior knowledge, leading to a noticeable improvement in model accuracy and precision.

5.6. Ablation Experiment

To further validate the differences in enhancing the model segmentation performance through the GAM, small object detection layer, and smoke image transfer learning guidance, ablation experiments were conducted on the same dataset. The experiments aimed to verify the impact of embedding the three modules on model segmentation accuracy. Six sets of ablation experiments involved single modules or combinations of two modules, with the final set comprising the complete algorithm proposed in this paper. Table 5 presents the results. The experimental results demonstrate that both the GAM attention mechanism and the small object detection layer significantly improve the precision. Additionally, the use of smoke image transfer learning guidance significantly enhances segmentation accuracy.

5.7. Contrast Experiment

To validate the superiority and effectiveness of the proposed algorithm, we selected representative image segmentation algorithms, including Mask R-CNN, YOLACT, SOLO, SOLOv2, and QueryInst, for comparative experiments with our model. We calculated the target detection and image segmentation mAP50 and mAP50:95 for each model. To ensure the validity of the results, all experiments were conducted on the previously described experimental platform using the same training and validation datasets.
As shown in Table 6, compared with the other six models, the proposed model achieved the highest target detection mAP50 and mAP50:95, reaching 97.3% and 59.9%, respectively. In addition, the proposed model achieved the highest image segmentation mAP50 and mAP50:95, reaching 96.1% and 47.6%, respectively.
To visually compare the robustness of our model in segmenting infrared gas images, we present the segmentation results of the original model, the five comparison models, and the improved model in Figure 13. The first and second rows display the original images and their corresponding ground truth segmentations. Rows three to nine show the visualization results of the various models. From the visualizations, it is evident that in columns two and five, the YOLOv8-seg model incorrectly segmented a single gas plume into two separate plumes, whereas the MWIRGas-YOLO model segmented them accurately. In the image in column four, due to the low contrast between the gas plume and background, the YOLOv8-seg model failed to detect the gas leak effectively. In contrast, the MWIRGas-YOLO model, with the addition of a small target layer and the global attention mechanism, more effectively extracted and fused gas features, enabling the detection of faint gas leaks. A detailed examination of columns two and five reveals that our method segments the contours of morphologically variable gas plumes more accurately, producing results closer to the ground truth. In summary, our designed model demonstrated superior detection and segmentation performance, making it more suitable for detecting and segmenting infrared gas images.

6. Conclusions

In this paper, a cooled mid-wave infrared thermal imager was used to create the MWIRGas-Seg dataset for mid-wave infrared gas leak segmentation, which was analyzed by experimental scenario and visual appearance. A model specifically designed for mid-wave infrared gas leak detection and segmentation, MWIRGas-YOLO, was proposed to achieve effective detection of leaking gas and segmentation of gas plumes in the scene. Compared with typical image segmentation models, the proposed model demonstrated superior performance in gas target detection and segmentation. Our method focuses directly on gas plume features in infrared images, embedding the GAM to enhance feature fusion and thereby concentrate on gas plume targets. The supplementary small target detection layer effectively addresses missed detections of small gas plume targets, and pre-training on smoke images provides significant prior knowledge for learning infrared gas plume characteristics. The experimental results indicate that the MWIRGas-YOLO model designed in this paper exhibits superior detection and segmentation performance, making it suitable for infrared gas image detection and segmentation tasks. Importantly, our proposed method is versatile and does not depend on specific infrared detectors; thus, it can be applied broadly in the field of infrared imaging for gas leak detection.
Future studies will continue to explore this technology by expanding the dataset to represent a broader range of real-world leak scenarios, with the goal of making the algorithm more applicable to leak detection in practice. Additionally, as gas leak processes are continuous, future efforts will investigate incorporating temporal information from videos and exploring sequence modeling to address gas leak issues, further optimizing the algorithm for enhanced accuracy.

Author Contributions

Conceptualization, S.X.; methodology and validation, S.X.; formal analysis, S.X. and Q.S.; investigation, S.X. and Q.S.; resources, S.X. and K.D.; data curation, S.X. and K.D.; writing—original draft preparation, S.X.; writing—review and editing, S.X., Q.S. and X.W.; visualization, S.X. and K.D.; supervision, S.X. and X.W.; project administration, S.X. and X.W.; funding acquisition, X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported financially by Beijing Wisdom Sharing Technical Co., Ltd.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Li, H.; Luo, D.; Sun, Y.; GholamHosseini, H. Classification and identification of industrial gases based on electronic nose technology. Sensors 2019, 19, 5033.
  2. Zhu, Y.; Qian, X.M.; Liu, Z.Y.; Huang, P.; Yuan, M.Q. Analysis and assessment of the Qingdao crude oil vapor explosion accident: Lessons learnt. J. Loss Prev. Process Ind. 2015, 33, 289–303.
  3. Zipser, L.; Wächter, F.; Franke, H. Acoustic gas sensors using airborne sound properties. Sens. Actuators B Chem. 2000, 68, 162–167.
  4. Sekhar, P.K.; Brosha, E.L. Trace detection of 2,4,6-trinitrotoluene using electrochemical gas sensor. IEEE Sens. J. 2014, 15, 1624–1629.
  5. Xue, F.; Adedokun, G.; Xie, D.; Liu, R.; Zhang, Y.; Muhammad, M.; Xu, L.; Wu, F. A low power four-channel metal oxide semiconductor gas sensor array with T-shaped structure. J. Microelectromech. Syst. 2022, 31, 275–282.
  6. Meribout, M.; Khezzar, L.; Azzi, A.; Ghendour, N. Leak detection systems in oil and gas fields: Present trends and future prospects. Flow Meas. Instrum. 2020, 75, 101772.
  7. Meribout, M. Gas leak-detection and measurement systems: Prospects and future trends. IEEE Trans. Instrum. Meas. 2021, 70, 4505813.
  8. Usamentiaga, R.; Venegas, P.; Guerediaga, J.; Vega, L.; Molleda, J.; Bulnes, F.G. Infrared thermography for temperature measurement and non-destructive testing. Sensors 2014, 14, 12305–12348.
  9. Safitri, A.; Gao, X.; Mannan, M.S. Dispersion modeling approach for quantification of methane emission rates from natural gas fugitive leaks detected by infrared imaging technique. J. Loss Prev. Process Ind. 2011, 24, 138–145.
  10. Zhou, K.; Wang, Y.; Lv, T.; Li, Y.; Chen, L.; Shen, Q.; Cao, X. Explore spatio-temporal aggregation for insubstantial object detection: Benchmark dataset and baseline. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 3104–3115.
  11. Barber, R.; Rodriguez-Conejo, M.A.; Melendez, J.; Garrido, S. Design of an infrared imaging system for robotic inspection of gas leaks in industrial environments. Int. J. Adv. Robot. Syst. 2015, 12, 23.
  12. Kang, R.; Liatsis, P.; Kyritsis, D.C. Emission quantification via passive infrared optical gas imaging: A review. Energies 2022, 15, 3304.
  13. Rogalski, A. Progress in focal plane array technologies. Prog. Quantum Electron. 2012, 36, 342–473.
  14. Wang, L.; Xu, Z.; Xu, J.; Dong, F.; Wang, F.; Bai, Z.; Zhou, Y.; Chai, X.; Li, H.; Ding, R.; et al. Fabrication and characterization of InAs/GaSb type-II superlattice long-wavelength infrared detectors aiming high temperature sensitivity. J. Light. Technol. 2020, 38, 6129–6134.
  15. Janai, J.; Güney, F.; Behl, A.; Geiger, A. Computer vision for autonomous vehicles: Problems, datasets and state of the art. Found. Trends Comput. Graph. Vis. 2020, 12, 1–308.
  16. Esteva, A.; Chou, K.; Yeung, S.; Naik, N.; Madani, A.; Mottaghi, A.; Liu, Y.; Topol, E.; Dean, J.; Socher, R. Deep learning-enabled medical computer vision. NPJ Digit. Med. 2021, 4, 5.
  17. Shao, H.; Xia, M.; Wan, J.; de Silva, C.W. Modified stacked autoencoder using adaptive Morlet wavelet for intelligent fault diagnosis of rotating machinery. IEEE/ASME Trans. Mechatron. 2021, 27, 24–33.
  18. Wang, Z.; Liao, K.; Xiong, J.; Zhang, Q. Moving object detection based on temporal information. IEEE Signal Process. Lett. 2014, 21, 1403–1407.
  19. Agarwal, A.; Gupta, S.; Singh, D.K. Review of optical flow technique for moving object detection. In Proceedings of the 2016 2nd International Conference on Contemporary Computing and Informatics (IC3I), Greater Noida, India, 14–17 December 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 409–413.
  20. Xue, W.; Jiang, T. An adaptive algorithm for target recognition using Gaussian mixture models. Measurement 2018, 124, 233–240.
  21. Barnich, O.; Van Droogenbroeck, M. ViBe: A powerful random technique to estimate the background in video sequences. In Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan, 19–24 April 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 945–948.
  22. Lu, Q.; Li, Q.; Hu, L.; Huang, L. An effective low-contrast SF6 gas leakage detection method for infrared imaging. IEEE Trans. Instrum. Meas. 2021, 70, 5009009.
  23. Shen, Z.; Schmoll, R.; Kroll, A. Measurement of fluid flow velocity by using infrared and visual cameras: Comparison and evaluation of optical flow estimation algorithms. In Proceedings of the 2023 IEEE SENSORS, Vienna, Austria, 29 October–1 November 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–4.
  24. Jing, W.; Pan, Y.; Minghe, W.; Li, L.; Weiqi, J.; Wei, C.; Bingcai, S. Thermal imaging detection method of leak gas clouds based on support vector machine. Acta Opt. Sin. 2022, 42, 0911002.
  25. Si, W.; Zhao, Y.; Wang, Y.; Li, B.; Tong, G.; Yu, Y. Monitoring SF6 gas leakage based on a customized binocular system. Sensors 2024, 24, 993.
  26. Yan, W.; Liu, W.; Zhang, Q.; Bi, H.; Jiang, C.; Liu, H.; Wang, T.; Dong, T.; Ye, X. Multi-source multi-modal feature fusion for small leak detection in gas pipelines. IEEE Sens. J. 2023, 24, 1857–1865.
  27. Wang, J.; Tchapmi, L.P.; Ravikumar, A.P.; McGuire, M.; Bell, C.S.; Zimmerle, D.; Savarese, S.; Brandt, A.R. Machine vision for natural gas methane emissions detection using an infrared camera. Appl. Energy 2020, 257, 113998.
  28. Wang, J.; Ji, J.; Ravikumar, A.P.; Savarese, S.; Brandt, A.R. VideoGasNet: Deep learning for natural gas methane leak classification using an infrared camera. Energy 2022, 238, 121516.
  29. Shi, J.; Chang, Y.; Xu, C.; Khan, F.; Chen, G.; Li, C. Real-time leak detection using an infrared camera and Faster R-CNN technique. Comput. Chem. Eng. 2020, 135, 106780.
  30. Bin, J.; Bahrami, Z.; Rahman, C.A.; Du, S.; Rogers, S.; Liu, Z. Foreground fusion-based liquefied natural gas leak detection framework from surveillance thermal imaging. IEEE Trans. Emerg. Top. Comput. Intell. 2022, 7, 1151–1162.
  31. Park, J.S.; Song, J.K. FCN based gas leakage segmentation and improvement using transfer learning. In Proceedings of the 2019 IEEE Student Conference on Electric Machines and Systems (SCEMS 2019), Busan, Republic of Korea, 1–3 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–4.
  32. Wang, Q.; Xing, M.; Sun, Y.; Pan, X.; Jing, Y. Optical gas imaging for leak detection based on improved DeeplabV3+ model. Opt. Lasers Eng. 2024, 175, 108058.
  33. Bhatt, R.; Gokhan Uzunbas, M.; Hoang, T.; Whiting, O.C. Segmentation of low-level temporal plume patterns from IR video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 15–20 June 2019.
  34. Lin, H.; Gu, X.; Hu, J.; Gu, X. Gas leakage segmentation in industrial plants. In Proceedings of the 2020 Chinese Automation Congress (CAC), Shanghai, China, 6–8 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1639–1644.
  35. Xiaojing, G.; Haoqi, L.; Dewu, D.; Xingsheng, G. An infrared gas imaging and instance segmentation based gas leakage detection method. J. East China Univ. Sci. Technol. 2023, 49, 76–86.
  36. Badawi, D.; Ayhan, T.; Ozev, S.; Yang, C.; Orailoglu, A.; Cetin, A.E. Detecting gas vapor leaks using uncalibrated sensors. IEEE Access 2019, 7, 155701–155710.
  37. Bolya, D.; Zhou, C.; Xiao, F.; Lee, Y.J. YOLACT: Real-time instance segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9157–9166.
  38. Wang, X.; Kong, T.; Shen, C.; Jiang, Y.; Li, L. SOLO: Segmenting objects by locations. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XVIII; Springer: Cham, Switzerland, 2020; pp. 649–665.
  39. Wang, X.; Zhang, R.; Kong, T.; Li, L.; Shen, C. SOLOv2: Dynamic and fast instance segmentation. Adv. Neural Inf. Process. Syst. 2020, 33, 17721–17732.
  40. Fang, Y.; Yang, S.; Wang, X.; Li, Y.; Fang, C.; Shan, Y.; Feng, B.; Liu, W. Instances as queries. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 6910–6919.
  41. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
  42. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
  43. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
  44. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
  45. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430.
  46. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976.
  47. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
  48. Song, F.; Zheng, C.; Yan, W.; Ye, W.L.; Zhang, Y.; Wang, Y.; Tittel, F.K. Performance enhancement of methane detection using a novel self-adaptive mid-infrared absorption spectroscopy technique. IEEE Photonics J. 2018, 10, 6804512.
  49. CVHub. X-AnyLabeling: Advanced Auto Labeling Solution with Added Features. 2023. Available online: https://github.com/CVHub520/X-AnyLabeling (accessed on 24 March 2024).
  50. Chen, C.; Liu, M.Y.; Tuzel, O.; Xiao, J. R-CNN for small object detection. In Proceedings of the Computer Vision—ACCV 2016: 13th Asian Conference on Computer Vision, Taipei, Taiwan, 20–24 November 2016; Revised Selected Papers, Part V; Springer: Cham, Switzerland, 2017; pp. 214–230.
  51. Liu, Y.; Shao, Z.; Hoffmann, N. Global attention mechanism: Retain information to enhance channel-spatial interactions. arXiv 2021, arXiv:2112.05561.
  52. Shao, L.; Zhu, F.; Li, X. Transfer learning for visual categorization: A survey. IEEE Trans. Neural Netw. Learn. Syst. 2014, 26, 1019–1034.
Figure 1. Three-layer radiation transfer model and gas infrared imaging detection system. $M_{BG}$ is the background spectral radiation; $M_{leak}$ and $M_{no\_leak}$ are the gas path and non-gas path radiation, respectively.
Figure 2. Infrared gas image samples. The red bounding box represents the leaked gas. Below each image are the leak distance, gas type, and leak rate.
Figure 3. Infrared gas images categorized into three types; the top left of each image shows enlarged views of the gas region and its corresponding frame difference image. (a) Dense, where the gas plume in the scene is easily observable at a glance. (b) Thin, requiring careful observation of local areas to detect the gas plume. (c) Indiscernible, where the gas plume cannot be directly observed in a single frame and requires observation of multiple frames before and after to identify its position.
Figure 4. Histogram of mask pixel count. A mask pixel count ranging from 65 to 475 in a single image is considered a small object.
Figure 5. MWIRGas-YOLO network architecture.
Figure 6. Channel attention submodule.
Figure 7. Spatial attention submodule.
Figure 8. Global attention mechanism.
Figure 9. Prior model detection results. (a) Visible smoke image detection results. (b) Infrared gas image detection results.
Figure 10. The feature maps before and after embedding the GAM; the red bounding box marks the activation of the target feature. (a) Results of backbone comparison. (b) Results of neck comparison.
Figure 11. Test comparison diagram. (a) YOLOv8-seg test result. (b) MWIRGas-YOLO test result.
Figure 12. Visual comparison of the training process. (a) Precision curve. (b) Recall curve. (c) mAP50 curve. (d) mAP50:95 curve.
Figure 13. Visualization of detection and segmentation results by different methods on the MWIRGas-Seg dataset.
Table 1. Main performance parameters of the infrared imaging device.

| Performance Metric | Parameter |
| --- | --- |
| Array format | 320 × 256 |
| Infrared detector type | Cooled infrared detector |
| Pixel size | 30 μm |
| NETD | <20 mK |
| Spectral range | 3.2–3.4 μm |
| Focal length | 100 mm |
| Frame rate | 30 fps |
Table 2. Statistics of dataset categories.

| Object Statistics | Number of Objects |
| --- | --- |
| Small objects | 2711 |
| Normal objects | 4476 |
| Total objects | 7187 |
Table 3. Model training hyperparameter settings.

| Hyperparameter | Value |
| --- | --- |
| Image size (imgsz) | 640 |
| Batch size | 16 |
| Epochs | 100 |
| Optimizer | SGD |
| Initial learning rate (lr0) | 0.01 |
| Final learning rate factor (lrf) | 0.01 |
| Momentum | 0.937 |
Table 4. Comparison results. The best result is indicated in bold.

| Optimized Method | Precision | mAP50 | mAP50:95 |
| --- | --- | --- | --- |
| YOLOv8-seg | 91.4 | 95.3 | 45.7 |
| +GAM_All | 90.8 | 94.2 | 44.5 |
| +GAM_Backbone | 90.4 | 94.7 | 44.6 |
| +GAM_Neck | **91.5** | **95.4** | **46.1** |
Table 5. Ablation experiment results. The best result is indicated in bold.

| Optimized Method | GAM | Small Target Detection Layer | Transfer Learning | Precision | mAP50 | mAP50:95 |
| --- | --- | --- | --- | --- | --- | --- |
| YOLOv8-seg | | | | 91.4 | 95.3 | 45.7 |
| YOLOv8-seg-I | ✓ | | | 91.5 | 95.4 | 46.1 |
| YOLOv8-seg-II | | | | 92.7 | 95.8 | 46.2 |
| YOLOv8-seg-III | | | | 93.5 | 96.0 | 46.2 |
| YOLOv8-seg-IV | | | | 91.8 | 95.3 | 46.6 |
| YOLOv8-seg-V | | | | 92.2 | 96.0 | 46.4 |
| YOLOv8-seg-VI | | | | **93.6** | 95.1 | 45.4 |
| YOLOv8-seg-VII | ✓ | ✓ | ✓ | 92.3 | **96.1** | **47.6** |
Table 6. Contrast experiment results. The best result is indicated in bold.

| Model | mAP50_bbox | mAP50:95_bbox | mAP50_mask | mAP50:95_mask |
| --- | --- | --- | --- | --- |
| YOLOv8-seg | 96.7 | 58.2 | 95.3 | 45.7 |
| Mask R-CNN | 89.4 | 46.2 | 84.7 | 34.1 |
| YOLACT | 94.8 | 53.0 | 92.0 | 39.2 |
| SOLO | - | - | 86.4 | 33.3 |
| SOLOv2 | - | - | 90.5 | 37.6 |
| QueryInst | 90.4 | 47.1 | 83.8 | 33.0 |
| MWIRGas-YOLO | **97.3** | **59.9** | **96.1** | **47.6** |