
Enhanced Detection of Pipeline Leaks Based on Generalized Likelihood Ratio with Ensemble Learning

Tao Liu ¹, Xiuquan Cai ²,³,⁴, Wei Zhou ¹, Kuitao Wang ¹ and Jinjiang Wang ²,³,⁴,*

¹ CNOOC Research Institute Ltd., Beijing 100029, China

² College of Safety and Ocean Engineering, China University of Petroleum, Beijing 102249, China

³ Hainan Institute of China University of Petroleum (Beijing), Sanya 572025, China

⁴ Key Laboratory of Oil and Gas Production Equipment Quality Inspection and Health Diagnosis, State Administration for Market Regulation, Beijing 102249, China

* Author to whom correspondence should be addressed.

Processes 2025, 13(2), 558; https://doi.org/10.3390/pr13020558

Submission received: 23 December 2024 / Revised: 5 February 2025 / Accepted: 14 February 2025 / Published: 16 February 2025

(This article belongs to the Special Issue Progress in Design and Optimization of Fault Diagnosis Modelling)


Abstract

To address the challenges of insufficient model generalization, high false alarm rates due to the scarcity of leakage data, and frequent minor leakage alarms in traditional weak leakage (the leakage amount is less than 1%) detection methods for gas transmission pipelines, this paper proposes a real-time weak leakage detection framework for natural gas pipelines based on the combination of the generalized likelihood ratio (GLR) and ensemble learning. Compared to traditional methods, the core innovations of this study include the following: (1) For the first time, GLR statistics are integrated with an ensemble learning strategy to construct a dynamic detection model for pipeline operating states through multi-sensor collaboration, significantly enhancing the model’s robustness in noisy environments by fusing pressure data from the pipeline inlet and outlet, as well as outlet flow data. (2) An adaptive threshold selection mechanism that dynamically optimizes alarm thresholds using the distribution characteristics of GLR statistics is designed, overcoming the sensitivity limitations of traditional fixed thresholds in complex operating conditions. (3) An ensemble decision module is developed based on a voting strategy, effectively reducing the high false alarm rates associated with single models. The model’s leakage detection capability under normal and noisy pipeline conditions was validated using a self-built gas pipeline leakage test platform. The results show that the proposed method can achieve the precise detection of pipeline leakage rates as small as 0.5% under normal and low-noise conditions while reducing the false alarm rate to zero. It can also detect leakage rates of 1.5% under strong noise interference. These findings validate its practical value in complex industrial scenarios. 
This study provides a high-sensitivity, low-false-alarm, intelligent solution for pipeline safety monitoring, which is particularly suitable for early warning of weak leaks in long-distance pipelines.

Keywords:

ensemble learning; gas pipeline; leakage warning; sensors

1. Introduction

As natural gas transmission pipelines age, the energy loss, safety hazards, and environmental damage caused by leakage have received widespread attention. Early weak leakage is difficult to detect, and a prolonged leak allows the leaked medium to accumulate continuously, which not only causes large energy losses but can also lead to fires, explosions, and other serious accidents with casualties. Accurate and efficient detection of pipeline leakage is therefore of great significance for the operational safety of gas pipelines [1].

Current pipeline leakage detection methods are divided into hardware-based and software-based methods according to how the leakage signal is acquired. Hardware-based leak detection methods include the negative pressure wave method [2], distributed fiber optics [3], the acoustic method [4,5], the thermal imaging method [6,7], etc. These methods offer high accuracy and sensitivity, but they generally suffer from high cost, an inability to monitor continuously, weak system robustness, alarm flooding, etc., which makes them difficult to apply to the real-time detection of small leaks in gas pipelines. The rapid development of computing, big data, and artificial intelligence technology has greatly promoted software-based leak detection methods; typical examples include model-based methods [8,9,10], statistics-based methods [11], and data-driven methods [12]. The Pipe Patrol software from Krohne (Duisburg, Germany) successfully detected pipeline leaks of 1.5% of the total pipeline flow by modeling real-time transients [13]. Joel Smith combined real-time transient modeling with a deep convolutional neural network (CNN); on test data, the leak detection system detected leaks of 4% of the nominal pipeline flow rate with 100% accuracy and leaks as small as 1% with 97.4% accuracy [14]. Salvatore Belsito used Artificial Neural Networks (ANNs) for leak sizing and localization, a method that can detect and locate 1% leaks in pipelines carrying hazardous substances in approximately 100 s [15]. Arifin B. M. S. et al. established a leak detection method that jointly uses pipeline pressure and flow data through the Kantorovich distance and compared it with commercial applications, finding that it takes less time from leak occurrence to alarm [16]. A.J. Willis proposed a modified sequential probability ratio test (SPRT) based on the Gaussian Markov process method, which can effectively reduce the false alarm problem of the classical SPRT in leak detection [17]. A comparison of specific software-based leak detection methods is shown in Table 1.

In summary, model-based and data-driven methods generally suffer from difficult model construction, hard-to-obtain training data, and alarm flooding when applied to weak leakage detection in gas pipelines. Statistics-based pipeline leakage detection, by contrast, requires neither a complex mechanistic model nor a large amount of leakage data for model training, which makes it a good way around the model construction and training problems of model-based and data-driven methods.

Therefore, to improve the leakage detection accuracy of statistical methods, reduce the impact of field data, and achieve continuous state monitoring of gas transmission pipelines with real-time, high-precision detection of weak leakage, this paper proposes an ensemble learning method for pipeline leakage detection based on the generalized likelihood ratio. The statistics-based generalized likelihood ratio is used to construct the leakage detection test statistic, the pipeline inlet and outlet pressure and flow data are analyzed in real time, and ensemble learning is applied to combine the characteristics and advantages of each sensor's data, thereby addressing the problem of weak leakage alarm flooding.

2. Related Work

Pipeline leak detection technology is a core research direction for ensuring the safety of energy transportation. Based on the principles and implementation methods, existing techniques can be categorized into three main types: hardware sensor-based detection methods, software algorithm-driven data-based methods, and statistical model-based detection methods. Each type of method has different principles, advantages, disadvantages, and practical applicability in real-world conditions.

2.1. Hardware Sensor-Based Detection Methods

Hardware-based methods leverage physical sensors to directly capture abnormal signals during pipeline operations, with the core focus being on the integration of high-precision sensing equipment and signal processing technologies. A typical example is Distributed Fiber Optic Sensing (DFOS), which utilizes the sensitivity of optical fibers to temperature, strain, or acoustic waves to monitor physical parameter changes along the pipeline in real time. When a leak occurs, localized temperature or vibration anomalies can be detected by optical fibers. The advantages of this method include a wide detection range (covering tens of kilometers) and high localization accuracy (up to the meter level). However, it is limited by high equipment costs and susceptibility to environmental noise, such as mechanical vibrations. For instance, Zhuo et al. proposed a method utilizing distributed optical fiber sensing for gas pipeline leak detection in sandy soil [18].

Another notable hardware-based approach is the negative pressure wave (NPW) detection method, which relies on the sudden pressure drop inside the pipeline caused by a leak, generating a negative pressure wave that propagates in both directions. Leak localization is achieved by analyzing the time difference of the pressure wave arrival at both ends of the pipeline. This method offers rapid response times (in seconds) and is suitable for detecting sudden leaks. However, it is less sensitive to small leaks and prone to interference from pump or valve operations. Li et al. proposed a novel positioning algorithm based on negative pressure wave attenuation, with the experimental results showing a maximum error of 1.161% and a minimum error of 0.355% [2]. Similarly, acoustic emission (AE) technology, which analyzes high-frequency acoustic signals generated by leaks to identify leak characteristics, is another hardware-based approach. Niamat et al. developed a machine learning-based acoustic signal classification model, achieving an overall classification accuracy of 99% for water and gas leaks under pinhole conditions [5]. In summary, hardware-based pipeline leak detection techniques exhibit superior performance in localization and real-time monitoring, primarily due to the reliance on high-precision equipment. However, they generally face challenges such as high costs, poor environmental adaptability, and limited detection capabilities for minor leaks.

2.2. Data-Driven Methods Based on Software Algorithms

Data-driven methods analyze operational data from pipelines (such as pressure and flow) to construct detection models, eliminating the need for complex mechanistic modeling and making them suitable for complex operating conditions. A typical example is the use of machine learning models for leak detection, which can employ supervised learning techniques (e.g., support vector machines and Random Forests) or unsupervised learning methods (e.g., Isolation Forests) to learn leakage characteristics from historical data. The advantages of these methods include their ability to handle high-dimensional and nonlinear data, as well as their adaptability to multi-variable coupled scenarios. However, they also have limitations, such as a heavy reliance on large amounts of labeled data and the influence of the training set quality on the model’s generalization capability. For instance, Wang et al. used a Principal Component Analysis (PCA) to reduce the dimensionality of extracted feature vectors and applied it to an SVM for leak signal detection [19].

In contrast to machine learning techniques, signal processing and feature extraction methods, such as wavelet transform and Fourier analysis, are also employed in software-based leak detection. These methods extract time-frequency features from leak signals, offering the advantage of high computational efficiency and suitability for real-time detection. However, they depend on prior knowledge for feature selection and perform poorly with non-stationary signals. Xu et al. achieved noise reduction and feature extraction from acoustic signals using wavelet packet decomposition and reconstruction, extracting eight time-domain features (mean, peak, RMS, standard deviation, etc.) to form a feature space, and established a leak diagnosis model based on a Fuzzy Support Vector Machine (FSVM) [20].

To address the issues of low computational efficiency in mechanistic models and high data requirements in data-driven methods, hybrid approaches have been proposed. These methods integrate mechanistic models with data-driven models, balancing physical constraints with data flexibility and enhancing detection robustness with small sample sizes. Zhang et al. developed a nested physics-informed neural network for transient flow analysis in natural gas pipelines [21]. Data-driven methods are highly flexible but face challenges such as data scarcity, high labeling costs, and sensitivity to noise. Their performance also significantly deteriorates in low leak rate scenarios.

2.3. Statistical Model-Based Detection Methods

Statistical model-based leak detection methods establish hypothesis testing models of the data distribution to identify leak anomalies from a probabilistic perspective. Unlike hardware-based methods, which require extensive sensor deployment along pipelines to meet detection accuracy requirements, these methods only need monitoring data from the inlet or outlet of the pipeline, which substantially reduces economic costs. Compared with software-based methods, they offer simpler model construction and less reliance on large amounts of labeled data. Typical statistics-based leak detection methods currently fall into three groups. The first is hypothesis testing, in which leak detection is treated as a binary classification problem and likelihood ratio tests (e.g., the generalized likelihood ratio, GLR) are used to determine whether the data deviate from the normal distribution. Wang et al. proposed a pipeline leak detection method based on hypothesis testing, which can accurately locate small leaks and distinguish them from scenarios where the absence of leaks is estimated along with the leak's position and size [22]. The second is Bayesian inference, in which prior knowledge is combined with observed data and posterior probabilities are updated to estimate the probability of a leak. Li et al. introduced a model-based probabilistic Bayesian analysis method that addresses the multi-leak detection problem in reservoir pipeline valve systems [23]. The third is time series analysis, in which autoregressive models or state space models are used to capture the data dynamics; this approach is suitable for analyzing non-stationary signals and detecting slowly developing leaks. Zhang et al. proposed a Hidden Markov Model integrated with deep neural networks for pipeline leak location detection, demonstrating superior detection performance [24].

Statistical-based methods offer notable advantages in cost reduction and simplicity of model construction. They also exhibit robust theoretical completeness and noise resistance. However, they are sensitive to data distribution assumptions, and single models still exhibit relatively high false alarm rates under complex operating conditions.

2.4. Summary

The existing methods differ in their advantages and disadvantages, and each is suited to particular leakage conditions; the advantages and disadvantages of the specific methods are shown in Table 2.

Through a comparative analysis of existing pipeline leak detection methods, it can be concluded that several critical research areas need breakthroughs: in the detection of weak leaks, current methods are inadequate in identifying small leak rates, resulting in high false alarm rates; in complex noise suppression, industrial site noise (such as pump and valve disturbances and flow fluctuations) severely affects detection accuracy; and in model generalization, single models struggle to adapt to different pipeline configurations and operational changes.

Therefore, addressing the issues of high false alarm rates in weak leak scenarios, the complexity of constructing mechanistic models, and the reliance of data-driven models on high-quality training data in existing pipeline leak detection methods, this paper proposes a multi-sensor fusion detection framework based on the generalized likelihood ratio (GLR) and ensemble learning. By integrating multi-sensor data and a heterogeneous model voting mechanism, the limitations of single-method approaches are overcome. The integration of heterogeneous models and the construction of the voting mechanism explore data distribution patterns from three dimensions, namely inlet and outlet pressure and outlet flow, significantly reducing the false alarm risk of single models. Compared to existing methods, this framework combines the theoretical rigor of statistical methods with the flexibility of data-driven models, providing a high-precision, low-cost solution for pipeline safety monitoring in complex noise environments, and it has significant engineering application value.

3. Theoretical Background

3.1. Generalized Likelihood Ratio

The generalized likelihood ratio test is a statistical method that compares the likelihood function values of multiple data groups to test a hypothesis. Consider that the monitored pipeline inlet and outlet pressure and flow rate data are independent and that the variable X obeys a normal distribution. When the process is under control, the mean value $\mu_0$ and the variance $\sigma_0^2$ are known or can be estimated from the first stage, and the pipeline operation state is normal. When leakage occurs, the variance is assumed to remain unchanged while the mean value changes; that is, the mean value shifts from $\mu_0$ to $\mu_1$, while $\sigma_0^2$ stays unchanged [25,26,27].

It is known that the monitored quantities, such as pipeline pressure and flow, follow a normal distribution with the formulas shown below:

$$f(x_i \mid \mu_0, \sigma_0^2) = \frac{1}{\sqrt{2\pi}\,\sigma_0} \exp\left(-\frac{(x_i - \mu_0)^2}{2\sigma_0^2}\right) \tag{1}$$

Denote the observation at moment $i$ by $x_i$. Up to moment $t$, a series of observations $x_1, x_2, \dots, x_t$ is obtained. Assume that a leakage occurs at moment $v$, with $0 < v < t$, so that the pipeline is in the normal state between 0 and $v$ and in the leakage state between $v$ and $t$; the likelihood function is evaluated at moment $t$. Here, $f(x_i \mid \mu_0, \sigma_0^2)$ is the probability density function of the observation $x_i$ under a normal distribution with mean $\mu_0$ and variance $\sigma_0^2$. The density of the monitored quantity is given in Equation (1), and the likelihood function is

$$L(x, \theta) = \prod_{i=1}^{n} f(x_i, \theta) \tag{2}$$

where $\theta$ is the parameter to be estimated, which in this paper is the mean $\mu$; $L(x, \theta)$ is the likelihood function, representing the joint probability density of observing the data $x$ under parameter $\theta$; and $f(x_i, \theta)$ is the probability density function of the observed value $x_i$ under parameter $\theta$.

Then, the generalized likelihood ratio is calculated as follows:

$$\lambda(x) = \frac{\sup L(x, \mu_1)}{\sup L(x, \mu_0)} \tag{3}$$

where $\lambda(x)$ is the generalized likelihood ratio, representing the ratio of the maximum likelihood under the leakage hypothesis ($\mu_1$) to that under the no-leakage hypothesis ($\mu_0$); $\sup L(x, \mu_0)$ is the maximum value of the likelihood function under hypothesis $\mu_0$; and $\sup L(x, \mu_1)$ is the maximum value of the likelihood function under hypothesis $\mu_1$. With this orientation, a larger $\lambda(x)$ indicates stronger evidence of leakage.

Based on the previous analysis, assuming that the variance is constant before and after the leak, a leak is considered to have occurred if the mean value changes, and no leak is considered to have occurred if the mean value does not change.

When no leakage occurs, the likelihood function is as follows:

$$L(x_i, \mu_0) = \prod_{i=1}^{t} f(x_i \mid \mu_0, \sigma_0^2) = (2\pi\sigma_0^2)^{-t/2} \exp\left(-\frac{1}{2\sigma_0^2} \sum_{i=1}^{t} (x_i - \mu_0)^2\right) \tag{4}$$

where $L(x_i, \mu_0)$ is the likelihood function of observing the data $x_i$ under the condition of no leakage.

When leakage occurs, the likelihood function is as follows:

$$\begin{aligned} L(x_i, \mu_1) &= \prod_{i=1}^{v} f(x_i \mid \mu_0, \sigma_0^2) \prod_{i=v+1}^{t} f(x_i \mid \mu_1, \sigma_0^2) \\ &= (2\pi\sigma_0^2)^{-t/2} \exp\left(-\frac{1}{2\sigma_0^2} \left(\sum_{i=1}^{v} (x_i - \mu_0)^2 + \sum_{i=v+1}^{t} (x_i - \mu_1)^2\right)\right) \end{aligned} \tag{5}$$

where $L(x_i, \mu_1)$ is the likelihood function of observing the data $x_i$ under the condition of leakage.

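Substituting the likelihood functions into the generalized likelihood ratio of Equation (3), with $\mu_1$ replaced by its estimate $\hat{\mu}_1$, gives

$$\lambda(x) = \exp\left(\frac{(t - v)}{2\sigma_0^2} (\hat{\mu}_1 - \mu_0)^2\right) \tag{6}$$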

The test statistic is

$$R_t = \ln \lambda(x) = \frac{(t - v)}{2\sigma_0^2} (\hat{\mu}_1 - \mu_0)^2 \tag{7}$$

where $\hat{\mu}_1 = \frac{1}{t - v} \sum_{i=v+1}^{t} x_i$ is the maximum likelihood estimator of $\mu_1$. When $R_t$ is greater than a certain threshold, the two sets of data are considered to have a large offset, and the pipeline is judged to be leaking.
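The step from Equations (4) and (5) to the test statistic in Equation (7) can be written out explicitly. The short derivation below is a sketch that uses only the symbols defined above; it takes the logarithm of the ratio of the leakage likelihood in Equation (5), evaluated at $\hat{\mu}_1$, to the no-leakage likelihood in Equation (4). The common factor $(2\pi\sigma_0^2)^{-t/2}$ and the sums over $i = 1, \dots, v$ cancel, leaving only the terms after the change point:

```latex
\begin{aligned}
R_t &= \ln L(x_i, \hat{\mu}_1) - \ln L(x_i, \mu_0) \\
    &= \frac{1}{2\sigma_0^2}\left(\sum_{i=v+1}^{t}(x_i-\mu_0)^2 - \sum_{i=v+1}^{t}(x_i-\hat{\mu}_1)^2\right) \\
    &= \frac{(t-v)}{2\sigma_0^2}\,(\hat{\mu}_1-\mu_0)^2
\end{aligned}
```

The last line follows from the identity $\sum_{i=v+1}^{t}(x_i-\mu_0)^2 = \sum_{i=v+1}^{t}(x_i-\hat{\mu}_1)^2 + (t-v)(\hat{\mu}_1-\mu_0)^2$, which holds because the deviations from the sample mean sum to zero, $\sum_{i=v+1}^{t}(x_i-\hat{\mu}_1) = 0$.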

Based on the above formula, the essence of applying the generalized likelihood ratio for leak detection is to calculate the generalized likelihood ratio of two adjacent sets of monitoring data with the same data points to determine the occurrence of a leak. If the value of the generalized likelihood ratio exceeds a certain threshold, a leak can be deemed to have occurred; otherwise, no leak is considered to have occurred.
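As an illustration, the test statistic of Equation (7) can be computed from two adjacent data segments of equal length. The following is a minimal Python sketch, not the authors' implementation; the function name `glr_statistic` and the choice to estimate $\mu_0$ and $\sigma_0^2$ from the first segment (using the sample variance) are assumptions made here for illustration.

```python
import numpy as np


def glr_statistic(reference, current, sigma0_sq=None):
    """Sketch of R_t from Equation (7) for two adjacent data segments.

    reference: samples assumed to represent the normal (pre-change) state.
    current:   samples from the following segment (possible leakage state).
    sigma0_sq: known variance; if None, it is estimated from `reference`
               (an assumption of this sketch).
    """
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)

    mu0 = reference.mean()                 # in-control mean
    if sigma0_sq is None:
        sigma0_sq = reference.var(ddof=1)  # sample variance of the reference
    if sigma0_sq <= 0:
        raise ValueError("variance must be positive")

    mu1_hat = current.mean()               # maximum likelihood estimate of mu_1
    n_after = current.size                 # plays the role of (t - v)
    return n_after * (mu1_hat - mu0) ** 2 / (2.0 * sigma0_sq)
```

A larger return value indicates a larger shift of the mean between the two segments; the comparison with an alarm threshold is left to the caller.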

3.2. Ensemble Learning

Ensemble learning is a machine learning method that enhances prediction performance by combining multiple base learners (weak learners). Its core idea is to integrate the predictions of multiple models to reduce the bias, variance, or overfitting of a single model, thereby improving overall accuracy and robustness. Ensemble learning techniques primarily include bagging, boosting, stacking, and voting. Bagging draws multiple random samples of the training data and trains a base learner on each, obtaining the final prediction by averaging or voting; Random Forests are a typical example. Boosting trains multiple base learners sequentially, with each learner correcting the errors of the previous one; typical algorithms include AdaBoost and Gradient Boosting. Stacking feeds the outputs of multiple base learners as new features into a meta-learner for the final prediction. Voting determines the final output by voting on or averaging the predictions of multiple base learners. The advantages of ensemble learning lie in improved accuracy, enhanced robustness, reduced bias and variance, and flexibility, making it widely applicable in classification, regression, and anomaly detection, where it generally achieves higher accuracy and stability than traditional machine learning algorithms. However, ensemble learning also faces challenges such as high computational complexity, poor model interpretability, and the risk of overfitting. Nevertheless, owing to its strong performance and broad applicability, ensemble learning has become an essential tool in modern data science and machine learning.

Wang C. et al. combined a sparse autoencoder with an improved support vector machine to establish an ensemble learning framework and proposed an LFPSO algorithm to optimize the parameters of the support vector machine. They verified the method's validity through simulation examples, demonstrating improved classification accuracy in pipeline leakage detection [28]. Rao S. et al. proposed an overlapping information feature selection (OIFS) method together with an OIFS-based diagnostic model with feature combination generation; combined with intelligent voting ensemble learning, the method was compared with stochastic deep forests, gradient-boosted decision trees, and LightGBM, and achieved a fault type identification accuracy of 100% [29].

In this paper, the generalized likelihood ratio leakage detection models based on the inlet pressure data, the outlet pressure data, and the outlet flow data are used as the three base models for ensemble learning. Ensemble voting is performed on the leakage detection results of the three models, and the voting result is taken as the final diagnosis of the leakage status. The schematic diagram of ensemble learning is shown in Figure 1. A base model is a fundamental model that makes up the ensemble; it can be any type of learning algorithm, such as a decision tree, support vector machine, or neural network. Ensemble learning improves the overall model's performance and robustness by combining the predictions of multiple such individual learners.

Figure 1 illustrates the application process of the ensemble learning voting method. It starts by identifying three independent models, which are then trained and tested separately. The trained models are applied to make preliminary judgments on the pipeline's leak status, yielding an individual leak detection conclusion from each model. The ensemble learning voting method is then employed: a leak is output as the final result if at least two of the three models indicate a leak; otherwise, no leak is output.
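The two-out-of-three voting rule can be expressed in a few lines. The sketch below is illustrative only; the function name `majority_vote` and its argument names are hypothetical, and each argument is assumed to be the Boolean leak decision of one base model.

```python
def majority_vote(inlet_pressure_alarm, outlet_pressure_alarm, outlet_flow_alarm):
    """Return True (leak) when at least two of the three base models report a leak."""
    votes = (
        int(bool(inlet_pressure_alarm))
        + int(bool(outlet_pressure_alarm))
        + int(bool(outlet_flow_alarm))
    )
    return votes >= 2
```

With this rule, a single model that raises an isolated alarm cannot trigger the final leak output on its own, which is the mechanism by which voting suppresses false alarms of individual models.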

4. Theoretical Frameworks

This paper proposes a real-time weak leakage detection method for natural gas pipelines based on the generalized likelihood ratio (GLR) and ensemble learning, utilizing pipeline inlet and outlet pressure and outlet flow rate data for leakage detection. The framework of the method is illustrated in Figure 2. Firstly, pressure sensors and flow sensors are deployed at the inlet and outlet of the pipeline. The inlet sensor can be placed 10 times the pipe diameter from the pipeline inlet, and the outlet sensor can be placed 20 times the pipe diameter from the outlet, based on engineering practices, to avoid turbulence interference. The collected inlet pressure, outlet pressure, and outlet mass flow rate data are preprocessed using the Exponential Moving Average (EMA) to smooth noise and enhance signal quality. Secondly, an adaptive threshold selection method based on the GLR test statistic is proposed. The GLR statistic is calculated using a sliding time window to dynamically adjust the leakage alarm threshold, and the time window size is optimized to balance detection sensitivity and computational efficiency. The GLR framework models the normal operating state of the pipeline as the null hypothesis and deviations caused by leakage as the alternative hypothesis, identifying leaks by comparing likelihood ratios. The core innovation of this method lies in the first-time integration of the GLR with ensemble learning, leveraging three independent learners based on the generalized likelihood ratio leakage detection model using pipeline inlet and outlet pressure and outlet flow rate data. The outputs of these learners are integrated through a majority voting strategy, significantly reducing the false alarm rate of single models and addressing the limitations of traditional fixed thresholds and single models in high-noise environments. This advancement drives technological progress in the field of pipeline leakage detection.

4.1. Data Pre-Processing

To ensure detection accuracy, the collected data must be processed to reduce noise. Because this paper requires both a fast response and effective noise reduction, the EMA denoising algorithm, which processes data efficiently while retaining historical information, is selected. Starting from the first data point, the EMA value of each data point is calculated in turn using the recursive formula in Equation (8) [30,31]:

$$f_t = \alpha \left(p_t + \beta p_{t-1} + \beta^2 p_{t-2} + \dots + \beta^{t-1} p_1\right) = \alpha p_t + (1 - \alpha) f_{t-1} \tag{8}$$

where $\alpha$ is the smoothing parameter, taking values between 0 and 1, and $\beta = 1 - \alpha$; $p_t$ is the data to be processed, and $f_t$ is the data value after processing.
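The EMA recursion of Equation (8) can be sketched as follows. This is a minimal illustration rather than the authors' code; the function name `ema` is hypothetical, and seeding the filter with the first sample is an assumption of this sketch (the expanded sum in Equation (8) instead corresponds to a zero initial value).

```python
def ema(values, alpha):
    """Exponential moving average: f_t = alpha * p_t + (1 - alpha) * f_{t-1}."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")

    smoothed = []
    for p in values:
        if not smoothed:
            # Assumption: initialise the filter with the first observation.
            smoothed.append(float(p))
        else:
            smoothed.append(alpha * p + (1.0 - alpha) * smoothed[-1])
    return smoothed
```

A smaller `alpha` gives stronger smoothing but a slower response to a genuine change in the signal, so its value trades noise reduction against alarm delay.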

4.2. Time Window Selection

To realize real-time leakage detection based on the generalized likelihood ratio, the concept of a time window is introduced into the leakage detection process. The leakage judgment is carried out point by point, with the time window as the unit of analysis. The data in the time window are divided evenly into two halves in time order, the first half representing the state before the change and the second half the state after the change, in order to test for a change in the mean value of the data. The schematic diagram of the time window setup is shown in Figure 3. The first n points in time window 1 are used as the state before the change, and points n + 1 to 2n are used as the state after the change.

Consequently, the selection of the time window size is crucial for the model’s leak detection accuracy. If the time window is too large, it can lead to significant delays in leak alarms, failing to meet the real-time requirements of pipeline leak detection. Conversely, a time window that is too small can exacerbate the impact of individual outliers on alarms, thereby increasing the likelihood of false alarms. Selecting the appropriate time window size before applying the model is crucial for ensuring accuracy.
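The point-by-point window procedure can be sketched as below. This is an illustrative implementation under stated assumptions, not the authors' code: the function name `sliding_glr` is hypothetical, each window holds 2n consecutive points, the first half supplies the estimates of $\mu_0$ and $\sigma_0^2$, and positions without a full window (or with zero variance in the first half) are reported as NaN.

```python
import numpy as np


def sliding_glr(series, n):
    """Compute the GLR test statistic for every window of 2n consecutive points.

    The statistic for the window ending at index `end - 1` compares the mean of
    the last n points (state after the change) with the mean of the n points
    before them (state before the change).
    """
    x = np.asarray(series, dtype=float)
    stats = np.full(x.size, np.nan)

    for end in range(2 * n, x.size + 1):
        before = x[end - 2 * n:end - n]   # first half of the window
        after = x[end - n:end]            # second half of the window
        sigma0_sq = before.var(ddof=1)    # assumption: variance from first half
        if sigma0_sq > 0:
            shift = after.mean() - before.mean()
            stats[end - 1] = n * shift ** 2 / (2.0 * sigma0_sq)
    return stats
```

Because the first statistic is available only after 2n samples, the window size directly sets the minimum alarm delay, which reflects the trade-off described above.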

4.3. Threshold Selection Based on Test Statistic

To select the leakage alarm threshold accurately, this paper proposes a method for determining the alarm threshold based on test statistics. The schematic for this threshold selection method is depicted in Figure 4. In the laboratory setting, a leakage scenario is established, and detection is conducted under these conditions. The test statistics for the three types of sensor data (the inlet pressure, the outlet pressure, and the outlet flow rate) are calculated. The onset and end times of the leakage are determined, thereby delineating the interval in which a leakage alarm is expected. The optimal leakage alarm threshold is then determined by inspecting the plot of the test statistics.

Figure 4 shows that, under the leakage condition, a threshold set at the red line position raises the leakage alarm while keeping model false alarms to a minimum. A threshold at the blue line position minimizes false alarms but does not meet the leakage alarm requirement. A threshold at the cyan line position raises the alarm when leakage occurs but also produces a large number of false alarms. Therefore, the red line is the most appropriate choice for the leakage alarm threshold.

During the application of the above threshold selection method, it is necessary to clearly define the operating conditions of the pipeline before it officially goes into production, including the operating pressure and flow rate. Additionally, under the determined operating conditions, leakage detection tests should be conducted for different leak rates, and based on the test data, a threshold selection graph for the test statistics should be plotted to determine the optimal leak alarm threshold.
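The paper selects the threshold by inspecting the plotted test statistics. One possible way to automate that choice on calibration data is sketched below; it is an assumption of this sketch, not the authors' procedure. The function name `select_threshold` and the rule used (take the largest statistic observed outside the known leak interval, so that an alarm is raised only when the statistic strictly exceeds it) are hypothetical.

```python
import numpy as np


def select_threshold(stats, leak_mask):
    """Pick an alarm threshold from calibration data with a known leak interval.

    stats:     test statistics (NaN entries are ignored).
    leak_mask: Boolean array, True where the leak was known to be active.
    Returns the largest statistic seen outside the leak interval, or None when
    no statistic inside the leak interval exceeds it (leak not separable).
    Both the leak and the non-leak parts must contain at least one valid value.
    """
    stats = np.asarray(stats, dtype=float)
    leak_mask = np.asarray(leak_mask, dtype=bool)
    valid = ~np.isnan(stats)

    normal_max = stats[valid & ~leak_mask].max()
    leak_max = stats[valid & leak_mask].max()
    if leak_max <= normal_max:
        return None
    return float(normal_max)
```

On the calibration data this rule produces no false alarms while keeping as many alarm points inside the leak interval as possible; in practice a safety margin above the returned value may be preferable.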

4.4. Comparative Validation

To provide a clearer illustration of the model’s leakage detection efficacy, we selected the number of alarm points during leakage occurrences and the number of false alarm points before and after leakage events as comparative indicators. Specifically, the number of alarm points reflects the model’s response at each point during a leak. Since the detection of any single alarm point during a leakage event indicates successful leak detection, the presence of alarm points serves as a metric for assessing the model’s leak detection capability. However, it is important to note that the quantity of alarm points is not utilized as a comparative measure of the model’s leakage detection proficiency. On the other hand, the number of false alarm points serves as an indicator of the model’s proficiency in preventing erroneous alarms both before and after leakage events. A higher number of false alarms suggests reduced model effectiveness, making the number of false alarm points a primary metric for evaluating the model’s leak detection performance.
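The two comparative indicators can be counted directly from the test statistics. The sketch below is illustrative; the function name `count_alarm_points` and the representation of the leak interval by inclusive start and end indices are assumptions made here.

```python
import math


def count_alarm_points(stats, threshold, leak_start, leak_end):
    """Count alarm points during the leak and false alarm points outside it.

    stats:      sequence of test statistics (NaN entries are skipped).
    threshold:  alarm threshold; an alarm is raised when a statistic exceeds it.
    leak_start, leak_end: inclusive indices of the known leak interval.
    Returns (alarm_points, false_alarm_points).
    """
    alarm_points = 0
    false_alarm_points = 0
    for i, r in enumerate(stats):
        if math.isnan(r):
            continue
        if r > threshold:
            if leak_start <= i <= leak_end:
                alarm_points += 1
            else:
                false_alarm_points += 1
    return alarm_points, false_alarm_points
```

Following the evaluation logic above, a leak counts as detected when `alarm_points` is at least one, while `false_alarm_points` is the quantity compared across models.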

5. Experimental Study

5.1. Experimental Setup

To analyze the monitoring performance of the model and validate the GLR-based ensemble learning method for weak leakage detection, a leakage detection test rig was constructed, as shown in Figure 5. The rig consists of an air compressor; an inlet flow control valve; drying tubes A, B, C, and D; a pressure stabilizer; pressure sensors; flow sensors; a data acquisition system; a three-way valve; and a leak port. The air compressor supplies compressed air, which is regulated by the inlet flow control valve and passes through the drying tubes to remove moisture. The pressure stabilizer keeps the fluid pressure stable. The pressure and flow sensors collect pressure and flow data at the inlet and outlet of the pipeline. Using this rig, the inlet and outlet pressure and flow data were collected under a leakage condition in which the leakage volume was 3% of the pipeline flow.

To set the 3% leakage condition, ports A and B were first closed to ensure stable operation, and the flow under normal operating conditions was recorded as M1. The test was then paused, and a flow sensor was placed at leakage port B to measure the leakage flow rate, which was recorded as M2. The inlet conditions remained unchanged throughout the process. The opening of leakage port B was adjusted with the needle valve until the desired leakage rate was reached. The ratio M2/M1 of the leakage flow at port B to the pipeline flow was calculated and used to determine the leakage rate for each leakage condition.
In this article, the leakage rate is therefore defined as the gas flow through the leak as a percentage of the total gas flow in the pipeline.
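As a worked illustration of this definition, the leakage rate follows directly from the two flow measurements. The function name and the flow values below are hypothetical and are not measurements from the test rig.

```python
def leakage_rate_percent(m1_pipeline_flow, m2_leak_flow):
    """Leakage rate as the leak flow M2 expressed as a percentage of the pipeline flow M1."""
    if m1_pipeline_flow <= 0:
        raise ValueError("pipeline flow M1 must be positive")
    return 100.0 * m2_leak_flow / m1_pipeline_flow

# Hypothetical readings in the same flow unit: M1 = 50.0, M2 = 1.5.
rate = leakage_rate_percent(50.0, 1.5)
print(f"leakage rate = {rate:.1f}%")
```

With these made-up readings the needle valve setting would correspond to the 3% leakage condition.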

5.2. Time Window Selection

To select the time window size for GLR-based leakage detection, the inlet pressure data under the 3% leakage condition are taken as an example, with 5 × 10⁹ used as the alarm threshold. Thirteen time window sizes were tested in increments of 20, and the numbers of alarm points and false alarm points were calculated for each, as shown in Figure 6.

As observed from Figure 6, it is evident that as the time window increases, the model’s alarm points gradually rise while the false alarm points decrease. Taking into account the model’s alarm level, false alarm rate, and alarm lag, the optimal time window size was determined as 200. Compared to smaller time windows such as 20 and 40, a time window of 200 demonstrates superior alarm effectiveness and resistance to false alarms. Additionally, when compared to larger time windows like 240 and 260, which offer similar anti-false alarm and alarm effects, a time window of 200 exhibits a reduced alarm lag. In summary, a time window of 200 provides an optimal balance between alarm effectiveness, false alarm reduction, and minimal alarm lag.
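A window sweep of this kind can be sketched as follows. The sketch makes several assumptions that are not taken from the study: the test statistic is the log generalized likelihood ratio for a mean shift in Gaussian data with known baseline mean and variance, n(x̄ − μ₀)²/(2σ₀²); the data are synthetic; the threshold value is arbitrary; and the window sizes are assumed to run from 20 to 260. The statistic used in the study is evidently scaled differently, given thresholds on the order of 10⁹.

```python
import numpy as np

def glr_statistic(x, window, mu0, var0):
    """Sliding-window log-GLR for a mean shift away from mu0 (known variance var0)."""
    x = np.asarray(x, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    window_means = (csum[window:] - csum[:-window]) / window
    stat = np.zeros_like(x)  # points without a full window stay at 0 (no alarm)
    stat[window - 1:] = window * (window_means - mu0) ** 2 / (2.0 * var0)
    return stat

# Synthetic, hypothetical pressure trace with a leak between samples 1000 and 1500.
rng = np.random.default_rng(0)
pressure = rng.normal(1.0, 0.01, size=2000)
pressure[1000:1500] -= 0.005
in_leak = np.zeros(pressure.size, dtype=bool)
in_leak[1000:1500] = True

# Baseline mean and variance estimated from leak-free data.
mu0, var0 = pressure[:500].mean(), pressure[:500].var()
threshold = 8.0                          # hypothetical alarm threshold
window_sizes = list(range(20, 261, 20))  # 13 sizes in increments of 20

results = {}
for window in window_sizes:
    exceeds = glr_statistic(pressure, window, mu0, var0) > threshold
    results[window] = (int(np.sum(exceeds & in_leak)), int(np.sum(exceeds & ~in_leak)))

for window, (alarms, false_alarms) in results.items():
    print(f"window={window:3d}  alarm points={alarms:4d}  false alarm points={false_alarms:4d}")
```

Plotting the two counts against the window size gives a curve of the kind used to trade alarm effectiveness against false alarms and alarm lag.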

5.3. Threshold Selection

To determine the alarm threshold for each sensor, this study uses the 3% leakage scenario as an illustrative example. The threshold selection method based on test statistics, as depicted in Figure 4, was employed. Specifically, the test statistics for the pipeline inlet pressure, outlet pressure, and outlet flow rate were plotted to generate the threshold selection diagrams, and the alarm thresholds were identified from the observed variations in the test statistics. The threshold selection diagrams are shown in Figure 7, Figure 8 and Figure 9.

In Figure 7, the alarm points of the model are concentrated, and only a small number of false alarm points occur before and after the leak. Considering both the alarm and false alarm indicators, the red line in Figure 7, 5 × 10⁹, is selected as the alarm threshold. In Figure 8, the test statistic fluctuates strongly both during the leak and before and after it, so it is difficult to find a threshold that gives both the best alarm effect and the fewest false alarms. Balancing the two, the red line in Figure 8, 1.5 × 10⁹, is selected as the alarm threshold. In Figure 9, the test statistic fluctuates little before and after the leak, and false alarms are not prominent. The threshold is therefore chosen to satisfy the alarm requirement as fully as possible, and the red line in Figure 9, 4 × 10¹², is selected as the alarm threshold.

5.4. Application Validation

The time window and alarm thresholds selected in Section 5.2 and Section 5.3, following the methods of Section 4.2 and Section 4.3, are first applied to each single-sensor model to detect leakage under the 3% leakage condition. The ensemble learning leakage detection model based on the generalized likelihood ratio is then applied to the same condition.

5.4.1. Generalized Likelihood Ratio Leak Detection Based on Inlet Pressure

The inlet pressure-based generalized likelihood ratio leakage detection method is applied to the 3% leakage condition, and the leakage detection schematic is shown in Figure 10. Figure 10 shows that the method accurately identifies the occurrence of leakage and raises alarms according to the threshold, although the alarm is not active throughout the entire leakage period. The leak is nonetheless detected: the inlet pressure-based method produces 308 alarm points and 2 false alarm points.

5.4.2. Generalized Likelihood Ratio Leak Detection Based on Outlet Pressure

The generalized likelihood ratio leakage detection method based on outlet pressure is applied to the 3% leakage condition, and the leakage detection schematic is shown in Figure 11. Figure 11 shows that the outlet pressure-based method recognizes the occurrence of leakage and triggers an alarm according to the threshold, but false alarms occur both before and after the real leakage alarms. The outlet pressure-based method produces 323 alarm points and 40 false alarm points.

5.4.3. Generalized Likelihood Ratio Leak Detection Based on Outlet Flow Rate

The generalized likelihood ratio leakage detection method based on the outlet flow rate is applied to detect leakage for a 3% leakage condition, and the leakage detection schematic is shown in Figure 12. From Figure 12, it can be seen that the outlet flow-based method recognizes the occurrence of leakage and triggers an alarm according to the threshold value, and there is no false alarm before or after the real leakage alarm. The number of outlet flow-based leakage detection alarm points is 76, and 0 false alarm points are counted.

5.4.4. Ensemble Learning Leak Detection

The results in Figure 10, Figure 11 and Figure 12 show that all three models detected the leak and performed well in leakage detection. The ensemble learning voting method is then applied to the three detection results: if two or more models raise an alarm, the ensemble outputs an alarm; otherwise, it outputs no alarm. The final ensemble detection result for the 3% leakage condition is shown in Figure 13. The ensemble learning model produced 176 alarm points and 2 false alarm points, so the leak is detected. Compared with leakage detection based on outlet pressure, the number of false alarm points is reduced by 38. Compared with leakage detection based on inlet pressure and outlet flow rate, false alarms are not significantly reduced, but the low false alarm counts of these two base models are preserved.
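The voting rule can be sketched as a point-wise majority vote over the three alarm sequences. This is a minimal illustration, assuming each base model outputs one Boolean alarm flag per sample; the function name `majority_vote` and the flag values are hypothetical.

```python
import numpy as np

def majority_vote(*alarm_flags, min_votes=2):
    """Ensemble alarm: True where at least min_votes base models raise an alarm."""
    stacked = np.vstack([np.asarray(flags, dtype=bool) for flags in alarm_flags])
    votes = stacked.sum(axis=0)  # number of models alarming at each sample
    return votes >= min_votes

# Made-up alarm flags for five samples.
inlet_pressure = [0, 1, 1, 0, 1]
outlet_pressure = [1, 1, 0, 0, 1]
outlet_flow = [0, 0, 1, 0, 1]

ensemble_alarm = majority_vote(inlet_pressure, outlet_pressure, outlet_flow)
print(ensemble_alarm.tolist())
```

An isolated false alarm from a single model, such as the first sample of the outlet pressure sequence, is voted down, which is how the ensemble suppresses the false alarms of a noisy base model.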

During experimental validation, the generalized likelihood ratio (GLR) leakage detection based on inlet pressure identified 2 false alarm points, while the GLR detection based on outlet pressure identified 40 false alarm points. In contrast, the GLR detection based on the outlet flow rate did not detect any false alarms. These results indicate significant differences in the detection performance of different sensors, with the outlet pressure data exhibiting a higher false alarm rate due to increased noise and interference. The ensemble learning-based detection method, which integrates information from multiple sensors, reduces the number of false alarms associated with individual sensors, thereby achieving higher overall detection performance and robustness. Although the GLR detection model based on the outlet flow rate did not produce any false alarms, its standalone use may not cover all leakage scenarios. By combining the strengths of multiple models, ensemble learning ensures detection sensitivity while significantly reducing the false alarm rate, making it the optimal detection method.

6. Discussions

6.1. Application Validation

To further analyze how the ensemble learning-enhanced leakage detection model performs on small pipeline leaks, the test bench shown in Figure 5 was used to produce four small leakage rates: 0.5%, 1%, 1.5%, and 2%. Leakage detection was carried out with the inlet pressure, outlet pressure, outlet flow rate, and ensemble learning models. The model leakage detection alarm schematic is shown in Figure 14, and the numbers of alarm points and false alarm points of each model at each leakage rate are listed in Table 3.

Figure 14 and Table 3 show that all three single-sensor models detected the leak under the 0.5%, 1.5%, and 2% leakage conditions. Apart from flow-based detection at 2%, the single-sensor models produced false alarms in every case, with as many as 196 false alarm points. The ensemble learning leakage detection method reduced the number of false alarms to 0 under the 0.5%, 1.5%, and 2% leakage conditions while retaining the leakage detection capability. Under the 1% leakage condition, the three single-sensor models still detected the leak but with a large number of false alarm points. The ensemble did not reach 0 false alarm points in this case, but its 27 false alarm points are fewer than those of any single-sensor model: 240 fewer than the inlet pressure method, 94 fewer than the outlet pressure method, and 1 fewer than the outlet flow method.

In summary, the ensemble learning leak detection method based on the generalized likelihood ratio can ensure the effectiveness of small leakage detection in pipelines while minimizing false alarms. Under the 0.5%, 1.5%, and 2% conditions, the number of false alarm points is reduced to 0, which addresses the problem of alarm flooding that existing methods face in small leakage detection. The method also offers several practical advantages. Compared to traditional model-based and data-driven methods, it requires less data and is simpler to construct, and compared to hardware-based leak detection methods, it is low in cost. Low-cost, high-precision detection of weak leaks in gas pipelines can therefore be realized.
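The reductions quoted for the 1% leakage condition can be checked directly against the false alarm counts in Table 3. The short sketch below only repeats that arithmetic; the dictionary and variable names are introduced here for illustration.

```python
# False alarm points at the 1% leakage rate, taken from Table 3.
false_alarms_1pct = {
    "inlet pressure": 267,
    "outlet pressure": 121,
    "outlet flow": 28,
    "ensemble learning": 27,
}

ensemble = false_alarms_1pct["ensemble learning"]
reductions = {
    model: count - ensemble
    for model, count in false_alarms_1pct.items()
    if model != "ensemble learning"
}

for model, reduction in reductions.items():
    print(f"{model}: {reduction} fewer false alarm points with the ensemble")
```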

6.2. The Application of the Model to Different Noise Values

To verify the effectiveness of the proposed method for small leakage detection in real pipelines, Gaussian white noise with a mean of 0 and variances of 0.0002, 0.0004, 0.0006, 0.0008, and 0.001 was added to the data for the 0.5%, 1%, 1.5%, 2%, and 3% leakage conditions, and the corresponding alarm points and false alarm points were calculated. The noise level was set by multiplying the difference between the maximum and minimum of the original data by the corresponding percentage. The results of pipeline leakage detection under the different working conditions and noise levels are shown in Table 4, Table 5, Table 6, Table 7 and Table 8.
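One possible reading of this noise setting is sketched below. The text does not fully specify how the listed values relate to the data range, so the sketch assumes that each value is a fraction of the range (maximum minus minimum) of the original signal and that the product is used as the noise variance. The function name `add_gaussian_noise`, the constant `NOISE_LEVELS`, and the synthetic signal are hypothetical.

```python
import numpy as np

# Noise levels 1 to 5 as listed in the text.
NOISE_LEVELS = (0.0002, 0.0004, 0.0006, 0.0008, 0.001)

def add_gaussian_noise(signal, level, rng):
    """Add zero-mean Gaussian white noise scaled by the range of the signal.

    Assumption: variance = level * (max - min) of the original data.
    """
    signal = np.asarray(signal, dtype=float)
    data_range = signal.max() - signal.min()
    variance = level * data_range
    return signal + rng.normal(0.0, np.sqrt(variance), size=signal.shape)

rng = np.random.default_rng(42)
clean = np.linspace(0.50, 0.52, 1000)  # synthetic stand-in for a pressure record
noisy_versions = {level: add_gaussian_noise(clean, level, rng) for level in NOISE_LEVELS}
```

Each noisy series would then be passed through the same detection pipeline as the original data to count alarm points and false alarm points.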

The experimental data demonstrate that as the leakage rate decreases, the ability of a single sensor to detect leaks significantly declines, highlighting the necessity of multi-sensor fusion. The proposed leak detection model based on the generalized likelihood ratio (GLR) and ensemble learning successfully detected leak rates of 3%, 2%, and 1.5% under five different noise levels, generating corresponding alerts. Notably, the model stood out in its ability to reduce false alarms, achieving zero false positives under most operating conditions, which shows that it significantly outperforms traditional methods.

At a leakage rate of 1%, the model could detect leaks affected by Noise 1 and Noise 2, while at a leakage rate of 0.5%, the model only detected leaks affected by Noise 1. These results indicate that the model can detect the lowest leakage rate of 1.5% under typical noise conditions; under weak noise conditions, it can detect the lowest leakage rate of 0.5%. These findings validate the model’s robustness in high-noise environments and its high sensitivity for detecting subtle leaks.

Furthermore, the ensemble learning-based generalized likelihood ratio leak detection method excels in minimizing false alarm rates and preventing alarm flooding. Regardless of the leakage rate or noise level, the method maintains stable performance, reducing false alarm rates to zero in most cases, which shows that it significantly outperforms single sensors or traditional detection methods. This performance improvement is of great significance in real industrial scenarios, effectively reducing resource waste and operational interference caused by false alarms. In summary, the experimental results not only validate the effectiveness of the proposed method in detecting subtle leaks but also highlight its significant advantages in reducing false alarm rates and adapting to complex noise environments. These key findings provide strong technical support for the safety monitoring of gas transmission pipelines and lay a solid foundation for future research.

7. Conclusions

To address the high false alarm rates of single models in weak leakage detection, the difficulty of constructing mechanistic models, and the challenge of obtaining high-quality data for data-driven methods in current gas pipeline leakage detection, this paper proposes an ensemble learning-based leakage detection method utilizing the generalized likelihood ratio (GLR). This method integrates three GLR leakage detection models based on pipeline inlet pressure, outlet pressure, and outlet flow rate and employs an ensemble learning voting approach at the decision level, significantly improving detection accuracy and reducing false alarms. The experimental results demonstrate that the method performs well under both normal and noisy conditions, and the specific conclusions are as follows:

(1): High Accuracy and Low False Alarm Rate: The GLR-based ensemble learning leakage detection method not only accurately detects weak leaks as low as 0.5% of the pipeline’s delivery volume but also reduces the number of false alarm points to zero. Compared to existing methods, this approach stands out in reducing false alarms, providing a low-cost, high-precision solution for weak leakage detection in gas pipelines.
(2): Robustness in Noisy Environments: By adding noise to the original data, the model’s detection capability was validated at leakage rates of 3%, 2%, 1.5%, and 0.5%. The results show that even under strong noise interference, the model can detect leaks at a 1.5% rate; under weak noise conditions, it can detect leaks at a 0.5% rate, demonstrating excellent noise resistance.
(3): Method Advantages: Compared to existing methods, the proposed approach offers advantages such as simple model construction, no need for leakage data, and high detection accuracy, making it particularly suitable for real-time leakage monitoring in practical industrial scenarios.

Although the GLR-based leakage detection method performs exceptionally well in pipeline leakage detection, it should be noted that the current method can only detect leaks and cannot determine their locations or volumes. Future research will focus on the following directions:

(1): Leak Localization Technology: Developing methods for precise leak localization by combining distributed sensor networks and signal processing techniques.
(2): Leak Volume Estimation: Achieving accurate estimations of leak volumes through multi-source data fusion and machine learning algorithms.
(3): Adaptability to Complex Conditions: Further optimizing the model to adapt to more complex conditions (such as multiphase flow, extreme temperatures, etc.), enhancing the method’s universality and practicality.

This research provides new ideas and methods for weak leakage detection in gas pipelines, advancing technological progress in the field and holding significant theoretical and practical engineering value.

Author Contributions

Conceptualization, T.L. and X.C.; methodology, K.W.; software, J.W.; validation, W.Z., T.L., and X.C.; formal analysis, K.W.; data curation, T.L.; writing—original draft preparation, J.W. All authors have read and agreed to the published version of the manuscript.

Funding

Funding was provided by the National Major Science and Technology Project (2024ZD1403305) and the CNOOC 14th Five-Year Plan Major Science and Technology Project (KJGG-2023-17-01).

Data Availability Statement

The data that support the findings of this study are available upon reasonable request.

Conflicts of Interest

Authors Tao Liu, Wei Zhou and Kuitao Wang were employed by the company CNOOC Research Institute Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Nomenclature

$f(x_{i} \mid \mu_{0}, \sigma_{0}^{2})$	Probability density function of observation $x_{i}$ under a normal distribution with mean $\mu_{0}$ and variance $\sigma_{0}^{2}$
$L(x_{i}, \mu_{1})$	Likelihood function, representing the joint probability density of the observed data $x_{i}$ under parameter $\mu_{1}$
$\lambda(x)$	Generalized likelihood ratio, representing the ratio of the maximum likelihood functions under hypotheses $\mu_{0}$ and $\mu_{1}$
$\sup(L(x, \mu_{0}))$	Maximum value of the likelihood function under hypothesis $\mu_{0}$
$\sup(L(x, \mu_{1}))$	Maximum value of the likelihood function under hypothesis $\mu_{1}$
$v$	Time point when leakage occurs
$\mu_{1}$	Mean value under leakage conditions

References

Li, Q.; Li, Q.; Cao, H.; Wu, J.; Wang, F.; Wang, Y. The Crack Propagation Behaviour of CO₂ Fracturing Fluid in Unconventional Low Permeability Reservoirs: Factor Analysis and Mechanism Revelation. Processes 2025, 13, 159.
Li, J.; Zheng, Q.; Qian, Z.; Yang, X. A novel location algorithm for pipeline leakage based on the attenuation of negative pressure wave. Process Saf. Environ. Prot. 2019, 123, 309–316.
Cheng, L.; Pan, P.; Sun, Y.; Zhang, Y.; Cao, Y. A distributed fibre optic monitoring method for ground subsidence induced by water pipeline leakage. Opt. Fiber Technol. 2023, 81, 103495.
Adnan, N.F.; Ghazali, M.F.; Amin, M.M.; Hamat, A.M.A. Leak detection in gas pipeline by acoustic and signal processing – A review. IOP Conf. Ser. Mater. Sci. Eng. 2015, 100, 012013.
Ullah, N.; Ahmed, Z.; Kim, J.M. Pipeline leakage detection using acoustic emission and machine learning algorithms. Sensors 2023, 23, 3226.
Park, S.; Lim, H.; Tamang, B.; Jin, J.; Lee, S.; Park, S.; Kim, Y. A preliminary study on leakage detection of deteriorated underground sewer pipes using aerial thermal imaging. Int. J. Civ. Eng. 2020, 18, 1167–1178.
Jadin, M.S.; Ghazali, K.H. Gas leakage detection using thermal imaging technique. In Proceedings of the 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation, Cambridge, UK, 26–28 March 2014; pp. 302–306.
Tajalli, S.A.M.; Moattari, M.; Naghavi, S.V.; Salehizadeh, M.R. A Novel Hybrid Internal Pipeline Leak Detection and Location System Based on Modified Real-Time Transient Modelling. Modelling 2024, 5, 1135–1157.
Bustnes, T.E.; Rousselet, M.; Berland, S. Leak detection performance of a commercial Real Time Transient Model for Troll oil pipeline. In Proceedings of the PSIG Annual Meeting, Napa Valley, CA, USA, 24–27 May 2011; p. PSIG-1114.
Malekpour, A.; She, Y. Real-time leak detection in oil pipelines using an Inverse Transient Analysis model. J. Loss Prev. Process Ind. 2021, 70, 104411.
Zhang, X.J. Statistical leak detection in gas and liquid pipelines. Pipes Pipelines Int. 1993, 38, 26–29.
Arifin, B.M.S.; Li, Z.; Shah, S.L.; Meyer, G.A.; Colin, A. A novel data-driven leak detection and localization algorithm using the Kantorovich distance. Comput. Chem. Eng. 2018, 108, 300–313.
Lang, X.M. Pipeline Leak Detection and Localization Based on Feature Extraction and Information Fusion. Ph.D. Thesis, Northwestern Polytechnical University, Xi’an, China, 2018. (In Chinese).
Smith, J.; Chae, J.; Learn, S.; Hugo, R.; Park, S. Pipeline rupture detection using real-time transient modelling and convolutional neural networks. Proc. Int. Pipeline Conf. Am. Soc. Mech. Eng. 2018, 51883, V003T04A016.
Belsito, S.; Lombardi, P.; Andreussi, P.; Banerjee, S. Leak detection in liquefied gas pipelines by artificial neural networks. AIChE J. 1998, 44, 2675–2688.
Arifin, B.M.S.; Li, Z.; Shah, S.L. Pipeline leak detection using particle filters. IFAC-PapersOnLine 2015, 48, 76–81.
Willis, A.J. Design of a modified sequential probability ratio test (SPRT) for pipeline leak detection. Comput. Chem. Eng. 2011, 35, 127–131.
Chen, Z.; Zhang, C.C.; Shi, B.; Zhang, Y.; Wang, Z.; Wang, H.; Xie, T. Detecting gas pipeline leaks in sandy soil with fiber-optic distributed acoustic sensing. Tunn. Undergr. Space Technol. 2023, 141, 105367.
Wang, F.; Lin, W.; Liu, Z.; Wu, S.; Qiu, X. Pipeline leak detection by using time-domain statistical features. IEEE Sens. J. 2017, 17, 6431–6442.
Xu, Q.; Zhang, L.; Liang, W. Acoustic detection technology for gas pipeline leakage. Process Saf. Environ. Prot. 2013, 91, 253–261.
Zhang, C.; Shafieezadeh, A. Nested physics-informed neural network for analysis of transient flows in natural gas pipelines. Eng. Appl. Artif. Intell. 2023, 122, 106073.
Wang, X. Pipeline leakage alarm via bootstrap-based hypothesis testing. Mech. Syst. Signal Process. 2022, 179, 109334.
Li, J.; Wu, Y.; Zheng, W.; Lu, C. A model-based Bayesian framework for pipeline leakage enumeration and location estimation. Water Resour. Manag. 2021, 35, 4381–4397.
Zhang, M.; Chen, X.; Li, W. A hybrid hidden Markov model for pipeline leakage detection. Appl. Sci. 2021, 11, 3138.
Zhang, Q.; Basseville, M. Statistical detection and isolation of additive faults in linear time-varying systems. Automatica 2014, 50, 2527–2538.
Xia, Y.; Amann, A.; Liu, B. Detection of abrupt changes in electrocardiogram with generalized likelihood ratio algorithm. IET Signal Process. 2010, 4, 650–657.
Borguet, S.; Leonard, O. A Generalised Likelihood Ratio Test for Adaptive Gas Turbine Health Monitoring. Turbo Expo Power Land Sea Air 2008, 43123, 7–17.
Wang, C.; Han, F.; Zhang, Y.; Lu, J. An SAE-based resampling SVM ensemble learning paradigm for pipeline leakage detection. Neurocomputing 2020, 403, 237–246.
Rao, S.; Zou, G.; Yang, S.; Barmada, S. A feature selection and ensemble learning based methodology for transformer fault diagnosis. Appl. Soft Comput. 2024, 150, 111072.
Hunter, J.S. The exponentially weighted moving average. J. Qual. Technol. 1986, 18, 203–210.
Winters, P.R. Forecasting sales by exponentially weighted moving averages. Manag. Sci. 1960, 6, 324–342.

Figure 1. Schematic diagram of ensemble learning.

Figure 2. Framework of combined enhanced leakage warning method based on ensemble learning.

Figure 3. Schematic diagram of time window selection.

Figure 4. A schematic diagram of the threshold selection method based on the test statistic.

Figure 5. Pipeline media leakage test rig.

Figure 6. Schematic diagram of time window selection.

Figure 7. Schematic diagram of inlet pressure threshold selection.

Figure 8. Schematic diagram of outlet pressure threshold selection.

Figure 9. Schematic diagram of outlet flow threshold selection.

Figure 10. Schematic diagram of inlet pressure alarm.

Figure 11. Schematic diagram of outlet pressure alarm.

Figure 12. Schematic diagram of outlet flow alarm.

Figure 13. Schematic diagram of ensemble learning alarm.

Figure 14. Comparison of leakage detection of different models.

Table 1. Comparison of software-based leak detection methods.

Method	Advantages	Drawbacks	Example Methods
Modeling approach	Detects small leaks with high accuracy	Difficult model construction and computational inefficiency	Real-time transient method
Data-driven approach	High detection efficiency, poor accuracy	Requires large amounts of leakage data to train model	Neural networks, support vector machines
Statistical methods	Simple model, low cost	Highly influenced by field data	Sequential probability ratio, generalized correlation

Table 2. Advantages and disadvantages of various methods and applicable scenarios.

Method	Advantages	Drawbacks	Applicable Scenarios
Hardware Methods	Accurate positioning and strong real-time performance	High cost and poor environmental adaptability	Sudden leak detection of long-distance pipelines
Software Methods	Flexibility and adaptability to nonlinear data	Dependent on annotated data, noise-sensitive	Multivariate coupling for complex operating conditions
Statistical Methods	Robust theoretical foundation and good noise resistance	Assumption limiting, high false positive rate	Weak leak detection in steady-state conditions

Table 3. Comparison of alarm points and false alarm points.

Model	Indicator	0.5%	1%	1.5%	2%
Inlet pressure	Warning Point	12	68	83	177
Inlet pressure	False Positive Point	27	267	71	116
Outlet pressure	Warning Point	43	57	116	108
Outlet pressure	False Positive Point	34	121	105	196
Outlet flow	Warning Point	88	48	15	63
Outlet flow	False Positive Point	3	28	58	0
Ensemble learning	Warning Point	12	28	63	101
Ensemble learning	False Positive Point	0	27	0	0

Table 4. Comparison of alarm/leakage points for each model at different noise levels at 0.5% leakage rate.

Model	Indicator	Noise 1	Noise 2	Noise 3
Inlet pressure	False Positive Point	24	0	0
Inlet pressure	Warning Point	16	21	0
Outlet pressure	False Positive Point	62	0	0
Outlet pressure	Warning Point	14	8	3
Outlet flow	False Positive Point	64	0	0
Outlet flow	Warning Point	0	0	0
Ensemble learning	False Positive Point	24	0	0
Ensemble learning	Warning Point	0	0	0

Table 5. Comparison of alarm/leakage points for each model at different noise levels at 1% leakage rate.

Model	Indicator	Noise 1	Noise 2	Noise 3	Noise 4
Inlet pressure	False Positive Point	42	21	0	0
Inlet pressure	Warning Point	153	70	43	13
Outlet pressure	False Positive Point	56	39	26	0
Outlet pressure	Warning Point	200	183	46	20
Outlet flow	False Positive Point	38	33	38	0
Outlet flow	Warning Point	0	0	0	0
Ensemble learning	False Positive Point	16	6	0	0
Ensemble learning	Warning Point	9	19	0	0

Table 6. Comparison of alarm/leakage points for each model at different noise levels at 1.5% leakage rate.

Model	Indicator	Noise 1	Noise 2	Noise 3	Noise 4	Noise 5
Inlet pressure	False Positive Point	85	57	68	21	58
Inlet pressure	Warning Point	54	40	0	0	0
Outlet pressure	False Positive Point	122	134	54	73	51
Outlet pressure	Warning Point	126	43	15	2	0
Outlet flow	False Positive Point	0	0	0	0	0
Outlet flow	Warning Point	0	0	0	0	0
Ensemble learning	False Positive Point	68	52	18	3	9
Ensemble learning	Warning Point	0	0	0	0	0

Table 7. Comparison of alarm/leakage points for each model at different noise levels at 2% leakage rate.

Model	Indicator	Noise 1	Noise 2	Noise 3	Noise 4	Noise 5
Inlet pressure	False Positive Point	161	120	67	35	103
Inlet pressure	Warning Point	91	47	49	58	0
Outlet pressure	False Positive Point	102	103	88	78	3
Outlet pressure	Warning Point	153	101	53	0	0
Outlet flow	False Positive Point	58	25	35	0	0
Outlet flow	Warning Point	0	0	0	0	0
Ensemble learning	False Positive Point	96	77	66	21	3
Ensemble learning	Warning Point	51	15	20	0	0

Table 8. Comparison of alarm/leakage points for each model at different noise levels at 3% leakage rate.

Model	Indicator	Noise 1	Noise 2	Noise 3	Noise 4	Noise 5
Inlet pressure	False Positive Point	315	267	109	60	51
Inlet pressure	Warning Point	0	0	0	0	0
Outlet pressure	False Positive Point	303	218	157	92	85
Outlet pressure	Warning Point	11	22	3	0	0
Outlet flow	False Positive Point	68	51	34	0	0
Outlet flow	Warning Point	0	0	0	0	0
Ensemble learning	False Positive Point	163	143	68	21	39
Ensemble learning	Warning Point	0	0	0	0	0

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Liu, T.; Cai, X.; Zhou, W.; Wang, K.; Wang, J. Enhanced Detection of Pipeline Leaks Based on Generalized Likelihood Ratio with Ensemble Learning. Processes 2025, 13, 558. https://doi.org/10.3390/pr13020558

