Article

Fast Prediction of Combustion Heat Release Rates for Dual-Fuel Engines Based on Neural Networks and Data Augmentation

School of Energy and Power Engineering, Shandong University, Jinan 250012, China
* Authors to whom correspondence should be addressed.
Designs 2025, 9(1), 25; https://doi.org/10.3390/designs9010025
Submission received: 27 December 2024 / Revised: 11 February 2025 / Accepted: 14 February 2025 / Published: 19 February 2025
(This article belongs to the Topic Digital Manufacturing Technology)

Abstract

As emission regulations become increasingly stringent, diesel/natural gas dual-fuel engines are regarded as a promising solution and have attracted extensive research attention. However, their complex combustion processes pose significant challenges to traditional combustion modeling approaches. Data-driven modeling methods offer an effective way to capture the complexity of combustion processes, but their performance is critically constrained by the quantity and quality of the test data. To address these limitations, this study proposes a combustion prediction model framework for dual-fuel engines based on neural networks and data augmentation, aiming to achieve high-quality, fast predictions of the heat release rate curve. First, a hybrid regression data augmentation architecture based on an improved Generative Adversarial Network (GAN) is introduced to enable high-quality dataset augmentation. Subsequently, a Bayesian Neural Network (BNN) is employed to construct a Wiebe parameter prediction model for dual-fuel engines, with the training process accelerated and optimized. Meanwhile, an adaptive weight allocation method based on the model's accuracy performance is proposed, achieving a balanced accuracy distribution across multiple output dimensions and further enhancing the model's generalization ability. Overall, the proposed modeling approach introduces tradeoff optimizations in both the data and model dimensions, enhancing training and learning efficiency and offering a valuable, practically significant direction for data-driven prediction models.

1. Introduction

In recent years, the issues of environmental pollution and energy scarcity have become increasingly severe. Natural gas, with its environmentally friendly, safe, stable, and abundant characteristics, has emerged as one of the most promising alternative energy sources in the field of traditional internal combustion engines [1,2]. Traditional natural gas engines mainly rely on spark plug ignition systems, which limit the engine’s compression ratio and result in poor intake efficiency. In contrast, the in-cylinder direct injection dual-fuel engine adopts an innovative trace diesel ignition strategy combined with natural gas in-cylinder high-pressure direct injection technology, demonstrating higher combustion efficiency and more powerful performance, offering significant advantages [3,4,5]. Therefore, simulating and modeling the combustion process of diesel–natural gas dual-fuel engines is of great practical significance for studying the internal combustion methods and evaluating engine performance.
In engine combustion modeling, the Wiebe combustion model is one of the most commonly used methods, and the multi-Wiebe combustion model has been widely studied and applied due to its simple parameter structure and its ability to clearly represent the premixed, diffusion, and tail combustion processes. Yang et al. [6] proposed a one-dimensional computational model for a single-fuel compression ignition engine. The Wiebe function was used to predict the combustion process by representing the mass fraction burned (MFB) at crank angle resolution, and a single Wiebe function and two double Wiebe functions were fitted to the MFB data using the least squares method. Although the double Wiebe function can fit the heat release rate curve, the model has limitations in capturing longer post-flame stages, which can lead to prediction errors. Chen et al. [7] proposed a dynamic combustion model that combines an improved triple Wiebe function with an Artificial Neural Network (ANN), considering the impact of engine variables; this model can efficiently simulate complex combustion processes. However, many existing studies have overlooked the influence of dataset characteristics on the construction of combustion models. This study therefore aims to improve the prediction accuracy of the multi-dimensional Wiebe combustion model while fully accounting for the distribution characteristics of the dataset and reducing the required sample size.
In Wiebe combustion modeling, the accuracy of the combustion model depends on the nonlinear mapping between Wiebe parameters and experimental operating parameters. However, due to long experimental periods and high experimental costs, it is difficult to obtain sufficient paired datasets of experimental operating conditions and Wiebe parameters; the correct acquisition of Wiebe parameters is thus heavily constrained by experiments. In most cases, especially for high-quality engine-related experimental data, the sample size is limited [8]. To address the issue of insufficient samples, data augmentation methods have been proposed. As one of the classic data augmentation methods, Generative Adversarial Networks (GANs) have been shown to perform well even on small sample datasets [9]. GANs achieve data augmentation through adversarial training between a generator and a discriminator: the generator attempts to create realistic data to deceive the discriminator, while the discriminator strives to differentiate between real and generated data. Both networks are optimized iteratively until the generator can produce highly realistic data that are indistinguishable from real data, thus deceiving the discriminator and improving the quality of the generated data [10]. In the latest research on engine combustion modeling, data augmentation techniques such as GANs have been widely discussed. For example, Grenga et al. [11] used a GAN for data augmentation to improve the predictive performance of a model for premixed combustion states. Nista et al. [12] developed a super-resolution GAN, and experimental results showed that the GAN-based technique generalizes well in premixed combustion. Cui et al. [13] used GANs to augment the original data to address fault diagnosis in diesel fuel injection systems with limited samples and used the augmented data as the source data for transfer learning to ensure its effectiveness; the accuracy of the proposed method improved by 16%. Sok et al. [14] improved the performance of neural networks in predicting engine combustion and emission-related parameters by constructing a GAN model. Chen et al. [15] proposed a deterministic structural model updating method for aircraft engines based on an improved GAN combined with convolutional neural networks; this method can correct individual differences and biases in multivariate turbofan engine models and performs excellently. Bo et al. [16] introduced an attention mechanism into a GAN to impute missing data in aircraft engines, ultimately improving the model's predictive performance. As a deep learning model, a GAN needs large, diverse, and representative datasets for effective training; with insufficient training data, the discriminator tends to overfit. Therefore, providing as much initial data as possible and improving the GAN method itself are crucial for enhancing the data augmentation effect [17]. One solution for achieving high-quality regression data augmentation from small samples is to pre-augment the data with an expansion algorithm and then design an improved GAN model. The pre-expansion algorithm can effectively preserve the statistical features of the original data while initially increasing the size of the dataset, providing the prerequisite for high-quality GAN performance. By designing improved GAN models, the feature extraction ability can be enhanced, yielding synthetic data that are highly similar to real data.
Neural networks, as a commonly used method, can play an important role in capturing the nonlinear relationship between experimental parameters and Wiebe parameters [18]. In the field of engine modeling and performance evaluation, an increasing number of studies have explored the value of neural networks for nonlinear modeling [19,20,21,22]. Neural networks receive multi-dimensional input data through the input layer; the data undergo nonlinear transformations in the neurons of the hidden layers to extract complex features; and the output layer provides multi-dimensional predictions through linear or nonlinear activation functions. In the latest developments in engine combustion modeling, scholars have used neural networks to build predictive models in order to improve the accuracy of nonlinear regression modeling. For example, Hu et al. [23] simplified and reconstructed the combustion process by combining the Wiebe function with neural networks; the proposed method can obtain combustion-related indicators such as cylinder pressure, and the results indicate good prediction accuracy and generalization. Araujo et al. [24] used a neural network approach to determine combustion dynamics in engines, using mass combustion fraction data as a function of crankshaft angular position. Windarto et al. [25] constructed a neural-network-based predictive model for the cylinder performance parameters of a large-bore compression ignition engine; the predictions of five parameters, including heat release rate, turbulent kinetic energy, tumble ratio, indicated power, and combustion efficiency, all maintain high accuracy. Cesur et al. [26] developed an Artificial Neural Network (ANN)-based model to predict engine performance and exhaust emissions for methanol–gasoline mixtures. Sahin et al. [27] used three machine learning algorithms (ANN, XGBoost, and Random Forest) to simulate engine performance and emission parameters; in most cases, the neural networks performed best. Ebrahimi et al. [28] used Deep Neural Networks (DNNs) to predict the ignition delay and knock tendency of Spark Ignition (SI) engines, employing a stochastic Levenberg–Marquardt (SLM) optimization algorithm to train hundreds of DNNs, which reduced the training time and memory requirements for large-scale problems. Siddique et al. [29] used a FOX optimizer to optimize an ANN, enhancing its weight adjustment ability and replacing the traditional backpropagation process; this effectively improved the accuracy of prediction models and provides a reliable solution for optimizing prediction models in industrial applications. Quantifying uncertainty in neural networks can enhance the robustness of prediction model performance, and deploying machine learning models with uncertainty quantification capabilities in analysis and decision-making is crucial for ensuring the reliability of prediction results. Bayesian Neural Networks (BNNs) provide this capability by inferring weights as probability distributions instead of single values. While BNNs excel at uncertainty analysis, training them is computationally expensive, requiring improvements in training efficiency. Dharmalingam et al. [30] used Response Surface Methodology (RSM) and a BNN to predict the characteristics of a diesel engine powered by a biodiesel–diesel fuel mixture. In the study of Sun et al. [31], BNNs were found to offer excellent overall accuracy in modeling and analyzing diesel engine particulate emissions. Overall, neural network modeling techniques improve efficiency and offer scalability, and BNNs, with their inherent uncertainty quantification capabilities, provide robust and reliable performance, making them suitable for supporting more complex studies and for constructing combustion prediction models. However, there is limited research on balancing the precision distribution across the dimensions of high-dimensional regression data while simultaneously reducing BNN training costs. This study aims to design an improved BNN prediction model that maintains balanced prediction accuracy while reducing training resource consumption on high-dimensional data.
The high-quality construction of combustion prediction models plays a crucial role in engine performance evaluation and research. The main research problems currently faced are as follows: (1) how to use small sample datasets to generate high-quality data similar to the original data, while avoiding sample sizes so small that data augmentation models fail to converge; (2) how to avoid uneven prediction accuracy across the dimensions of high-dimensional data and low training efficiency while building a reliable neural network prediction model. To address these issues, a framework for rapidly modeling dual-fuel engine combustion via multi-Wiebe functions, based on data augmentation and neural networks, is proposed. In terms of data augmentation, a novel GAN method is developed by optimizing the data expansion process. This method, which incorporates physical constraints, self-attention characteristics, and data pre-expansion, is specifically designed for augmenting regression data. Next, a BNN is fully utilized, combined with an adaptive weight balancing method, and features ensemble learning and a data-parallel CPU training acceleration algorithm.
The main contributions of this article are summarized as follows: (1) proposing a high-performance data augmentation method; (2) designing a high-dimensional regression data prediction model with adaptive weights; (3) utilizing the parallel characteristics of ensemble learning to construct a data-parallel acceleration training scheme for the prediction model.
The structure of this paper is arranged as follows: Section 2 introduces the specific research methods for the engine combustion prediction model, including the data augmentation hybrid architecture and the BNN-based combustion prediction model construction. Section 3 discusses the related results, focusing on the performance of data augmentation and the prediction model’s accuracy metrics. In Section 4, based on the test results, the performance and limitations of the model architecture are discussed, as well as directions worth studying in the future. Finally, Section 5 concludes the paper.

2. Methodology

The overall architecture of the rapid prediction of combustion heat release rate for dual-fuel engines based on neural networks and data augmentation proposed in this study is shown in Figure 1. Based on the original dataset, the KTW-ReGAN method is first applied to obtain an expanded dataset. This method consists of two parts: the data pre-expansion step of MR-CKNN and the improved GAN module. Then, the designed adaptive weight method is introduced into the Bayesian Neural Network prediction model to improve performance. By developing a CPU-based data parallel acceleration method, the training process can be achieved quickly and efficiently. Finally, the model with excellent comprehensive performance will be used for Wiebe parameter prediction, and the heat release rate will be fitted based on Wiebe parameters. The implementation of each specific method will be described in detail below.

2.1. Hybrid Framework for Regression Data Augmentation Based on Improved GAN

This section introduces a data augmentation hybrid framework suitable for regression problems, named the KTW-ReGAN method. The framework is illustrated in Figure 2 and consists of two key steps. First, a multi-regression-constrained K-nearest neighbors pre-expansion algorithm (MR-CKNN) is designed to augment small sample data while preserving its distribution characteristics as much as possible. Second, a regression variant of the Transformer with self-attention, coupled with the Wasserstein distance, weight clipping, and constraint vectors, is introduced to ensure high-quality feature extraction from regression data and a stable training process. This improves the model's convergence, thereby yielding an improved regression data augmentation model.

2.1.1. Construction of Data Pre-Augmentation Method

As a deep learning model, a GAN inherently requires a large amount of diverse and representative data for effective training; insufficient training data can lead to overfitting of the discriminator [32]. A data augmentation algorithm that preserves the initial data features is therefore crucial for achieving high-quality GAN-based augmentation. To this end, a constrained KNN method for multiple regression (MR-CKNN) is designed in the data augmentation module. Consider a dataset of n points (x1, y1), (x2, y2), …, (xn, yn), where each xi is a normalized feature vector and each yi is the corresponding normalized output.
The traditional KNN classification algorithm finds the K samples nearest to the sample to be classified under a given metric and then uses majority voting to obtain the predicted class. The MR-CKNN pre-expansion method proposed in this study extends this KNN reasoning idea to regression. For a high-dimensional regression sample point xi, the K nearest sample points are computed as shown in Equation (1) to construct the K-nearest neighborhood of the current sample.
d(x_i, x_j) = \sqrt{\sum_{k=1}^{D} (x_{i,k} - x_{j,k})^2}   (1)
where d(x_i, x_j) is the Euclidean distance between samples x_i and x_j in D-dimensional space.
The selection of the hyperparameters k and m is important. If k is too large, it generates a large and unnecessary computational burden, while if k is too small, the randomness is insufficient and the expansion performance poor, resulting in suboptimal pre-expansion data. If m is too large, the distribution characteristics of the expanded data deviate from those of the original data; conversely, if m is too small, the benefit of increasing the data volume cannot be fully exploited, and high-quality initial data cannot be provided for the subsequent GAN. This study used parameter optimization algorithms to select the optimal combination, ultimately choosing k = 5 and m = 2 as the MR-CKNN hyperparameters. Each new sample is generated by interpolating between a sample and one of its neighbors, as shown in Equations (2) and (3).
x_{new} = \alpha x_i + (1 - \alpha) x_j   (2)
y_{new} = \alpha y_i + (1 - \alpha) y_j   (3)
where α ∈ [0,1] is a randomly generated interpolation weight. In this way, new samples with consistent features and labels are generated. x_new denotes the generated regression data at the input end, with 7 dimensions; y_new denotes the generated regression data at the output end, with 9 dimensions.
Finally, each sample generates multiple new samples through its k neighbors. In addition, because the combustion model places certain requirements on the numerical values of the Wiebe parameters, constraints on the output features of the dataset (the Wiebe parameters) are introduced, as shown in Equation (4). Specifically, a custom physical constraint layer is designed inside the generator. After the forward pass of each batch is completed, the resulting data are passed through the physical constraint layer, which applies the conditions of Equation (4). This helps the generator produce more realistic and more interpretable fake data, thereby improving the classification accuracy of the discriminator. Algorithm 1 shows the computational logic of the data pre-expansion step (MR-CKNN).
F_p > 0, \quad F_m > 0, \quad F_p + F_m < 1; \qquad m_p > 0, \quad m_m > 0, \quad m_t > 0; \qquad D_t > D_m > D_p > 0   (4)
Algorithm 1 Constrained KNN Based on Multiple Regression (MR-CKNN)
   Require: original input dataset X, original output dataset Y,
   original dataset size n, number of nearest neighbors K, augment factor m.
   Initialize: create empty expanded datasets X_new and Y_new
   begin
   for i = 1 to n do
   begin
     find the K nearest neighbors of sample i
     for j = 1 to K do
     begin
       generate a random interpolation weight
       calculate new sample x_new from the neighboring nodes
       calculate new sample y_new from the neighboring nodes
       apply the additional physical information constraints
       update the expanded datasets X_new and Y_new
     end
   end
   end
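To make the procedure concrete, a minimal NumPy sketch of Algorithm 1 is given below. The function names and the assumed index layout of the output vector are illustrative, and the constraint check condenses Equation (4); this is a sketch of the pre-expansion step, not the authors' exact implementation.

```python
import numpy as np

def satisfies_constraints(y):
    # Placeholder for Equation (4); assumes the output layout
    # y = [Fp, Fm, mp, mm, mt, Dp, Dm, Dt, SOC] (an illustrative assumption).
    Fp, Fm, mp, mm, mt, Dp, Dm, Dt = y[:8]
    return (Fp > 0 and Fm > 0 and Fp + Fm < 1
            and mp > 0 and mm > 0 and mt > 0
            and Dt > Dm > Dp > 0)

def mr_cknn(X, Y, k=5, m=2, seed=0):
    """MR-CKNN pre-expansion: interpolate each sample with m of its k nearest neighbors."""
    rng = np.random.default_rng(seed)
    X_new, Y_new = [], []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)        # Equation (1): distances to all samples
        neighbors = np.argsort(d)[1:k + 1]          # k nearest, excluding the sample itself
        for j in rng.choice(neighbors, size=m, replace=False):
            a = rng.uniform()                       # interpolation weight alpha in [0, 1]
            x_new = a * X[i] + (1 - a) * X[j]       # Equation (2)
            y_new = a * Y[i] + (1 - a) * Y[j]       # Equation (3)
            if satisfies_constraints(y_new):        # physical constraints, Equation (4)
                X_new.append(x_new)
                Y_new.append(y_new)
    return np.array(X_new), np.array(Y_new)
```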

2.1.2. Design of Improved Regression GAN Model

In the improved regression GAN module, the Transformer is innovatively applied to regression problems, thereby deploying the self-attention mechanism in the regression GAN. The core formula of the self-attention mechanism is shown in Equation (5) below. First, computing the dot product between the Q and K matrices and dividing by \sqrt{d_k} for scaling captures the correlation between the various parts of the input data; this correlation computation is global, so each position can be associated with every other position in the data sequence. Second, the softmax function normalizes the correlation values into attention weights, reflecting the importance of different data positions to the current position and allowing the model to focus on data points with a significant impact on the prediction results, thereby improving prediction accuracy and robustness. Finally, multiplying the attention weights by the V matrix yields a weighted-sum output that emphasizes important information and weakens secondary information, improving the model's understanding and expressive power over the data.
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V   (5)
where Q is the query matrix, K is the key matrix, V is the value matrix, and d_k is the dimension of the key vectors.
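For reference, Equation (5) translates directly into a few lines of PyTorch. This is a generic scaled dot-product attention, not the exact layer configuration of the regression Transformer used here.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    """Equation (5): Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5   # global pairwise correlations
    weights = F.softmax(scores, dim=-1)             # normalized attention weights
    return weights @ V                              # weighted sum over the value matrix
```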
GAN is composed of two neural networks: a generator and a discriminator. They are trained alternately in an adversarial process, continuously enhancing the generator’s capability to generate realistic data. The loss function for the GAN discriminator is defined as shown in Equation (6). The two forms of the generator loss function are given by Equation (7).
-\mathbb{E}_{x \sim p_r(x)}[\log D(x)] - \mathbb{E}_{x \sim p_z(z)}[\log(1 - D(x))]   (6)
\mathbb{E}_{x \sim p_g(x)}[\log(1 - D(x))] \quad \text{or} \quad -\mathbb{E}_{x \sim p_g(x)}[\log D(x)]   (7)
where x ∼ p_r denotes data sampled from the real data distribution, x ∼ p_z denotes data sampled from a noise distribution (Gaussian random noise), and x ∼ p_g denotes data sampled from the generated data distribution. \mathbb{E}(\cdot) is the expected value over the distribution, and D(\cdot) is the posterior probability computed by the discriminator that a sample is real.
There is a special case for the first form of the generator loss: if the discriminator has been trained to optimality, minimizing the generator loss amounts to minimizing the Jensen–Shannon (JS) divergence between the true and generated probability distributions. But the JS divergence is only informative when the two distributions overlap. If they do not intersect, the JS divergence is fixed at log 2 and the gradient becomes zero, preventing the generator from learning effectively; in this situation, the discriminator can easily distinguish real from fake samples. The second form of the loss function can lead to gradient instability and mode collapse, which reduces diversity. To address these issues, the Wasserstein distance is introduced, as formulated in Equation (8); it is continuously differentiable with smooth gradients, providing better training signals.
W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}[\, \lVert x - y \rVert \,]   (8)
where \Pi(P_r, P_g) is the set of all joint distributions whose marginals are P_r and P_g. x is a real sample, y is a generated sample, and \lVert x - y \rVert is the distance between this pair of samples.
The goal of Wasserstein distance is to minimize the workload required to move all the masses of one distribution to another distribution, which can more accurately reflect the difference between the generated distribution and the real distribution and guide the generator to generate samples that are closer to the real data. At the same time, weight clipping is used to restrict the network parameter weights of the discriminator, ensuring that the calculation of Wasserstein distance satisfies the 1-Lipschitz continuity condition, thereby improving convergence.
The training process of the complete regression GAN improvement module is shown in Figure 3. When the generator and discriminator are initialized, the learning rate is set to 0.002. The generator generates realistic fake data from random noise under the action of the constraint vectors. The discriminator computes the loss based on the Wasserstein distance and performs backpropagation to update its internal parameters. This is equivalent to first training the discriminator with the generator fixed, in order to discriminate true and fake data more accurately; then the discriminator is fixed to train the generator to produce indistinguishable fake data. Adversarial training continues iteratively in this way, updating the internal parameters of the discriminator and generator to complete the adversarial training of the GAN.
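The alternating scheme of Figure 3 can be sketched as a standard weight-clipped WGAN loop. Only the 0.002 learning rate comes from the text; the optimizer choice (RMSprop, common for weight-clipped WGANs), the clipping threshold, and the critic-to-generator update ratio are illustrative assumptions, and G and D stand for the generator and discriminator networks, which are not specified here.

```python
import torch

def train_wgan(G, D, loader, epochs=200, n_critic=5, clip=0.01, z_dim=16):
    """Alternating WGAN training with weight clipping (sketch)."""
    opt_g = torch.optim.RMSprop(G.parameters(), lr=0.002)
    opt_d = torch.optim.RMSprop(D.parameters(), lr=0.002)
    for _ in range(epochs):
        for real in loader:
            # Train the discriminator (critic) with the generator fixed.
            for _ in range(n_critic):
                fake = G(torch.randn(real.size(0), z_dim)).detach()
                loss_d = D(fake).mean() - D(real).mean()   # negative Wasserstein estimate
                opt_d.zero_grad()
                loss_d.backward()
                opt_d.step()
                for p in D.parameters():                   # weight clipping enforces
                    p.data.clamp_(-clip, clip)             # the 1-Lipschitz condition
            # Train the generator with the discriminator fixed.
            loss_g = -D(G(torch.randn(real.size(0), z_dim))).mean()
            opt_g.zero_grad()
            loss_g.backward()
            opt_g.step()
```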

2.2. Dual-Fuel Engine Combustion Prediction Model Based on BNN

2.2.1. Construction of Combustion Prediction Model

By combining a Multi-Input Multi-Output Bayesian Neural Network (MIMO-BNN) with the triple-Wiebe combustion function, a predictive model for the dual-fuel engine's heat release rate is established. Unlike deterministic networks, a BNN places prior distributions (e.g., Gaussian distributions) over the weight parameters and infers their posterior distributions rather than directly optimizing point values. In constructing the heat release rate prediction model, let W denote the network parameters of the BNN, P(W) the prior distribution of the parameters, and let the training data D consist of operating condition parameters X and Wiebe parameters Y (i.e., (X, Y) ∈ D). The Wiebe parameter prediction is expressed as follows:
P(Y \mid X, D) = \int P(Y \mid X, W)\, P(W \mid D)\, dW   (9)
where P(Y \mid X, D) denotes the predicted Wiebe parameters Y obtained from the given training set D and the engine operating conditions X.
According to Bayes’ theorem, the posterior probability can be derived using the following formula:
P(W \mid D) = \frac{P(W)\, P(D \mid W)}{P(D)}   (10)
Directly sampling the posterior probability P(W \mid D) to evaluate P(Y \mid X, D) is highly challenging. Here, variational inference is adopted, using a distribution q(W \mid \theta), controlled by a set of parameters θ, to approximate the true posterior probability.
\theta^{*} = \operatorname*{argmin}_{\theta} \mathrm{KL}[\, q(W \mid \theta) \,\|\, P(W \mid D) \,]   (11)
where \theta^{*} is the parameter set of the Gaussian variational distribution that minimizes the KL divergence. The objective can be rewritten as follows:
\theta^{*} = \operatorname*{argmin}_{\theta} \left\{ \mathrm{KL}[\, q(W \mid \theta) \,\|\, P(W) \,] - \mathbb{E}_{q(W \mid \theta)}[\ln P(D \mid W)] \right\}   (12)
Here, the first term is the KL divergence between q(W \mid \theta) and P(W), while the second term reflects the fit between the training data D and the network weights. Combined with Monte Carlo approximation, the objective function can be expressed as follows:
F(D, \theta) = \mathrm{KL}[\, q(W \mid \theta) \,\|\, P(W) \,] - \mathbb{E}_{q(W \mid \theta)}[\ln P(D \mid W)]
\approx \sum_i \ln q(w_i \mid \theta_i) - \sum_i \ln P(w_i) - \sum_i \ln P(D \mid w_i)   (13)
where w_i represents the sampled values of the weight distributions for each neuron in the Bayesian Neural Network.
During the training process of a Bayesian Neural Network (BNN), samples of the network parameters W are obtained by sampling from the prior distribution. Through forward propagation, the loss is computed using the aforementioned formula, and the network parameters are optimized through backpropagation.
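As an illustration of this sampling-based training, a minimal Bayes-by-backprop style layer is sketched below in PyTorch, assuming a standard Gaussian prior and a mean-field Gaussian variational posterior; the initialization values and prior scale are illustrative, and the bias is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianLinear(nn.Module):
    """Linear layer with a mean-field Gaussian posterior q(W | theta) = N(mu, sigma^2)."""
    def __init__(self, n_in, n_out, prior_std=1.0):
        super().__init__()
        self.mu = nn.Parameter(0.1 * torch.randn(n_out, n_in))    # variational means
        self.rho = nn.Parameter(torch.full((n_out, n_in), -3.0))  # sigma = softplus(rho)
        self.prior = torch.distributions.Normal(0.0, prior_std)   # prior P(W)
        self.kl = torch.tensor(0.0)

    def forward(self, x):
        sigma = F.softplus(self.rho)
        w = self.mu + sigma * torch.randn_like(sigma)   # sample W ~ q(W | theta)
        q = torch.distributions.Normal(self.mu, sigma)
        # Monte Carlo estimate of ln q(w|theta) - ln P(w), the first two sums in Eq. (13)
        self.kl = (q.log_prob(w) - self.prior.log_prob(w)).sum()
        return x @ w.t()

# Per Equation (13), the training loss adds the KL terms of all Bayesian layers
# to the negative data log-likelihood, e.g. (hypothetical two-layer model):
#   loss = model.hidden.kl + model.out.kl + F.mse_loss(pred, target)
```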

2.2.2. Development of Multi-Core Parallel Training Acceleration

Because it accounts for the prior distributions of the unknown parameters, a BNN trained with variational inference must store a large number of intermediate variables and parameters, such as the variational parameters, during optimization, which increases computational resource consumption. Based on the idea of ensemble learning, this study designs a data-parallel scheme to achieve CPU-based multi-core training acceleration, thereby improving the training speed of the neural network. The CPU multi-core parallel acceleration structure is shown in Figure 4.
First, multiple different training datasets are generated from the original dataset through repeated sampling, and each is used to train a separate base model. The final prediction is obtained by arithmetically averaging the predictions of all base models, completing the ensemble learning process. The averaging combination strategy is given by Equation (14):
H(x) = \frac{1}{N} \sum_{i=1}^{N} h_i(x)   (14)
where H(x) is the final prediction result, N is the total number of base models, and h_i(x) is the prediction of the i-th base model.
In ensemble learning, since each base model is trained independently, it naturally has certain parallel processing characteristics. By configuring the training process of each base model in ensemble learning on different cores of the CPU and simultaneously starting multiple cores to train multiple base models, the full utilization of CPU computing resources can be achieved. This approach maximizes the utilization of multiple CPU cores and achieves efficient training by utilizing parallel computing capabilities.
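A minimal sketch of the data-parallel scheme using Python's standard multiprocessing module follows. The bootstrap resampling and per-core training mirror Figure 4; a linear least-squares regressor stands in for the BNN base model so the sketch stays self-contained, and the placeholder data shapes follow Section 3.1.

```python
import numpy as np
from multiprocessing import Pool

rng = np.random.default_rng(0)
X_train = rng.normal(size=(162, 7))   # placeholder inputs (7 operating parameters)
Y_train = rng.normal(size=(162, 9))   # placeholder outputs (9 Wiebe parameters)

def train_base_model(seed):
    """Train one base model on a bootstrap resample of the training data.
    A linear least-squares fit stands in for one BNN training run."""
    r = np.random.default_rng(seed)
    idx = r.integers(0, len(X_train), size=len(X_train))   # sampling with replacement
    W, *_ = np.linalg.lstsq(X_train[idx], Y_train[idx], rcond=None)
    return W

if __name__ == "__main__":
    N = 8                                     # base models = CPU cores used
    with Pool(processes=N) as pool:           # one independent training per core
        models = pool.map(train_base_model, range(N))
    # Equation (14): final prediction is the arithmetic mean over base models
    predict = lambda x: sum(x @ W for W in models) / N
    print(predict(X_train[:2]).shape)         # (2, 9)
```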

2.2.3. Development of Adaptive Weight Performance Balancing Method

When training a predictive model on a high-dimensional regression dataset, the accuracy performance across different output dimensions may not be identical under the same number of training iterations, and there may even be significant discrepancies. However, it is typically preferred to achieve high accuracy while maintaining a balanced precision across all dimensions. To address this, we have developed an adaptive weight performance balancing method (N-Ada) to achieve improved model accuracy performance.
First, an additional weight adjustment layer is added after the final hidden layer of the neural network prediction model. Each output dimension is assigned an initial weight, typically 1/N, where N is the number of output dimensions.
Then, the base model is trained using the current dimension weights, and the weighted error rate ε_t is calculated according to Equation (15):
\varepsilon_t = \frac{\sum_{i=1}^{N} w_i^{(t)} \left| y_i - h_t(x_i) \right|}{\sum_{i=1}^{N} w_i^{(t)}}   (15)
where w_i^{(t)} is the weight of the i-th output dimension at iteration t, h_t(x_i) is the predicted value for that dimension, and y_i is the true value.
Based on ε_t, the weight α_t of the current output dimension is calculated as follows:
\alpha_t = \frac{1}{2} \ln\!\left(\frac{1 - \varepsilon_t}{\varepsilon_t}\right)   (16)
Finally, the updated weights are calculated, where Z_t is a normalization factor that ensures the weights sum to 1:
w_i^{(t+1)} = \frac{w_i^{(t)} \exp\left(-\alpha_t \left| y_i - h_t(x_i) \right|\right)}{Z_t}   (17)
The pseudocode for this method is given in Algorithm 2. The weights are adjusted based on the predictive performance of the base model in each output dimension, such that in the next iteration, dimensions with larger errors receive greater weight, while dimensions with smaller errors have their weights reduced. This achieves a balanced distribution of accuracy across the dimensions of the high-dimensional regression data.
Algorithm 2 Precision Balance Method Weight Update Process
   Require: true values y in the training set, number of iterations T,
   regressor ht(x), sample weights w, sample dimension N.
   Initialize: weights = 1/N.
   for t = 1 to T do
   begin
     errors ← abs(yi − ht(xi))
     weighted error rate epsilon_t ← dot(errors, weights) / sum(weights)
     regression weight alpha_t ← (1/2) * ln((1 − epsilon_t) / epsilon_t)
     weight update: weights ← weights * exp(−alpha_t * errors) / normalization factor Z_t
   end
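A direct NumPy transcription of one N-Ada iteration is given below; the per-dimension errors are assumed to be averaged over the validation samples and scaled to (0, 1) so that the logarithm in Equation (16) is defined.

```python
import numpy as np

def nada_update(weights, y_true, y_pred):
    """One N-Ada weight update over the N output dimensions (Algorithm 2).
    y_true, y_pred: arrays of shape (n_samples, N); weights: shape (N,)."""
    errors = np.abs(y_true - y_pred).mean(axis=0)        # per-dimension error
    eps_t = np.dot(errors, weights) / weights.sum()      # Equation (15)
    alpha_t = 0.5 * np.log((1.0 - eps_t) / eps_t)        # Equation (16)
    weights = weights * np.exp(-alpha_t * errors)        # Equation (17)
    return weights / weights.sum()                       # normalization by Z_t
```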

3. Results and Discussion

3.1. Dataset Related Parameters

The dataset used in this study comes from the bench test of turbocharged dual-fuel engines. It includes detailed test data from 205 steady-state operating points. Transient in-cylinder pressure and transient heat release rate data were recorded for the operating points on the external characteristic curve. For other operating points, CA10 and CA50 data, which represent the combustion heat release characteristics, were collected.
Specifically, after organization, the initial regression dataset consists of 54 samples, each comprising seven-dimensional input data and nine-dimensional output data. The input features are the experimental operating parameters: engine speed, intake pressure, intake temperature, natural gas injection timing, diesel injection timing, natural gas injection quantity, and diesel injection quantity. The output features are the nine parameters required by the three Wiebe functions: the premixed combustion fraction (Fp), the main combustion fraction (Fm), the premixed combustion shape factor (mp), the main combustion shape factor (mm), the tail combustion shape factor (mt), the premixed combustion duration (Dp), the main combustion duration (Dm), the tail combustion duration (Dt), and the start of combustion (SOC).

3.2. Performance Analysis of Data Augmentation

The original dataset contains 54 regression samples, each consisting of seven-dimensional input data and nine-dimensional output data (as described in Section 3.1). After the pre-expansion algorithm (MR-CKNN), the dataset size reached 162 samples (the MR-CKNN dataset). The pre-expanded dataset was then processed by the improved regression GAN module to obtain the final expanded dataset of 1042 data points (the KTW-ReGAN dataset). In the pre-expansion stage, algorithmic limitations prevent generating a larger amount of pre-expanded data, and the pre-expanded data are used only as input for the improved GAN module; accordingly, only the rationality of the pre-expanded MR-CKNN dataset is examined here.
By comparing the feature distribution of the selected augmented data with that of the original data, as shown in Figure 5 below, it is evident that the augmented dataset exhibits a noticeable similarity in distribution patterns to the original dataset. Figure 5a,b show the probability density images of the input dimension fuel injection timing and the output dimension mt, respectively.
For the output features in the original dataset, MR-CKNN dataset, and KTW-ReGAN dataset, Spearman’s rank correlation coefficient is used for correlation analysis, as shown in Figure 6. The numerical values in the heatmap represent the magnitude of the correlation between features within the dataset. The results indicate that the correlations in both the pre-augmented data and the final augmented data are very similar to those in the original data.
Meanwhile, the column mean correlation coefficients of different datasets were calculated and compared to quantitatively demonstrate the similarity between different datasets, as shown in Table 1 below. A high correlation coefficient indicates that the expanded dataset is highly similar to the original dataset, proving that the proposed data augmentation method can successfully capture the distribution characteristics and potential features of the original data. Therefore, this data augmentation method is effective and reliable.
In order to more clearly demonstrate the statistical characteristics and physical meanings across the dimensions of the dataset, the statistical indicators in Table 2 and Table 3 were computed from the inverse-normalized data of the corresponding datasets (the seven input operating parameters are labeled X1 to X7). The proposed regression data augmentation method yields similar means and variances across the feature dimensions before and after augmentation, demonstrating that it captures the distribution characteristics of the original data and provides high-quality augmented data that are statistically very similar to the original data.

3.3. Accuracy Performance of Combustion Prediction Model

For the initial small sample dataset, the limited amount of data makes it difficult for the prediction model to achieve good performance. Here, the coefficient of determination (R2), root mean square error (RMSE), and mean absolute error (MAE) are used to evaluate accuracy. R2 usually lies between 0 and 1; the larger the value, the better the fit, while negative values indicate a very poor fit. RMSE measures the deviation between predicted and observed values and amplifies larger errors, making their contribution to the overall error more significant and thus reflecting the accuracy of the prediction model more stringently. MAE is the average of the absolute prediction errors; the smaller its value, the smaller the difference between predictions and observations and the better the model's predictive performance. By construction, MAE considers only the magnitude of the error, not its direction, and is less sensitive to outliers. The three evaluation indicators are calculated according to Equations (18)–(20).
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_{pred,i} - y_{true,i})^2}{\sum_{i=1}^{n} (y_{true,i} - \bar{y}_{true})^2}   (18)
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{pred,i} - y_{true,i})^2}   (19)
MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_{pred,i} - y_{true,i} \right|   (20)
where y_pred is the predicted value, y_true is the true value, \bar{y}_{true} is the mean of the true values, and n is the sample size.
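For completeness, Equations (18)–(20) correspond to the following NumPy routine for a single output dimension.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """R2, RMSE, and MAE as defined in Equations (18)-(20)."""
    ss_res = np.sum((y_pred - y_true) ** 2)              # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)       # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
    mae = np.mean(np.abs(y_pred - y_true))
    return r2, rmse, mae
```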
In Table 4, it can be clearly seen that the R2 values during the training of the original data are all negative, indicating that the model has not correctly captured the feature patterns in the data, making it difficult for the model to train and learn normally. After applying the conventional GAN data augmentation method, the accuracy metrics improved significantly. However, for some dimensions, R2 values are still relatively small, leading to an imbalance in accuracy across dimensions. This results in noticeable discrepancies between the heat release rate predicted based on these Wiebe parameter combinations and the actual heat release rate. From Table 5 and Table 6, it can also be seen that the proposed KTW-ReGAN data maintain smaller RMSE and MAE values compared to conventional GAN data and raw data. At the same time, the phenomenon of slightly larger errors in individual data dimensions still exists. The data in these tables indicate that it is crucial to balance the accuracy distribution of different dimensions during the training process to ensure the accuracy of all Wiebe parameter predictions.
Figure 7 compares the predictive performance of the BNNs on the augmented dataset; the accuracy of output dimensions Y1 and Y5 is significantly lower than the average (black line in Figure 7). After applying the ensemble learning training method, overall accuracy improved slightly, but the accuracy imbalance was not notably alleviated (red line in Figure 7). Figure 8 shows that, with the newly developed adaptive weight performance balancing method, the training process adaptively identifies the data dimensions with poor accuracy (Y1 and Y5) and assigns them greater weight, ultimately making the overall accuracy distribution more balanced (blue line in Figure 7).
Based on the proposed data augmentation method, the model is trained on datasets ranging from 1000 to 4000 samples in increments of 500. The training time comparison is shown in Figure 9. Under all data volumes, models using the CPU multi-core data parallel method train faster than the versions without parallel acceleration, indicating that the CPU multi-core parallelization remains scalable on larger datasets. The speedup data also show that acceleration performance improves as the amount of training data increases. Figure 10 shows CPU logical processor utilization over time before and after deploying the parallel acceleration method, indicating that the hardware resources are fully utilized.
The performance of the prediction model on the test set after training is shown in Figure 11. Taking Fp as an example, it can be seen that on the test set, the predicted values are relatively close to the true values, indicating that the prediction model has learned the potential features of the data. Despite achieving good accuracy performance, there are still cases of poor prediction performance on some data points. At the same time, although there has been some acceleration in training time, further research is still needed to apply it in a wider range of fields, such as implementing online learning on real-time machine simulations.
After the more accurate predicted parameters are obtained, substituting them into the Wiebe heat release rate formula, Equation (21) below, yields an effective estimate of the heat release rate.
HRR = \sum_{i = p, m, t} a \cdot F_i \cdot \frac{m_i + 1}{D_i} \left( \frac{\varphi - SOC}{D_i} \right)^{m_i} \exp\!\left[ -a \left( \frac{\varphi - SOC}{D_i} \right)^{m_i + 1} \right]   (21)
where HRR is the combustion heat release rate; the subscripts i = p, m, t denote the premixed, main, and tail combustion stages, respectively; F_i is the fraction of heat released in the current stage, with the stage fractions summing to the total heat release; m_i is the shape factor of the current stage; D_i is the combustion duration of the current stage; φ is the instantaneous crank angle; a is the combustion efficiency index, generally taken as 6.908; and SOC is the start of combustion.
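Equation (21) can be evaluated directly over a crank-angle grid, as in the NumPy sketch below. Taking the tail fraction as F_t = 1 − F_p − F_m is an assumption consistent with the constraint F_p + F_m < 1 in Equation (4); the parameter dictionary layout is illustrative.

```python
import numpy as np

A = 6.908  # combustion efficiency index a

def wiebe_hrr(phi, p):
    """Equation (21): triple-Wiebe heat release rate over crank angle phi.
    p: dict with Fp, Fm, mp, mm, mt, Dp, Dm, Dt, SOC.
    Ft = 1 - Fp - Fm is assumed, consistent with Fp + Fm < 1 in Equation (4)."""
    F = [p["Fp"], p["Fm"], 1.0 - p["Fp"] - p["Fm"]]   # premixed, main, tail fractions
    m = [p["mp"], p["mm"], p["mt"]]                   # shape factors
    D = [p["Dp"], p["Dm"], p["Dt"]]                   # stage durations
    x = np.maximum(phi - p["SOC"], 0.0)               # heat release starts at SOC
    hrr = np.zeros_like(phi, dtype=float)
    for Fi, mi, Di in zip(F, m, D):
        u = x / Di
        hrr += A * Fi * (mi + 1) / Di * u ** mi * np.exp(-A * u ** (mi + 1))
    return hrr
```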
Figure 12 shows the combustion heat release prediction results of the optimized BNN model at selected engine speeds in the experimental setup. The red, green, and blue dashed lines represent the true combustion heat release rates of the premixed, main, and tail combustion stages, respectively. The gray curve is the heat release rate fitted from the predicted parameters, and the gray data points are the true combustion heat release rates under the corresponding operating conditions. Comparing the gray curve with the gray data points shows that the improved BNN model fits the heat release rate well. This provides a more efficient approach for constructing predictive models in engine combustion modeling and, to some extent, solves the problem of low model performance caused by small data samples and poor data quality. The efficient learning and training of the BNN architecture also contributes to the real-time capability of coupled models in subsequent research. At present, the architecture has been successfully deployed in an HPDI dual-fuel engine model; based on a joint Matlab/Simulink and Python simulation platform, a fast-responding virtual testing system can be built.

4. Discussion

Our research indicates that the proposed combustion prediction model framework based on data augmentation and neural networks can maintain high predictive performance even with small sample data. The constructed prediction model uses a Bayesian Neural Network as the baseline and employs an augmented regression dataset to establish the relationship between operating condition parameters and Wiebe parameters, achieving highly accurate predictions of the heat release rate. An adaptive weight performance balancing method was developed to address the uneven accuracy of high-dimensional output features; it achieves a balanced accuracy distribution while still improving accuracy. Specifically, in the data dimensions with poor predictive performance, the average R2 improved by over 20%. Additionally, a CPU multi-core data-parallel training acceleration method was designed to reduce time and resource consumption; by exploiting the parallel nature of ensemble learning, this framework optimizes the utilization of hardware computing resources and accelerates training by over 60%. The dataset used for the prediction model is built with the novel KTW-ReGAN method, which incorporates a constrained KNN pre-expansion algorithm for multiple regression (MR-CKNN) and introduces a self-attention mechanism; it also couples the Wasserstein distance and constraint vectors to improve the convergence of the data augmentation algorithm. Ultimately, high-quality augmented data can be generated to improve the accuracy of combustion prediction models. Specifically, the model with data augmentation showed an average improvement of 67% in R2, an average decrease of 80% in RMSE, and an average decrease of 81% in MAE. Thanks to the good generality of the proposed method, it can be deployed efficiently in other fields: only the restrictions in the physical constraint layer inside the data generator need to be removed or modified according to the characteristics of the target-domain data.
However, this study has some limitations. The developed data augmentation method and prediction model have only been validated on regression data; the universality of the method has not been verified on other data types, such as image classification data.
In the future, explaining the mechanism behind the poor performance of some dimensions in high-dimensional regression datasets and extending this architecture to classification data are directions worth exploring in engineering practice.

5. Conclusions

This paper introduces the comprehensive development and implementation of a rapid prediction model framework for combustion heat release rate based on neural networks and data augmentation. This architecture couples the designed data augmentation methods with model optimization mechanisms to achieve high predictive performance. The results show that the framework performs excellently in comprehensive accuracy performance (R2 improved by 67%, RMSE reduced by 80%, MAE reduced by 81%) and acceleration performance (acceleration performance improved by over 60%). In addition, this architecture has strong flexibility and scalability in regression data.

Author Contributions

Conceptualization, W.Y. and F.Z.; methodology, M.W.; software, M.W.; validation, X.S. and H.L.; formal analysis, Z.M.; investigation, X.S.; resources, Z.M.; data curation, Q.W.; writing—original draft preparation, M.W. and F.Z.; writing—review and editing, M.W. and F.Z.; visualization, Q.W.; supervision, W.Y. and F.Z.; project administration, W.Y. and F.Z.; funding acquisition, W.Y. and F.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Shandong Provincial Natural Science Foundation (No. 2022HWYQ-061) and Open Funds of the State Key Laboratory of Engines and Powertrain System (No. K2023-075).

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

The authors extend their sincere appreciation for the technical support from AVL List GmbH on data programming.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Aydin, M. Natural gas consumption and economic growth nexus for top 10 natural gas-consuming countries: A Granger causality analysis in the frequency domain. Energy 2018, 165, 179–186. [Google Scholar]
  2. Li, M.; Zhang, Q.; Li, G.; Shao, S. Experimental investigation on performance and heat release analysis of a pilot ignited direct injection natural gas engine. Energy 2015, 57, 1251–1260. [Google Scholar] [CrossRef]
  3. Bao, J.; Qu, P.; Wang, H.; Zhou, C.; Zhang, L.; Shi, C. Implementation of various bowl designs in an HPDI natural gas engine focused on performance and pollutant emissions. Chemosphere 2022, 303, 135275. [Google Scholar] [CrossRef] [PubMed]
  4. Wang, W.; Mai, Z.; Yao, X.; Tang, C.; Huang, Z. Towards optimized excess air ratio and substitution rate for a dual fuel HPDI engine. Appl. Therm. Eng. 2024, 253, 123797. [Google Scholar] [CrossRef]
  5. Yu, S.; Wei, L.; Zhou, S.; Lu, X.; Huang, W. Numerical study on the effects of pilot diesel quantity coupling EGR in a high pressure direct injected natural gas engine. Combust. Sci. Technol. 2024, 196, 1–18. [Google Scholar] [CrossRef]
  6. Yang, R.; Ran, Z.; Ristow Hadlich, R.; Assanis, D. A Double-Wiebe Function for Reactivity Controlled Compression Ignition Combustion Using Reformate Diesel. J. Energy Resour. Technol. 2022, 144, 112301. [Google Scholar] [CrossRef]
  7. Chen, L.; Xu, Z.; Liu, S.; Liu, L. Dynamic modeling of a free-piston engine based on combustion parameters prediction. Energy 2022, 249, 123792. [Google Scholar] [CrossRef]
  8. Xu, G.; Wang, B.; Guan, Y.; Wang, Z.; Liu, P. Early detection of thermoacoustic instability in a solid rocket motor: A generative adversarial network approach with limited data. Appl. Energy 2024, 373, 123776. [Google Scholar] [CrossRef]
  9. Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X. Improved Techniques for Training GANs. arXiv 2016, arXiv:1606.03498. [Google Scholar]
  10. Athey, S.; Imbens, G.W.; Metzger, J.; Munro, E. Using Wasserstein generative adversarial networks for the design of Monte Carlo simulations. J. Econom. 2021, 240, 105076. [Google Scholar]
  11. Grenga, T.; Nista, L.; Schumann, C.; Karimi, A.N.; Scialabba, G.; Attili, A.; Pitsch, H. Predictive data-driven model based on generative adversarial network for premixed turbulence-combustion regimes. Combust. Sci. Technol. 2023, 195, 3923–3946. [Google Scholar] [CrossRef]
  12. Nista, L.; Schumann, C.D.K.; Grenga, T.; Attili, A.; Pitsch, H. Investigation of the generalization capability of a generative adversarial network for large eddy simulation of turbulent premixed reacting flows. Proc. Combust. Inst. 2023, 39, 5279–5288. [Google Scholar] [CrossRef]
  13. Cui, Z.; Lu, Y.; Yan, X.; Cui, S. Compound fault diagnosis of diesel engines by combining generative adversarial networks and transfer learning. Expert Syst. Appl. 2024, 251, 123969. [Google Scholar] [CrossRef]
  14. Sok, R.; Jeyamoorthy, A.; Kusaka, J. Novel virtual sensors development based on machine learning combined with convolutional neural-network image processing-translation for feedback control systems of internal combustion engines. Appl. Energy 2024, 365, 123224. [Google Scholar] [CrossRef]
  15. Chen, Y.; Zhou, W.; Huang, J. Intelligent Modification Method for Individual Difference of Aero-engine based on CNN-GAN. In Proceedings of the International Conference on Advanced Robotics and Mechatronics (ICARM), Tokyo, Japan, 8–10 July 2024; pp. 400–405. [Google Scholar]
  16. Bo, L.; Zhang, X.; Wang, H. The research on missing data imputation method of aero-engine’s ACARS Based on GAN-Attention. In Proceedings of the China Aeronautical Science and Technology Conference, Wuzhen, China, 26–27 September 2023; pp. 168–175. [Google Scholar]
  17. Frid-Adar, M.; Diamant, I.; Klang, E.; Amitai, M.; Goldberger, J.; Greenspan, H. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing 2018, 321, 321–331. [Google Scholar] [CrossRef]
  18. Bengio, Y.; Lecun, Y.; Hinton, G. Deep learning for AI. Commun. ACM 2021, 64, 58–65. [Google Scholar] [CrossRef]
  19. Molina, S.; Novella, R.; Gomez-Soriano, J.; Olcina-Girona, M. New combustion modelling approach for methane-hydrogen fueled engines using machine learning and engine virtualization. Energies 2021, 14, 6732. [Google Scholar] [CrossRef]
  20. Hu, D.; Wang, H.; Yang, C.; Wang, B.; Duan, B.; Wang, Y.; Li, H. Construction of digital twin model of engine in-cylinder combustion based on data-driven. Energy 2024, 293, 130543. [Google Scholar] [CrossRef]
  21. Žvirblis, T.; Matijošius, J.; Kilikevičius, A. An Experimental Selection of Deep Neural Network Hyperparameters for Engine Emission Prognosis. In Proceedings of the IEEE Open Conference of Electrical, Electronic and Information Sciences (eStream), Vilnius, Lithuania, 25 April 2024; pp. 1–6. [Google Scholar]
  22. Yaşar, H.; Çağıl, G.; Torkul, O.; Şişci, M. Cylinder pressure prediction of an HCCI engine using deep learning. Chin. J. Mech. Eng. 2021, 34, 7. [Google Scholar] [CrossRef]
  23. Hu, D.; Wang, H.; Yang, C.; Wang, B.; Yang, Q.; Wang, Y. Construction and verification of dual-fuel engine combustion model. J. Energy Inst. 2024, 112, 101486. [Google Scholar]
  24. Araujo, N.R.S.; Carvalho, F.S.; Amaral, L.V.; Braga, J.P.; Pujatti, F.J.P.; Sebastião, R.C.O. Kinetic study of the combustion process in internal combustion engines: A new methodological approach employing an artificial neural network. Fuel 2025, 382, 133739. [Google Scholar] [CrossRef]
  25. Windarto, C.; Ocktaeck, L. A neural network approach on forecasting spark duration effect on in-cylinder performance of a large bore compression ignition engine fueled with propane direct injection. Fuel Process. Technol. 2024, 257, 108088. [Google Scholar] [CrossRef]
  26. Cesur, I.; Uysal, F. Experimental investigation and artificial neural network-based modelling of thermal barrier engine performance and exhaust emissions for methanol-gasoline blends. Energy 2024, 291, 130393. [Google Scholar] [CrossRef]
  27. Şahin, S.; Torun, A. Comparison of Engine Performance and Emission Values of Biodiesel Obtained from Waste Pumpkin Seeds with Machine Learning. Agriculture 2024, 14, 227. [Google Scholar] [CrossRef]
  28. Ebrahimi, K.; Patidar, L.; Koutsivitis, P.; Fogla, N.; Wahiduzzaman, S. Machine Learning Tabulation Scheme for Fast Chemical Kinetics Computation. SAE Int. J. Engines 2023, 17, 477–492. [Google Scholar] [CrossRef]
  29. Siddique, M.F.; Zaman, W.; Ullah, S.; Umar, M.; Saleem, F.; Shon, D.; Yoon, T.H.; Yoo, D.-S.; Kim, J.-M. Advanced Bearing-Fault Diagnosis and Classification Using Mel-Scalograms and FOX-Optimized ANN. Sensors 2024, 24, 7303. [Google Scholar] [CrossRef]
  30. Dharmalingam, B.; Annamalai, S.; Areeya, S.; Rattanaporn, K.; Katam, K.; Show, P.-L.; Sriariyanun, M. Bayesian Regularization Neural Network-Based Machine Learning Approach on Optimization of CRDI-Split Injection with Waste Cooking Oil Biodiesel to Improve Diesel Engine Performance. Energies 2023, 16, 2805. [Google Scholar] [CrossRef]
  31. Sun, H.; Chen, P. Application of Neural Networks in Automotive Engine Misfire. In Proceedings of the IEEE 4th International Conference on Electronic Communications, Internet of Things and Big Data (ICEIB), Taipei, Taiwan, 19–21 April 2024; pp. 261–264. [Google Scholar]
  32. Karras, T.; Aittala, M.; Hellsten, J.; Laine, S.; Lehtinen, J.; Aila, T. Training generative adversarial networks with limited data. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, BC, Canada, 6–12 December 2020; pp. 1–37. [Google Scholar]
Figure 1. Overall architecture for rapid prediction of combustion heat release rate based on neural networks and data augmentation.
Figure 2. Architecture diagram of KTW-ReGAN data augmentation method.
Figure 3. Calculation flowchart of improved GAN model.
Figure 4. Data parallel architecture diagram based on ensemble learning.
Figure 5. Probability density maps of different data dimensions under different expanded datasets.
Figure 6. Correlation heatmap of different datasets in output dimension.
Figure 7. Schematic diagram of performance results of BNN optimization process.
Figure 8. Schematic diagram of weight changes under adaptive weight method.
Figure 9. Schematic diagram of the effect of the adaptive weight method.
Figure 10. Schematic diagram of CPU logic core utilization.
Figure 11. Prediction performance display chart (Fp).
Figure 12. Fitting effect of combustion heat release rate.
Table 1. Column mean correlation coefficients between different datasets.
Paired Datasets | Correlation Coefficient Between Column Means
Original data vs. MR-CKNN data | 0.9999
MR-CKNN data vs. KTW-ReGAN data | 0.9997
Original data vs. KTW-ReGAN data | 0.9998
Table 2. Mean indicators of different datasets in the input dimension.
Datasets | X1_mean | X2_mean | X3_mean | X4_mean | X5_mean | X6_mean | X7_mean
Original dataset | 1331.5 | 116.7 | 30.9 | 18.7 | 10.6 | 4.3 | 160.6
MR-CKNN dataset | 1331.5 | 122.7 | 32.0 | 19.0 | 10.9 | 4.1 | 169.4
KTW-ReGAN | 1314.0 | 101.3 | 28.8 | 18.4 | 9.9 | 4.6 | 140.4
Table 3. Variance indicators of different datasets in input dimensions.
Datasets | X1_std | X2_std | X3_std | X4_std | X5_std | X6_std | X7_std
Original dataset | 359.4 | 60.6 | 10.2 | 6.6 | 5.6 | 1.4 | 75.6
MR-CKNN dataset | 360.4 | 56.6 | 9.5 | 6.4 | 5.4 | 1.2 | 68.1
KTW-ReGAN | 345.2 | 55.1 | 10.4 | 6.2 | 5.4 | 1.3 | 70.8
Table 4. Summary of R2 indicators under different datasets.
Datasets | Y1_R2 | Y2_R2 | Y3_R2 | Y4_R2 | Y5_R2 | Y6_R2 | Y7_R2 | Y8_R2 | Y9_R2
Original dataset | −1.47 | −4.48 | −26.44 | −24.75 | −4.76 | −100.3 | −18.47 | −22.74 | −4.95
GAN dataset | 0.23 | 0.73 | 0.65 | 0.71 | 0.16 | 0.71 | 0.64 | 0.52 | 0.62
KTW-ReGAN | 0.61 | 0.83 | 0.84 | 0.85 | 0.69 | 0.81 | 0.85 | 0.87 | 0.85
Table 5. Summary of RMSE indicators under different datasets.
Datasets | Y1_RMSE | Y2_RMSE | Y3_RMSE | Y4_RMSE | Y5_RMSE | Y6_RMSE | Y7_RMSE | Y8_RMSE | Y9_RMSE
Original dataset | 4.05 | 3.29 | 6.89 | 3.87 | 1.71 | 4.15 | 5.87 | 1.46 | 2.22
GAN dataset | 1.42 | 1.29 | 1.33 | 1.28 | 1.49 | 1.25 | 1.27 | 1.27 | 1.49
KTW-ReGAN | 0.87 | 0.76 | 0.77 | 0.77 | 0.81 | 0.69 | 0.66 | 0.62 | 0.59
Table 6. Summary of MAE indicators under different datasets.
Datasets | Y1_MAE | Y2_MAE | Y3_MAE | Y4_MAE | Y5_MAE | Y6_MAE | Y7_MAE | Y8_MAE | Y9_MAE
Original dataset | 3.47 | 2.86 | 6.04 | 3.67 | 1.61 | 3.49 | 5.36 | 1.35 | 1.99
GAN dataset | 2.29 | 1.29 | 1.43 | 1.33 | 3.30 | 1.21 | 1.23 | 1.74 | 1.44
KTW-ReGAN | 0.96 | 0.52 | 0.58 | 0.51 | 0.82 | 0.59 | 0.55 | 0.57 | 0.51
