1. Introduction
Deep neural networks (DNNs) are widely employed in diverse research areas, including physics and engineering. In mechanical engineering, DNNs are often used as surrogate models [1,2,3]. In general, a DNN learns from data, and its output is not guaranteed to be consistent with physics, even if the data were generated by certain physical models.
Generative adversarial networks (GANs) [4] constitute a type of DNN. GANs were first applied to image generation tasks [5,6,7]. Subsequently, GANs have also been used to solve inverse design problems [8,9,10,11]. For example, in [12,13], GAN models were trained on a dataset consisting of airfoil shapes and their aerodynamic coefficients. Then, by inputting aerodynamic coefficients, the trained model output airfoil shapes associated with those coefficients. The aerodynamic coefficients of the generated shapes were close to the specified labels, but some errors were also identified. These errors arose because the aerodynamic coefficients were calculated from the airfoil shapes, but the physical equations were not incorporated into the DNN model.
To enforce physical consistency, physics-informed neural networks (PINNs) have been proposed [14], which add the residual of the physical equations to the loss function of the DNN [2,15,16,17]. PINNs have been used to predict various targets, such as lake water levels [18], surface water levels [19], and seismic responses [20]. However, a PINN model needs the gradients of the physical equations, and hence the physical equations must be implemented in the DNN architecture. This causes difficulties from an application point of view. For example, general-purpose software and commercial software cannot be used in PINN models. Especially in inverse design problems, commercial software is often mandatory to compute the label; e.g., flow computations are required to calculate the aerodynamic coefficients of airfoil shapes. Hence, it is desirable to be able to use arbitrary physical equations, including general-purpose and commercial software, in DNN surrogate models and generative models.
The proposed method aims to handle arbitrary physical equations using a GAN architecture. To this end, the gradients of the physical equations have to be eliminated from the algorithm. A GAN consists of a generator network and a discriminator network. The generator network outputs data that mimic the training data, whereas the discriminator network distinguishes the generated data from the training data. Training data are referred to as true data, whereas generated data constitute fake data. In the literature, physics-informed adversarial learning [16,21] and PID-GAN [22], which couples a PINN with a GAN, have been proposed, but those methods also use the residual of the physical equations in the same manner as PINNs, and hence the physical models need to be implemented inside the computation graph. In the proposed PG-GAN model, true or fake is defined by the physical equations: if the residual is smaller than a threshold value, the generated data are true; otherwise, they are fake. The physical model guides the DNN to learn physical consistency and is only used to categorize data as true or fake. The physical equations remain outside the DNN model and are not implemented inside it. Therefore, arbitrary physical models can be used. It is also noted that, by decreasing the threshold value, the residual of the generated data is decreased. In the PINN model, by contrast, the residual is added to the loss function and cannot be controlled in this way.
The concept of the PG-GAN model is validated using a simple example. The merit of the PG-GAN is that an arbitrary physics model can be used. However, to enable a comparison with a PINN, Newton's equation of motion is employed, which can also be solved by a PINN.
The present paper is organized as follows. GAN and PINN models are explained in Section 2. The concept of PG-GAN is explained in Section 3, together with its formulation. A numerical study is presented in Section 4. Conclusions are provided in Section 5.
2. GAN and Physics-Informed GAN
A conditional GAN model consists of a generator network $G$ and a discriminator network $D$, as illustrated in Figure 1. The input of the generator network is a noise vector $z$ and a label $y$, and the output is fake data $\tilde{x}$, expressed as $\tilde{x} = G(z, y)$. The input of the discriminator network is given by real data $x$ and fake data $\tilde{x}$, and the network distinguishes the real data from the fake data. The loss function is defined as
$$V(G, D) = \mathbb{E}_{x}\left[\log D(x, y)\right] + \mathbb{E}_{z}\left[\log\left(1 - D(G(z, y), y)\right)\right],$$
and the generator minimizes $V$, whereas the discriminator maximizes $V$, i.e., $\min_G \max_D V(G, D)$. The discriminator only considers data $x$ and $\tilde{x}$; physical reasonableness is not considered.
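As a minimal sketch (assuming discriminator outputs lie in the open interval (0, 1); the function name and list-based batch representation are illustrative, not the authors' code), the value function $V$ can be evaluated from discriminator outputs on a real batch and a fake batch:

```python
import math

def cgan_value(d_real, d_fake):
    """Sketch of the GAN value function V(G, D): mean log D over real
    samples plus mean log(1 - D) over generated (fake) samples."""
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term
```

For instance, a discriminator that outputs 0.5 everywhere yields $V = 2\log 0.5 \approx -1.386$, the well-known equilibrium value of the minimax game.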
A PINN can be coupled with a GAN model. The resulting architecture is called PI-GAN in the present article. Suppose that a physical model is expressed as $P(x) = 0$. For a variable $x$, the residual is given by $r = |P(x)|$. The PINN adds the residual to the loss function, which is minimized. In the GAN model, the loss function of the generator is modified as $V + \lambda r$, where $\lambda$ is a constant; in the numerical example described later on, a fixed value of $\lambda$ was used. The loss function of the discriminator remains the same as in the original GAN.
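Under the same notation, the PI-GAN generator loss can be sketched as the adversarial term plus $\lambda$ times the mean residual (function and argument names are assumptions; the default value of `lam` is purely illustrative and is not the value used in the paper):

```python
import math

def pi_gan_generator_loss(d_fake, residuals, lam=0.1):
    """Sketch of the PI-GAN generator loss: mean log(1 - D) over the
    generated batch plus lam times the mean physical residual r.
    lam=0.1 is an illustrative choice, not the paper's value."""
    adv = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    phys = sum(residuals) / len(residuals)
    return adv + lam * phys
```

With zero residuals this reduces to the plain adversarial generator loss, which makes the role of the $\lambda r$ term easy to check.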
3. PG-GAN Model
3.1. Concept of PG-GAN
When a PINN is used, the physics model is located inside the computation graph. Likewise, if a GAN model is coupled with a PINN, which we call a physics-informed GAN (PI-GAN), the physics model is located inside the computation graph.
The PG-GAN is designed so that the physics model is eliminated from the computation graph, as shown in Figure 2. The generator generates data in the same way as a normal GAN. The generated data are then passed to the physical model to determine their physical validity. If the data are determined to be physically valid, they are labeled as true data; otherwise, they are labeled as fake data. The input of the discriminator is only the generated data, and the discriminator distinguishes whether the data are true or not. In the PG-GAN, the computation graph consists of only the generator and the discriminator. The physical model is used only to determine the truth of the data and is not included in the computation graph.
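The labeling step above can be sketched as follows; the physical model enters only as a black-box callable (all names are illustrative), so no gradients ever flow through it:

```python
def label_with_physics(samples, residual_fn, eps):
    """Label each generated sample: 1 (true) if its physical residual
    is at most eps, else 0 (fake). residual_fn may be any external
    solver, even commercial software; it is only called, never
    differentiated."""
    return [1 if residual_fn(s) <= eps else 0 for s in samples]
```

Here `residual_fn` could wrap any simulation code; the toy example below simply uses the absolute value as a stand-in residual.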
3.2. Formulation of PG-GAN
Generated data should be considered true if they are physically reasonable. To judge physical reasonableness, a physical model is used. Suppose that the physical model is described as $P(x) = 0$, and the value of $r = |P(x)|$ is treated as a residual. We treat a datapoint $\tilde{x}$ as true if $r \le \epsilon$. Hence, the input of our discriminator is the generated data $\tilde{x}$, and the output is whether the data are physically reasonable or not. For a given $\epsilon$, let $S_\epsilon$ represent the set of data whose residual is equal to, or less than, $\epsilon$.
In this case, the loss function becomes
$$V = \mathbb{E}_{\tilde{x} \in S_\epsilon}\left[\log D(\tilde{x})\right] + \mathbb{E}_{\tilde{x} \notin S_\epsilon}\left[\log\left(1 - D(\tilde{x})\right)\right]. \qquad (2)$$
The optimization problem for the discriminator is $\max_D V$. The discriminator tries to mimic the physical model to judge physical reasonableness.
If we simply minimize $V$ with respect to $G$, the generator is not trained as desired. In the ordinary GAN loss function, the first term of $V$ is not a function of $G$, and the generator optimization problem becomes
$$\min_G \mathbb{E}_{z}\left[\log\left(1 - D(G(z))\right)\right].$$
However, in Equation (2), both the first and second terms are functions of $G$. Therefore, the generator optimization problem uses only the second term of $V$, and is
$$\min_G \mathbb{E}_{\tilde{x} \notin S_\epsilon}\left[\log\left(1 - D(\tilde{x})\right)\right]$$
instead of $\min_G V$.
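Both objectives can be sketched on one generated batch by splitting the samples at the residual threshold (batch handling is simplified and the names are illustrative; empty sides of the split are guarded rather than handled as in the paper):

```python
import math

def pg_gan_losses(d_out, residuals, eps):
    """Sketch of the PG-GAN objectives for one generated batch:
    d_out[i] is the discriminator output for sample i, and the
    physical residual decides which term of V the sample enters."""
    true_terms = [math.log(d) for d, r in zip(d_out, residuals) if r <= eps]
    fake_terms = [math.log(1.0 - d) for d, r in zip(d_out, residuals) if r > eps]
    v = (sum(true_terms) / max(len(true_terms), 1)
         + sum(fake_terms) / max(len(fake_terms), 1))
    g_loss = sum(fake_terms) / max(len(fake_terms), 1)  # second term only
    return v, g_loss
```

The generator loss deliberately ignores the first term, mirroring the discussion above: pushing $D(\tilde{x})$ toward 1 on the fake set is what drives the generator toward the physically valid region.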
The architecture of the proposed model is illustrated in Figure 3. The physics-guided GAN model uses a physics model as a referee to judge whether the generated data are physically reasonable or not. The discriminator is a surrogate model of the physical model. If we could use the physical model itself as the discriminator, the generator would be trained much more efficiently. However, if we used the physical model in the architecture, back-propagation would stop at the physical model, because we assume that the physical model's software is a black box. Another feature of the PG-GAN is that the training data no longer appear in the model. Real data are not necessary, because whether the generated data are true or fake is judged by the physical model.
The PINN can be coupled with the PG-GAN by adding the residual to the loss function of the generator; the resulting architecture is called PG-PI-GAN. The generator loss function is modified as
$$\min_G \mathbb{E}_{\tilde{x} \notin S_\epsilon}\left[\log\left(1 - D(\tilde{x})\right)\right] + \lambda r.$$
Training the PG-GAN without pre-training is not efficient, because in the early epochs the generator cannot generate physically reasonable data, and $S_\epsilon$ is always an empty set. In such a case, neither the generator nor the discriminator is well trained, because the discriminator always outputs 0 (fake), whereas the generator has no clue how to generate reasonable data. Hence, it is necessary to start from a pre-trained generator that generates a non-empty $S_\epsilon$. To obtain such a pre-trained generator, an ordinary GAN model is used.
Data generated by the pre-trained generator exhibit a large residual of the physical equation $P$. The threshold $\epsilon$ must be large enough so that both $S_\epsilon$ and its complement are non-empty. However, it is not desirable to terminate training with a large $\epsilon$. Hence, $\epsilon$ is reduced as training proceeds, until it reaches the target value. In the following numerical example, $\epsilon$ was held constant for 10,000 epochs and then changed. Alternatively, $\epsilon$ can be reduced gradually.
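The threshold schedule can be sketched as a simple step function (a gradual decay would also fit the description above; the hold length default and all names are illustrative):

```python
def epsilon_schedule(epoch, eps_start, eps_target, hold=10_000):
    """Sketch of the epsilon schedule: keep eps_start for the first
    `hold` epochs, then switch to the target value eps_target."""
    return eps_start if epoch < hold else eps_target
```

A smoother alternative would interpolate between `eps_start` and `eps_target` after the hold phase; the key property is only that the threshold never tightens faster than the generator can follow.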
4. Numerical Study: Newton’s Equations of Motion
Newton’s equation of motion under gravity is expressed as
$$\boldsymbol{x}(t) = \boldsymbol{x}_0 + \boldsymbol{v}_0 t + \tfrac{1}{2}\boldsymbol{g}\,t^2,$$
where $\boldsymbol{x}_0$, $\boldsymbol{g}$, $\boldsymbol{v}_0$, and $t$ denote the coordinates of an initial point, the gravitational acceleration, the initial velocity vector, and time, respectively. The physical equation $P$ is formulated as the residual of this relation,
$$P(\boldsymbol{x}(t)) = \boldsymbol{x}(t) - \left(\boldsymbol{x}_0 + \boldsymbol{v}_0 t + \tfrac{1}{2}\boldsymbol{g}\,t^2\right),$$
with $r = \|P\|$. The task is to output a sequence of coordinates along the trajectory when the trajectory parameters are given as the label. A dataset for pre-training was first prepared from trajectories satisfying the equation of motion; the total number of training datapoints was 9000. In the pre-training, the GAN model was trained using this dataset for 10,000 epochs. Then, PG-GAN training was carried out, with the threshold $\epsilon$ defined as a function of the number of epochs $e$: it was held constant for the first epochs and then reduced, as described in Section 3.2. The trained PG-GAN model was then used to output coordinate data.
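For this example, the residual of a generated trajectory can be evaluated against the closed-form solution. The sketch below is one-dimensional with illustrative names and a mean-absolute-deviation residual; the paper's exact form of $P$ and its norm are not reproduced:

```python
def trajectory_residual(xs, ts, x0, v0, g=-9.81):
    """Mean absolute deviation of generated coordinates xs at times ts
    from x(t) = x0 + v0*t + 0.5*g*t**2 (1-D sketch of the residual r).
    g defaults to Earth's gravitational acceleration with the axis
    pointing upward."""
    errs = [abs(x - (x0 + v0 * t + 0.5 * g * t * t)) for x, t in zip(xs, ts)]
    return sum(errs) / len(errs)
```

An exact trajectory gives a residual of zero, so this function also serves as the black-box referee for `label_with_physics`-style true/fake labeling.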
After the training was conducted, the coordinates were obtained using the generators. To compare the accuracy of the models, the residual $r$ of the physical equation was calculated for the output coordinates.
An ordinary GAN, a physics-informed GAN (PI-GAN), and a PG-PI-GAN were also trained and compared. Each model was trained and evaluated three times separately.
Figure 4 shows boxplots of the residuals of each model. Data outside 1.5 times the interquartile range (IQR) from the first and third quartiles were treated as outliers in the boxplots.
Table 1 shows the median, first quartile, and interquartile range of the residuals. The physics-informed GAN featured the same GAN architecture, except that the residual $r$ was added to the loss function. All network structures were the same in all models. The PG-GAN was characterized by lower median and first quartile values than the PI-GAN. The PG-PI-GAN presented median and first quartile values similar to those of the PG-GAN, but its IQR was lower than that of the PG-GAN. These results show that the PG-GAN effectively reduces the median value, but does not reduce the IQR. This difference comes from the loss functions of the two models. The residual is added to the loss function of the PI-GAN, and hence the residuals of all generated data are reduced. By contrast, the PG-GAN considers no residual in its loss function; the magnitude of the residuals of undesired data does not affect the loss function, and hence the residuals of the undesired data tend to become large. The proposed PG-PI-GAN, which couples a PI-GAN and a PG-GAN, successfully reduces the median, first quartile, and IQR.
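The summary statistics of the kind reported in Table 1 can be computed from a list of residuals with the standard library. The quantile method below is an assumption (boxplot conventions vary between tools), so the exact quartile values may differ slightly from the paper's:

```python
from statistics import quantiles

def residual_summary(residuals):
    """Median, first quartile, and IQR of the residuals, the three
    statistics compared across models in Table 1."""
    q1, med, q3 = quantiles(residuals, n=4)  # default 'exclusive' method
    return {"median": med, "q1": q1, "iqr": q3 - q1}
```

Running this once per trained model (three training runs each, as in the study) yields the values needed for a boxplot-style comparison.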
5. Conclusions
This short note describes the concept of the PG-GAN. The PG-GAN uses arbitrary physical models, regardless of differentiability and smoothness, to guide neural networks toward physically reasonable solutions. One advantage of the proposed PG-GAN is that the physical model is outside the neural network computation graph; back-propagation is not conducted through the physical model. Hence, any physical model can be utilized. For example, commercial software could be used, and one does not need to implement the physical model. Existing PINN models require the physics equations to be implemented inside the computation graph; hence, arbitrary physics equations cannot be used (e.g., commercial software cannot be used). The proposed PG-GAN network does not need training data: the generator creates data, and the physical model judges whether the output is reasonable or not. However, the PG-GAN model is pre-trained using an ordinary GAN model with training data.
The proposed PG-GAN model was tested using Newton’s equation of motion. The PG-GAN and PG-PI-GAN featured lower median values of residuals. When the PG-GAN was coupled with the PI-GAN, the IQR value also decreased.