Given the diversity of forms in which scientific knowledge is represented in different disciplines and applications, researchers have developed a large variety of methods for integrating physical principles into ML models. This section categorizes them into the following four classes: (i) physics-guided loss functions, (ii) physics-guided initialization, (iii) physics-guided design of architecture, and (iv) hybrid modeling.
Choosing between these classes of methods for a given problem depends on many factors, including the availability and performance of existing mechanistic models and the general computational objectives that need to be addressed. The general computational objectives of physics-ML methods described throughout this section, as opposed to traditional ML methods, fall into three categories. First, prediction performance, defined as closer agreement between predicted and observed values, can be improved in a variety of ways, including better generalizability to out-of-sample scenarios, better overall accuracy, or forcing solutions to be physically consistent (e.g., obeying known physics-based governing equations). Second, sample efficiency can be improved by reducing the number of observations required for adequate performance or by reducing the overall search space. The third general computational objective is interpretability: traditional ML models are often a “black box,” and incorporating scientific knowledge can shed light on the physical meanings, interpretations, and processes within the ML framework. Although computational objectives can be grouped into these categories, there is also overlap between them. For example, forcing models to be physically consistent can effectively reduce the solution search space, and improved sample efficiency can lead to improved prediction performance by extracting more value from each observation. We end this section with a summary and detailed discussion comparing different kinds of methods, their requirements, and the general computational objectives achieved.
3.1 Physics-Guided Loss Function
Scientific problems often exhibit a high degree of complexity due to relationships between many physical variables varying across space and time at different scales. Standard ML models can fail to capture such relationships directly from data, especially when provided with limited observation data. This is one reason for their failure to generalize to scenarios not encountered in training data. Researchers are beginning to incorporate physical knowledge into loss functions to help ML models capture generalizable dynamic patterns consistent with established physical laws.
One of the most common techniques for making ML models consistent with physical laws is to incorporate physical constraints into the loss function of ML models as follows [141]:
\[
\text{Loss} = \text{Loss}_{\text{TRN}}(Y_{\text{true}}, Y_{\text{pred}}) + \lambda\, R(W) + \gamma\, \text{Loss}_{\text{PHY}}(Y_{\text{pred}}),
\]
where the training loss \(\text{Loss}_{\text{TRN}}\) measures a supervised error (e.g., RMSE or cross-entropy) between true labels \(Y_{\text{true}}\) and predicted labels \(Y_{\text{pred}}\), and \(\lambda\) is a hyper-parameter controlling the weight of the model complexity loss \(R(W)\). The first two terms are the standard loss of ML models. The additional physics-based loss \(\text{Loss}_{\text{PHY}}\) aims to ensure consistency with physical laws and is weighted by a hyper-parameter \(\gamma\), which is determined alongside other ML hyper-parameters using validation data or a nested cross-validation setup. A comprehensive guide to implementing physics-based loss functions can be found in Ebert-Uphoff et al. [84].
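To make this formulation concrete, the following minimal PyTorch sketch combines a supervised loss, an \(L_2\) complexity penalty, and an illustrative physics term; the model, the data shapes, and the monotonicity-style constraint (penalizing predictions in which, e.g., density decreases with depth) are hypothetical placeholders rather than the exact setup of any cited work.

```python
import torch
import torch.nn.functional as F

def physics_guided_loss(model, x_obs, y_obs, x_unl, lam=1e-4, gamma=0.1):
    """Sketch of Loss_TRN + lambda * R(W) + gamma * Loss_PHY."""
    # Supervised term: error between observations and predictions.
    loss_trn = F.mse_loss(model(x_obs), y_obs)

    # Model-complexity term R(W): L2 norm of the weights.
    loss_reg = sum((w ** 2).sum() for w in model.parameters())

    # Physics term: evaluated on unlabeled inputs, since no labels are needed.
    # Illustrative constraint: predictions should not decrease along the last
    # output dimension (e.g., water density should not decrease with depth).
    y_unl = model(x_unl)                      # shape: (batch, n_depths)
    violation = y_unl[:, :-1] - y_unl[:, 1:]  # positive where monotonicity is violated
    loss_phy = torch.relu(violation).mean()

    return loss_trn + lam * loss_reg + gamma * loss_phy

# Toy usage with placeholder data and an arbitrary small network.
model = torch.nn.Sequential(torch.nn.Linear(4, 16), torch.nn.Tanh(), torch.nn.Linear(16, 8))
x_obs, y_obs = torch.randn(32, 4), torch.randn(32, 8)
x_unl = torch.randn(128, 4)
loss = physics_guided_loss(model, x_obs, y_obs, x_unl)
loss.backward()
```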
Steering ML predictions towards physically consistent outputs has numerous benefits. First, it helps ensure consistency with physical laws and thereby reduces the solution search space of ML models. Second, regularization by physical constraints allows the model to learn even from unlabeled data, as the computation of the physics-based loss
\(\text{Loss}_{\text{PHY}}\) does not require observation data. Third, ML models which follow desired physical properties are more likely to be generalizable to out-of-sample scenarios relative to basic ML models [
133,
231]. It is important to note, however, that the physics-guided loss function does not “guarantee” either physical consistency or generalizability as it is fundamentally a weak constraint. Loss function terms corresponding to physical constraints are applicable across many different types of ML frameworks. In addition, this method is extensively used across all application-centric objectives listed in Section
2. In the following paragraphs, we demonstrate the use of physics-based loss functions for different objectives described in Section
2.
Replacing or improving over physical models. The incorporation of physics-based loss has shown great success in improving the prediction ability of ML models. In the context of lake temperature modeling, Karpatne et al. [
140] include a physics-based penalty that ensures that predictions of denser water are at lower depths than predictions of less dense water, a known monotonic relationship.
Jia et al. [
130] and Read et al. [
231] further extended this work to capture even more complex and general physical relationships that happen on a temporal scale. Specifically, they use a physics-based penalty for energy conservation in the loss function to ensure the lake thermal energy gain across time is consistent with the net thermodynamic fluxes in and out of the lake. A diagram of this model is shown in Figure
2. Note that the recurrent structure contains additional nodes (shown in blue) to represent physical variables (lake energy, etc) that are computed using purely physics-based equations. These are needed to incorporate energy conservation in the loss function. A similar structure can be used to model other physical laws such as mass conservation, and so on. Qualitative mathematical properties of dynamical systems modeling have also shown promise in informing loss functions to improve the prediction beyond that of the physics model. Erichson et al. [
86] penalize autoencoders based on physically meaningful stability measures in dynamical systems to improve prediction of fluid flow and sea surface temperature. They showed an improved mapping of past states to future states for both modeling scenarios in addition to improving generalizability to new data.
Solving PDEs. Another strand of work that involves loss function alterations is solving PDEs for dynamical systems modeling, in which adherence to the governing equations is enforced in the loss function. In Raissi et al. [
228], this concept is developed and shown to create data-efficient spatiotemporal function approximators that both solve and find the parameters of canonical PDEs such as the Burgers equation or the Schrödinger equation. Going beyond a simple feed-forward network, Zhu et al. [
318] propose an encoder-decoder model for predicting transient PDEs with governing PDE constraints. Geneva et al. [
102] extended this approach to deep auto-regressive dense encoder-decoders with a Bayesian framework using stochastic weight averaging to quantify uncertainty.
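As a sketch of how a governing equation can be enforced in the loss, the snippet below penalizes the residual of the viscous Burgers equation, \(u_t + u u_x = \nu u_{xx}\), at randomly sampled collocation points using automatic differentiation. The network size, sampling domain, and viscosity are illustrative choices, and in practice this residual term would be combined with data and boundary/initial-condition losses.

```python
import torch

# Small fully connected network approximating u(t, x).
net = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

def pde_residual_loss(net, n_points=256, nu=0.01):
    """Mean squared residual of u_t + u*u_x - nu*u_xx at collocation points."""
    t = torch.rand(n_points, 1, requires_grad=True)                     # t in [0, 1]
    x = torch.empty(n_points, 1).uniform_(-1.0, 1.0).requires_grad_(True)  # x in [-1, 1]
    u = net(torch.cat([t, x], dim=1))

    grad = lambda out, inp: torch.autograd.grad(
        out, inp, grad_outputs=torch.ones_like(out), create_graph=True)[0]
    u_t, u_x = grad(u, t), grad(u, x)
    u_xx = grad(u_x, x)

    residual = u_t + u * u_x - nu * u_xx
    return (residual ** 2).mean()

loss = pde_residual_loss(net)   # added to data/boundary losses in practice
loss.backward()
```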
Discovering Governing Equations. Physics-based loss function terms have also been used in the discovery of governing equations. Loiseau et al. [
178] used constrained least squares [
110] to incorporate energy-preserving nonlinearities or to enforce symmetries in the identified equations for the equation learning process described in Section
2.7. Though these loss functions are mostly seen in common variants of NNs, they can also be found in other architectures such as echo state networks. Doan et al. [
76] found that integrating the physics-based loss from the governing equations in a Lorenz system, a commonly studied system in dynamical systems, strongly improves the echo state network’s time-accurate prediction of the system and also reduces convergence time.
Inverse modeling. For applications in vortex-induced vibrations, Raissi et al. [
224] pose the inverse modeling problem of predicting the lift and drag forces of a system given sparse data about its velocity field. Kahana et al. [
136] use a loss function term pertaining to the physical consistency of the time evolution of waves for the inverse problem of identifying the location of an underwater obstacle from acoustic measurements. In both cases, the addition of physics-based loss terms made results more accurate and more robust to out-of-sample scenarios.
Parameterization. While ML has been used for parameterization, adding physics-based loss terms can further benefit this process by ensuring physically consistent outputs. Zhang et al. [
310] parameterize atomic energy for molecular dynamics using a NN with a loss function that takes into account atomic force, atomic energy, and terms relating to kinetic and potential energy. Furthermore, in climate modeling, Beucler et al
. show that enforcing energy conservation laws improves prediction when emulating cloud processes [
31,
32].
Downscaling. Super-resolution and downscaling frameworks have also begun to incorporate physics-based constraints. Jiang et al. [
134] use PDE-based constraints for super-resolution problems in computational fluid dynamics where they are able to more efficiently recover physical quantities of interest. Bode et al. [
37] use similar constraint ideas in building generative adversarial networks for super-resolution in turbulence modeling in combustion scenarios, where they find improved generalization capability and extrapolation due to the constraints.
Uncertainty quantification. In Yang et al. [
303] and Yang et al. [
304], the physics-based loss is implemented in a deep probabilistic generative model for uncertainty quantification based on adherence to the structure imposed by PDEs. To accomplish this, they construct probabilistic representations of the system states and use an adversarial inference procedure to train using a physics-based loss function that enforces adherence to the governing laws. This is expanded in Zhu et al. [
318], where a physics-informed encoder-decoder network is defined in conjunction with a conditional flow-based generative model for similar purposes. A similar loss function modification is performed in other works [
102,
144,
299], but for the purpose of solving high dimensional stochastic PDEs with uncertainty propagation. In these cases, physics-guided constraints provide effective regularization for training deep generative models to serve as surrogates of physical systems where the cost of acquiring data is high and the data sets are small [
304].
Another direction for encoding physics knowledge into ML UQ applications is to create a physics-guided Bayesian NN. This is explored by Yang et al. [
300], where they use a Bayesian NN, which naturally encodes uncertainty, as a surrogate for a PDE solution. Additionally, they add a PDE constraint for the governing laws of the system to serve as a prior for the Bayesian net, allowing for more accurate predictions in situations with significant noise due to the physics-based regularization.
Generative models. In recent years, GANs have been used to efficiently generate solutions to PDEs and there is interest in using physics knowledge to improve them. Yang et al. [
301] showed GANs with loss functions based on PDEs can be used to solve stochastic elliptic PDEs in up to 30 dimensions. In a similar vein, Wu et al. [
292] showed that physics-based loss functions in GANs can lower the amount of data and training time needed to converge on solutions of turbulence PDEs, while Shah et al. [
248] saw similar results in the generation of microstructures satisfying certain physical properties in computational materials science.
3.2 Physics-Guided Initialization
Since many ML models require an initial choice of model parameters before training, researchers have explored different ways to physically inform a model's starting state. For example, in NNs, weights are often initialized according to a random distribution prior to training. Poor initialization can cause models to become anchored in local minima, a problem that is especially pronounced for deep neural networks. However, if physical or other contextual knowledge is used to help inform the initialization of the weights, model training can be accelerated and may require fewer training samples [
132]. One way to inform the initialization to assist in model training and escaping local minima is to use an ML technique known as
transfer learning. In transfer learning, a model is
pre-trained on a related task prior to being fine-tuned with limited training data to fit the desired task. The pre-trained model serves as an informed initial state that ideally is closer to the desired parameters for the desired task than random initialization. One way to achieve this is to use the physics-based model’s simulated data to pre-train the ML model. This is similar to the common application of pre-training in computer vision, where CNNs are often pre-trained with very large image datasets before being fine-tuned on images from the task at hand [
259].
Jia et al
. use this strategy in the context of modeling lake temperature dynamics [
130,
132]. They pre-train their
Physics-Guided Recurrent Neural Network (
PGRNN) models for lake temperature modeling on simulated data generated from a physics-based model and fine-tune the NN with little observed data. They showed that pre-training, even using data from a physical model with an incorrect set of parameters, can still significantly reduce the training data needed for a quality model. In addition, Read et al. [
231] demonstrated that models using both physics-guided initialization and a physics-guided loss function are able to generalize better to unseen scenarios than traditional physics-based models. In this case, physics-guided initialization allows the model to have a physically-consistent starting state prior to seeing any observations.
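A minimal sketch of this pre-train-then-fine-tune pattern is given below; the `physics_simulator` stand-in, the model, and the training details are hypothetical and only illustrate the general workflow of physics-guided initialization.

```python
import torch

def physics_simulator(x):
    # Placeholder for a mechanistic simulator (e.g., a process-based lake model);
    # any cheap physics-based approximation of the target quantity can be used here.
    return x[:, :1] ** 2 + torch.sin(x[:, 1:2])

def train(model, x, y, epochs=100, lr=1e-3):
    """Simple supervised training loop used for both stages."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    return model

model = torch.nn.Sequential(torch.nn.Linear(5, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))

# Stage 1: pre-train on abundant synthetic data generated by the physics-based model.
x_sim = torch.randn(10_000, 5)
y_sim = physics_simulator(x_sim)
model = train(model, x_sim, y_sim, epochs=200)

# Stage 2: fine-tune on the small set of real observations, typically with a smaller
# learning rate so the physics-informed initialization is not forgotten.
x_obs, y_obs = torch.randn(100, 5), torch.randn(100, 1)   # placeholder observations
model = train(model, x_obs, y_obs, epochs=50, lr=1e-4)
```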
Another application can be seen in robotics, where images from robotics simulations have been shown to be sufficient without any real-world data for the task of object localization [
267], while reducing data requirements by a factor of 50 for object grasping [
39]. Similarly, in autonomous vehicle training, Shah et al. [247] showed that pre-training the driving algorithm in a simulator built on a video game physics engine can drastically reduce data needs. More generally, simulation-based pre-training allows for significantly less expensive data collection than is possible with physical robots.
Physics-guided model initialization has also been employed in chemical process modeling [
180,
181,
298]. Yan et al. [
298] use a Gaussian process regression model for process modeling that is transferred and adapted from a similar task. To adapt the transferred model, they use scale-bias correcting functions optimized through maximum likelihood estimation. Furthermore, Gaussian process models come equipped with uncertainty quantification, which is also informed through initialization. A similar transfer-and-adapt approach is seen in Lu et al. [
180], but for an ensemble of NNs transferred from related tasks. In both studies, the similarity metrics used to find similar systems are defined by considering various common process characteristics and behaviors.
Physics-guided initialization can also be done using self-supervised learning, which has been widely used in computer vision and natural language processing. In the self-supervised setting, deep neural networks learn discriminative representations using pseudo labels created from pre-defined pretext tasks. These pretext tasks are designed to extract complex patterns related to the target prediction task. For example, the pretext task can be defined to predict intermediate physical variables that play an important role in the underlying processes. This approach can make use of a physics-based model to simulate these intermediate physical variables, which can then be used to pre-train ML models by adding supervision on hidden layers. As an illustration of this approach, Jia et al. [
133] have shown promising results for modeling temperature and flow in river networks by using upstream water variables simulated by a physics-based PRMS-SNTemp model [
265] to pre-train hidden variables in a graph neural network.
3.3 Physics-Guided Design of Architecture
Although the physics-based loss and initialization approaches in the previous sections help constrain the search space of ML models during training, the ML architecture itself often remains a black box. In particular, these approaches do not encode physical consistency or other desired physical properties into the architecture. A recent research direction has been to construct new ML architectures that exploit the specific characteristics of the problem being solved. Furthermore, incorporating physics-based guidance into architecture design has the added benefit of making the previously black-box algorithm more interpretable, a desirable but typically missing feature of ML models used in physical modeling. In the following paragraphs, we discuss several contexts in which physics-guided ML architectures have been used. The work in this section focuses largely on neural network architectures, whose modular and flexible nature makes them prime candidates for architecture modification. For example, domain knowledge can be used to specify node connections that capture physics-based dependencies among variables. We also include subsections on multi-task learning and structures of Gaussian processes to show how task interrelationships or informed prior distributions can inform ML models.
Intermediate Physical Variables. One way to embed physical principles into NN design is to ascribe physical meaning to certain neurons in the NN. It is also possible to declare physically relevant variables explicitly. In lake temperature modeling, Daw et al. [
68] incorporate a physical intermediate variable as part of a monotonicity-preserving structure in the LSTM architecture. This model produces physically consistent predictions in addition to appending a dropout layer to quantify uncertainty. Muralidhar et al. [204] used a similar approach to insert physics-constrained variables as intermediate variables in a convolutional neural network (CNN) architecture, achieving significant improvement over state-of-the-art physics-based models on the problem of predicting drag force on particle suspensions in moving fluids.
An additional benefit of adding physically relevant intermediate variables in an ML architecture is that they can help extract physically meaningful hidden representations that can be interpreted by domain scientists. This is particularly valuable, as standard deep learning models are limited in their interpretability since they can only extract abstract hidden variables using highly complex connected structures. This is further exacerbated given the randomness involved in the optimization process.
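The following sketch illustrates the general idea of an intermediate physical variable: a designated neuron is exposed as a physically meaningful quantity, supervised with values from a mechanistic simulation, and also used downstream for the final prediction. The architecture and variable choices are hypothetical rather than the specific designs of the cited works.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntermediateVariableNet(nn.Module):
    """One neuron is given a physical meaning (e.g., an intermediate energy or
    density term), supervised with simulated values, and reused downstream."""
    def __init__(self, n_in=6, n_hidden=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.ReLU())
        self.to_physical = nn.Linear(n_hidden, 1)        # interpretable intermediate variable
        self.head = nn.Sequential(nn.Linear(n_hidden + 1, n_hidden), nn.ReLU(),
                                  nn.Linear(n_hidden, 1))

    def forward(self, x):
        h = self.encoder(x)
        z_phys = self.to_physical(h)                     # exposed for supervision
        y = self.head(torch.cat([h, z_phys], dim=1))
        return y, z_phys

model = IntermediateVariableNet()
x = torch.randn(64, 6)
y_obs = torch.randn(64, 1)          # observed target (placeholder)
z_sim = torch.randn(64, 1)          # intermediate variable from a mechanistic model (placeholder)

y_pred, z_pred = model(x)
loss = F.mse_loss(y_pred, y_obs) + 0.5 * F.mse_loss(z_pred, z_sim)
loss.backward()
```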
Another related approach is to fix one or more weights within the NN to physically meaningful values or parameters and make them non-modifiable during training. A recent example is seen in geophysics, where researchers use NNs for waveform inversion modeling to find subsurface parameters from seismic wave data. In Sun et al. [256], most of the parameters within the network are assigned to mimic seismic wave propagation during the forward pass of the NN, with weights corresponding to values in known governing equations. They show this leads to more robust training in addition to a more interpretable NN with meaningful intermediate variables.
Encoding invariances and symmetries. In physics, there is a deep connection between the symmetries of a system and its invariant quantities and dynamics. For example, Noether's theorem, a cornerstone of physics, establishes a mapping between the conserved quantities of a system and the system's symmetries (e.g., translational symmetry corresponds to the conservation of momentum within a system). Therefore, an ML model built to be translation-invariant is more likely to conserve momentum, and its predictions become more robust and generalizable.
State-of-the-art deep learning architectures already encode certain types of invariance. For example, RNNs encode temporal invariance and CNNs can implicitly encode spatial translation, rotation, and scale invariance. In the same way, scientific modeling tasks may require other invariances based on physical laws. In turbulence modeling and fluid dynamics, Ling et al. [
173] define a
tensor basis neural network to embed rotational invariance into a NN for improved prediction accuracy. This solves a key problem in ML models for turbulence modeling because, without rotational invariance, the model evaluated on identical flows with axes defined in other directions could yield different predictions. This work alters the NN architecture by adding a higher-order multiplicative layer that ensures the predictions lie on a rotationally invariant tensor basis. In a molecular dynamics application, Anderson et al. [
12] show that a rotationally covariant NN architecture can learn the behavior and properties of complex many-body physical systems.
In a general setting, Wang et al. [
281] show how spatiotemporal models can be made more generalizable by incorporating symmetries into deep NNs. More specifically, they demonstrated the encoding of translational symmetries, rotational symmetries, scale invariances, and uniform motion into NNs using customized convolutional layers in CNNs that enforce desired invariance properties. They also provided theoretical guarantees of the invariance properties across the different designs and showed significant increases in generalization performance.
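As a simple illustration of hard-coding an invariance (the cited works design considerably more sophisticated customized layers), the sketch below averages a small CNN's prediction over the four 90° rotations of its input, making the output exactly invariant to that discrete rotation group.

```python
import torch
import torch.nn as nn

class RotationAveragedCNN(nn.Module):
    """Predictions are averaged over 90-degree rotations of the input, so the
    output is invariant to that discrete rotation group by construction."""
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))

    def forward(self, x):
        outs = [self.cnn(torch.rot90(x, k, dims=(2, 3))) for k in range(4)]
        return torch.stack(outs).mean(dim=0)

model = RotationAveragedCNN()
x = torch.randn(4, 1, 16, 16)
# Rotating the input by 90 degrees does not change the prediction.
assert torch.allclose(model(x), model(torch.rot90(x, 1, dims=(2, 3))), atol=1e-5)
```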
Incorporating symmetries, by informing the structure of the solution space, also has the potential to reduce the search space of an ML algorithm. This is important in the application of discovering governing equations, where the space of mathematical terms and operators is exponentially large. Though in its infancy, physics-informed architectures for discovering governing equations are beginning to be investigated by researchers. In Section
2.7, symbolic regression is mentioned as an approach that has shown success. However, given the massive search space of mathematical operators, analytic functions, constants, and state variables, the problem can quickly become NP-hard. Udrescu et al. [
270] design a recursive multidimensional version of symbolic regression that uses a NN in conjunction with techniques from physics to narrow the search space. Their idea is to use NNs to discover hidden signs of “simplicity”, such as symmetry or separability in the training data, which enables breaking the massive search space into smaller ones with fewer variables to be determined.
In the context of molecular dynamics applications, a number of researchers [
28,
310] have used a NN per individual atom to calculate each atom’s contribution to the total energy. Then, to ensure the energy invariance with respect to the possibility of interchanging two atoms, the structure of each NN and the values of each network’s weight parameters are constrained to be identical for atoms of the same element. More recently, novel deep learning architectures have been proposed for fundamental invariances in chemistry. Schutt et al. [
245] propose continuous-filter convolutional (cfconv) layers for CNNs to allow for modeling objects with arbitrary positions, such as atoms in molecules, in contrast to objects described by Cartesian-gridded data such as images. Furthermore, their architecture uses atom-wise layers that incorporate inter-atomic distances, enabling the model to respect quantum-chemical constraints such as rotationally invariant energy predictions as well as energy-conserving force predictions. Because molecular dynamics often ascribes importance to particular geometric properties of molecules (e.g., rotation), network architectures dealing with invariances can be effective for improving the performance and robustness of ML models.
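A minimal sketch of the shared per-atom network idea is shown below: the total energy is a sum of per-atom contributions from one shared network, so exchanging atoms of the same element cannot change the prediction. The descriptor dimensions and network sizes are placeholders.

```python
import torch
import torch.nn as nn

class AtomicEnergyNet(nn.Module):
    """Total energy as a sum of per-atom contributions from a shared network
    (in general, one shared network per element type)."""
    def __init__(self, n_features=16):
        super().__init__()
        self.atom_net = nn.Sequential(nn.Linear(n_features, 32), nn.Tanh(),
                                      nn.Linear(32, 1))

    def forward(self, atom_features):
        # atom_features: (n_atoms, n_features) local descriptors for each atom.
        per_atom_energy = self.atom_net(atom_features)    # (n_atoms, 1)
        return per_atom_energy.sum()                      # total energy (scalar)

model = AtomicEnergyNet()
atoms = torch.randn(10, 16)
e1 = model(atoms)
e2 = model(atoms[torch.randperm(10)])    # permuting atoms does not change the energy
assert torch.allclose(e1, e2, atol=1e-5)
```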
Architecture modifications incorporating symmetry are also seen extensively in dynamic systems research involving differential equations. In a pioneering work by Ruthotto et al. [
236], three variations of CNNs are proposed to improve classifiers for images. Each variation uses mathematical theories to guide the design of the CNN based on the fundamental properties of PDEs. Multiple types of modifications are made, including adding symmetry layers to guarantee the stability expressed by the PDEs and layers that convert inputs to kinematic eigenvalues that satisfy certain physical properties. They define a parabolic CNN inspired by anisotropic filtering, a hyperbolic CNN based on Hamiltonian systems, and a second-order hyperbolic CNN. Hyperbolic CNNs were found to preserve the energy in the system as intended, which set them apart from parabolic CNNs that smooth the output data, reducing the energy. Furthermore, though solving PDEs with neural networks has traditionally focused on learning on Euclidean spaces, recently Li et al. [
171] proposed a new architecture that includes “Fourier neural operators” to generalize this to function spaces. They showed it achieves greater accuracy than previous ML-based solvers and can also solve entire families of PDEs instead of just one. There is a large body of additional work using physics-guided architectures for solving PDEs and related applications that is not covered in this survey (e.g., see the ICLR workshop on deep learning for differential equations [5]).
A recent direction also relating to conserved or invariant quantities is the incorporation of the Hamiltonian operator into NNs [
64,
112,
268,
317]. The Hamiltonian operator in physics is the primary tool for modeling the time evolution of systems with conserved quantities, but until recently the formalism had not been integrated with NNs. Greydanus et al. [
112] designed a NN architecture that naturally learns and respects energy conservation and other invariance laws in simple mass-spring or pendulum systems. They accomplish this through predicting the Hamiltonian of the system and re-integrating instead of predicting the state of physical systems themselves. This is taken a step further in Toth et al. [
268], where they show that not only can NNs learn the Hamiltonian, but also the abstract phase space (assumed to be known in Greydanus et al. [
112]) to more effectively model expressive densities in similar physical systems and also extend more generally to other problems in physics. Recently, the Hamiltonian-parameterized NNs above have also been expanded into NN architectures that perform additional differential equation-based integration steps based on the derivatives approximated by the Hamiltonian network [
61].
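The core mechanism of a Hamiltonian-parameterized network can be sketched as follows: the network outputs a scalar \(H(q, p)\), and the time derivatives are obtained from Hamilton's equations via automatic differentiation and matched to observed derivatives. The data here are placeholders, and integration of the learned dynamics is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

h_net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))  # H(q, p)

def hamiltonian_dynamics(state):
    """Return (dq/dt, dp/dt) from Hamilton's equations via autograd."""
    state = state.detach().requires_grad_(True)           # state = (q, p)
    H = h_net(state).sum()
    dH = torch.autograd.grad(H, state, create_graph=True)[0]
    dHdq, dHdp = dH[:, :1], dH[:, 1:]
    return torch.cat([dHdp, -dHdq], dim=1)                # dq/dt = dH/dp, dp/dt = -dH/dq

# Training target: observed time derivatives of (q, p), e.g., from a pendulum.
state = torch.randn(128, 2)
dstate_obs = torch.randn(128, 2)                          # placeholder observations
loss = F.mse_loss(hamiltonian_dynamics(state), dstate_obs)
loss.backward()
```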
Encoding other domain-specific physical knowledge. Other domain-specific physical information that does not exactly correspond to known invariances can also be encoded into the architecture, providing meaningful structure to the optimization process depending on the task at hand. This can take place in many ways, including domain-informed convolutions for CNNs, additional domain-informed discriminators in GANs, or structures informed by the physical characteristics of the problem. For example, Sadoughi et al. [
239] prepend a CNN with a Fast Fourier Transform layer and a physics-guided convolution layer based on known physical information pertaining to fault detection of rolling element bearings. A similar approach is used in Sturmfels et al. [
255], but the added beginning layer instead serves to segment different areas of the brain for domain guidance in neuroimaging tasks. In the context of generative models, Xie et al. [
296] introduce tempoGAN, which augments a generative adversarial network with an additional discriminator network along with additional loss function terms that preserve temporal coherence in the generation of physics-based simulations of fluid flow. This type of approach, though found mostly in NN models, has been extended to non-NN models in Baseman et al. [
24], where they introduce a physics-guided Markov Random Field that encodes spatial and physical properties of computer memory devices into the corresponding probabilistic dependencies.
Fan et al. [
89] define new architectures to solve the inverse problem of electrical impedance tomography, where the goal is to determine the electrical conductivity distribution of an unknown medium from electrical measurements along its boundary. They define new NN layers based on a linear approximation of both the forward and inverse maps relying on the nonstandard form of the wavelet decomposition [
33].
Architecture modifications are also seen in dynamical systems research encoding principles from differential equations. Chen et al. [
58] develop a continuous depth NN based on the Residual Network [
122] for solving ordinary differential equations. They change the traditionally discretized neuron layer depths into continuous equivalents such that hidden states can be parameterized by differential equations in continuous time. This allows for increased computational efficiency due to the simplification of the backpropagation step of training, and also yields a more scalable continuous normalizing flow, a type of generative model for density estimation. This is done by parameterizing the
derivative of the hidden states of the NN as opposed to the states themselves. Then, in a similar application, Chang et al. [
53] use principles from the stability properties of differential equations in dynamical systems modeling to guide the design of the gating mechanism and activation functions in an RNN.
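A minimal sketch of this idea is given below: a network parameterizes the derivative of the hidden state, which is then integrated with a fixed-step Runge–Kutta solver (a simple stand-in for the adaptive, adjoint-based solvers used in the cited work).

```python
import torch
import torch.nn as nn

class ODEFunc(nn.Module):
    """Parameterizes dh/dt = f(h, t) instead of the hidden states themselves."""
    def __init__(self, dim=8):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, 32), nn.Tanh(), nn.Linear(32, dim))

    def forward(self, t, h):
        return self.f(h)

def rk4_integrate(func, h0, t0=0.0, t1=1.0, steps=20):
    """Fixed-step fourth-order Runge-Kutta integration."""
    h, t, dt = h0, t0, (t1 - t0) / steps
    for _ in range(steps):
        k1 = func(t, h)
        k2 = func(t + dt / 2, h + dt / 2 * k1)
        k3 = func(t + dt / 2, h + dt / 2 * k2)
        k4 = func(t + dt, h + dt * k3)
        h = h + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t = t + dt
    return h

func = ODEFunc()
h0 = torch.randn(16, 8)        # initial hidden state (e.g., encoded input)
h1 = rk4_integrate(func, h0)   # hidden state at t = 1
loss = h1.pow(2).mean()        # placeholder loss for illustration
loss.backward()
```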
Currently, the majority of architectures encoding domain knowledge have been developed manually by human experts, which can be a time-intensive and error-prone process. Because of this, there is increasing interest in automated neural architecture search methods [
20,
85,
126]. A young but promising direction in ML architecture design is to embed prior physical knowledge into neural architecture searches. Ba et al. [
18] add physically meaningful input nodes and physical operations between nodes to the neural architecture search space to enable the search algorithm to discover more ideal physics-guided ML architectures.
Auxiliary Task in Multi-Task Learning. Domain knowledge can also be incorporated into an ML architecture as auxiliary tasks in a multi-task learning framework. Multi-task learning allows multiple learning tasks to be solved at the same time, ideally while exploiting commonalities and differences across tasks, which can result in improved learning efficiency and predictions for one or more of the tasks. In this setting, an auxiliary task might ensure physically consistent solutions in addition to accurate predictions. The promise of such an approach was demonstrated for a computer vision task by integrating auxiliary information (e.g., pose estimation) for facial landmark detection [
315]. In this paradigm, a task-constrained loss function can be formulated to allow errors of related tasks to be back-propagated jointly to improve model generalization. Early work in a computational chemistry application showed that a NN could be trained to predict energy by constructing a loss function that had penalties for both inaccuracy
and inaccurate energy derivatives with respect to time as determined by the surrounding energy force field [
219]. In particle physics, De Oliveira et al. [
72] use an additional task for the discriminator network in a GAN to satisfy certain properties of particle interactions in the production of jet images of particle energy.
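In the spirit of the energy-prediction example above, the sketch below trains a network on a primary energy-matching task together with an auxiliary task that matches forces, computed as the negative gradient of the predicted energy with respect to positions via automatic differentiation. The data and network are placeholders, and the spatial-derivative penalty is one common variant of this idea rather than the exact formulation of [219].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

energy_net = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 1))

def energy_force_loss(positions, energy_ref, force_ref, alpha=0.5):
    """Primary task: match energies. Auxiliary task: match forces, i.e., the
    negative gradient of the predicted energy with respect to positions."""
    positions = positions.detach().requires_grad_(True)
    energy_pred = energy_net(positions)
    force_pred = -torch.autograd.grad(energy_pred.sum(), positions, create_graph=True)[0]
    return (F.mse_loss(energy_pred, energy_ref)
            + alpha * F.mse_loss(force_pred, force_ref))

positions = torch.randn(32, 3)     # toy single-particle configurations
energy_ref = torch.randn(32, 1)    # reference energies (placeholder data)
force_ref = torch.randn(32, 3)     # reference forces (placeholder data)
loss = energy_force_loss(positions, energy_ref, force_ref)
loss.backward()
```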
Physics-guided Gaussian process regression.
Gaussian process regression (
GPR) [
287] is a nonparametric, Bayesian approach to regression that is increasingly being used in ML applications. GPR has several benefits, including working well on small amounts of data and enabling uncertainty measurements on predictions. In GPR, first, a Gaussian process prior must be assumed in the form of a mean function and a matrix-valued kernel or covariance function. One way to incorporate physical knowledge in GPR is to encode differential equations into the kernel [
258]. This is a key feature in Latent Force Models which attempt to use equations in the physical model of the system to inform the learning from data [
10,
182]. Alvarez et al. [
10] draw inspiration from similar applications in bioinformatics [
101,
165], which showed an increase in predictive ability in computational biology, motion capture, and geostatistics datasets. More recently, Glielmo et al. [
108] propose a vectorial GPR that encodes physical knowledge in the matrix-valued kernel function. They show rotation and reflection symmetry of the interatomic force between atoms can be encoded in the Gaussian process with specific invariance-preserving covariant kernels. Furthermore, Raissi et al. [
225] show that the covariance function can explicitly encode the underlying physical laws expressed by differential equations in order to solve PDEs and learn with smaller datasets.
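As a toy numpy illustration of encoding prior physical structure in a GP kernel, the sketch below symmetrizes an RBF kernel over the reflection \(x \mapsto -x\); the resulting posterior mean is exactly an even function, a simple analogue of the invariance-preserving kernels described above.

```python
import numpy as np

def rbf(a, b, length=0.5):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * length ** 2))

def symmetric_kernel(a, b):
    # Summing the kernel over the reflection group {x, -x} makes GP sample
    # functions (and hence the posterior mean) exactly even in x.
    return rbf(a, b) + rbf(a, -b) + rbf(-a, b) + rbf(-a, -b)

# Noisy observations of an even function, all taken at positive x.
x_train = np.array([0.2, 0.7, 1.3, 1.8])
y_train = np.cos(2 * x_train) + 0.05 * np.random.randn(4)

x_test = np.linspace(-2, 2, 9)
K = symmetric_kernel(x_train, x_train) + 1e-4 * np.eye(4)   # jitter for stability
K_s = symmetric_kernel(x_test, x_train)

# Standard GP posterior mean; predictions at -x equal those at +x by construction.
mean = K_s @ np.linalg.solve(K, y_train)
print(np.allclose(mean, mean[::-1]))   # True: posterior mean is an even function
```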
3.5 Requirements and Benefits from Different Physics-ML Methodologies
Methodologies for integrating scientific knowledge in ML described in this section encompass the vast majority of work on this topic. Table
2 summarizes these by listing the requirements needed for different types of methods and the corresponding possible benefits. As we can see, depending on the context of the problem or available resources, different methods can be optimal. Hybrid methods like residual modeling are the simplest case, as they require no process-based knowledge beyond an operational mechanistic model to be used during run time. Physics-guided loss functions require additional domain expertise to determine what terms to add to the loss function, and ML cross-validation techniques are also recommended to weight the different loss function terms. Many of the foundational works on physics-guided loss functions also include open source code that could be adapted to new applications (e.g., Raissi et al. [
226], Read et al. [
231], Wang et al. [
282]). For physics-guided initialization, domain expertise can be used to determine the most relevant synthetic data for the application, but the ML can remain process-agnostic. Physics-guided architecture is often the most complex approach, where both domain and ML expertise is needed, for example, to customize neural networks by establishing physically meaningful connections and nodes. Note that there can also be multiple Physics-ML method options for a given computational benefit. For example, incorporating physical consistency into ML models can be done through weak constraints as in a loss function, hard constraints through new architectures, or indirectly through physically consistent training data from a mechanistic model simulation.
Note that for a given application-centric objective, only some of these methods may be applicable. For example, hybrid methods will not be suitable for solving PDEs since the goal of reduced computational complexity cannot be reached if the existing solver is still needed to produce the output (
\(y_{t}\) in Figure
1). Also, in the case of discovering governing equations, there often is no known physical model to compare against for creating either a residual model or a hybrid approach. Data generation applications also do not make sense for residual modeling, since the purpose is to simulate a data distribution rather than improve on a physical model.
Many of the physics-ML methods can also be combined. For example, a physics-guided loss function, physics-guided architecture, and physics-guided initialization could all be applied to an ML model. We saw in Section
3.1 that Jia et al. [
130] and Read et al. [
231] in particular combined physics-guided loss functions with physics-guided initialization. Also, Karpatne et al. [
140] combined a physics-guided loss function with a hybrid physics-ML framework. More recently, Jia et al. [
133] combine physics-guided initialization and physics-guided architecture.
An overall goal of the physics-ML methods presented in this section is to address resource efficiency issues (i.e., the ability to solve problems with fewer computational resources in the context of objectives defined in Section 2) while maintaining high predictive performance, sample efficiency, and interpretability relative to traditional ML approaches. For example, physics-ML methods for solving PDEs (Section
2.5) are likely to be more computationally efficient than direct numerical approaches and more physically consistent than traditional ML approaches. As another example, for the objective of downscaling (Section
2.2), physics-ML methods can be expected to provide high-resolution
\(y_{t}\) but at a much smaller computational cost than possible via traditional mechanistic models and provide much better quality output while using fewer training samples relative to traditional ML approaches. Another major utility of physics-ML methods is to reduce the overall solution search space, which has a direct impact on sample efficiency (i.e., reduced number of observations required) and the amount of computation time taken for model training. For example, physics-ML methods for discovering governing equations can be expected to work with much fewer observations and take less computation time relative to traditional ML methods.