Abstract
Highly nonlinear dynamic finite element simulations using explicit time integration are particularly valuable tools for structural analysis in fields like automotive, aerospace, and civil engineering, or in the study of injury biomechanics. However, such state-of-the-art simulation models demand significant computational resources. Conventional data-driven surrogate modeling approaches address this by evolving the dynamics on low-dimensional embeddings, yet the majority of them operate directly on high-resolution data obtained from numerical discretizations, making them costly and unsuitable for adaptive resolutions or for handling information flow over large spatial distances. We therefore propose a multi-hierarchical framework for the structured creation of a series of surrogate models at different resolutions. Macroscale features are captured on coarse surrogates, while microscale effects are resolved on finer ones, with transfer learning passing information between scales. The objective of this study is to develop efficient surrogates for a kart frame model in a frontal impact scenario. To achieve this, its mesh is simplified to obtain multi-resolution representations of the kart. Subsequently, a graph-convolutional neural network-based surrogate learns parameter-dependent low-dimensional latent dynamics on the coarsest representation. Subsequent surrogates are trained on residuals using finer resolutions, which yields multiple surrogates with varying hardware requirements and increasing accuracy.
1 Introduction
Physical prototypes can be used in scenario-based testing applications, e.g. for safety considerations in crashworthiness investigations. However, such prototypes alone are not practical as they are prohibitively expensive, inflexible, and time-consuming to build and test. Thus, computer-aided engineering emerged as a key pillar in product development, e.g. in the automotive industry [1]. In this context, explicit structural dynamical simulations are particularly relevant, e.g. for identifying weak points in the design of crash structures and optimizing the use of material.
Such simulations are used in a variety of applications including many-query evaluations, design optimizations, or deployment on low-budget hardware. While many of these applications require high accuracy, the accuracy of classic solvers for high-fidelity simulation models is limited by the resolution of the spatial discretization. Consequently, increased accuracy comes at the cost of increased dimensionality and thus computational effort. This makes modern high-fidelity simulation models prohibitive for certain applications such as large parameter studies, usage on weak hardware, or real-time applications. Hence, there is an urgent need for surrogate models that retain the expressiveness of their high-fidelity counterparts while being more cost-effective to simulate.
Fortunately, the intrinsic dimension of a given problem is in many cases much smaller, and the actual solution space lies on a low-dimensional manifold. This led to the emergence of data-based model order reduction (MOR) [2,3,4,5,6,7] as a viable solution to the task of creating efficient yet accurate surrogate models by identifying suitable low-dimensional descriptions. In this context, two primary challenges must be addressed: One challenge is to identify expressive coordinates that are simultaneously low-dimensional to ensure computational efficiency, yet still adequate for describing the system. The other is the approximation of the (parameter-dependent) system dynamics in the identified reduced coordinates. Widely used data-driven methods to construct a low-dimensional embedding include linear methods such as the proper orthogonal decomposition (POD) [8] (also known as principal component analysis (PCA)) and its nonlinear counterpart, the autoencoder (AE), which produces nonlinear manifolds. Such data-driven methods are particularly appropriate in the context of commercial simulation software with inaccessible source code, where all that is left is data and limited information about the model itself.
Most of the existing MOR approaches directly operate on the given high-dimensional discretization, i.e. the resulting surrogates always try to approximate a system for a fixed high resolution. However, as already mentioned, this resolution usually is neither driven by the underlying problem nor by the user’s intended application. In many cases, a coarser or adaptive resolution has advantages if the accuracy does not suffer as a result. For example, considerable computational power can be saved when visualizing complex three-dimensional systems by using coarser resolutions. Moreover, static resolutions cannot react to changes in computational environments, like dynamically changing memory restrictions, or to changes in the desired approximation quality. Consequently, the question arises why data-based surrogate models should be limited to these fixed original resolutions when they are not inherently bound to the spatial discretization. They can operate on a coarse subselection of a fine mesh with an accuracy only limited by their expressiveness and the high-fidelity data. One discretization-free approximation scheme for parametrized PDEs can be found in [9] and other mesh-free approaches are given in [10] and [11, 12].
In this work, we develop an approach that is fundamentally mesh-free, i.e., it is not restricted to the underlying high-resolution discretization. Instead, we take advantage of the fact that the surrogates do not require a fine spatial resolution by first excluding large parts of the model during model creation while taking into account the recent advances of graph convolutional neural networks (GCNNs) [13]. For this purpose, we transfer and adjust ideas from multiresolution autoencoders [14] to make them applicable to irregular data. In particular, we present a graph-convolutional hierarchical multiscale approximation scheme for a given system in which the global context is captured in coarse representations. By doing so, we can speed up the learning process, create multiple models with individual hardware requirements, and resolve multiscale issues that often arise in the spatio-temporal dynamics of complex systems.
Our fundamental idea is (i) to represent the high-fidelity model in a graph-like structure, (ii) apply mesh simplification to derive coarse representations, (iii) fit a surrogate model on the coarsest representation, (iv) refine the model, and (v) fit another surrogate on the next finer level leveraging transfer learning. The steps (iv) and (v) can be repeated until a performance threshold is met or no more coarse representations are available. An abstracted visual impression of this workflow is given in Fig. 1, where it may be seen that our approach resembles U-Nets [15] in its structure. The individual surrogate models themselves are composed of graph convolutional autoencoders, which construct low-dimensional coordinates, and of multi-layer perceptrons (MLPs) that approximate those coordinates based on the time and given parameters.
1.1 State-of-the-art
The proposed method is located in the research area of data-driven MOR under consideration of graph convolutional networks and hierarchical modeling. Reference is therefore made in the following to related work in these areas.
Data-driven MOR Even though data-driven MOR using POD-based surrogate models continues to be widely used and often is able to produce satisfying results for many problems [16,17,18,19,20], autoencoders [21,22,23] have been shown to outperform their linear counterpart for problems with slowly decaying Kolmogorov n-width [24]. There also exist combinations of POD and autoencoders [25]. Of the many different autoencoder architectures, convolutional autoencoders in particular have stood out [21, 22, 26, 27] as they exploit spatial information and can detect local patterns using filters. This makes convolutional neural networks also interesting for other applications in structural dynamics, see e.g. [28, 29]. Meanwhile, MOR methods such as POD are also used to improve convolutional neural networks, for example to reduce the number of layers [30].
Unfortunately, conventional convolutional neural networks (CNNs) face the significant limitation of only being applicable to regular grid-like data (e.g. to images). Irregular data, on the contrary, as present in complex three-dimensional discretized crash simulations, requires new techniques, thus leading to an increased interest in geometrical deep learning [31]. While some approaches map the irregular domain to a regular one [32] and apply convolutions there, others apply convolution-like operations on dynamically constructed graphs of point clouds [33], and still others apply generalizations of CNN architectures to non-Euclidean domains [34].
Graph convolutional neural networks Graph convolutional neural networks [13] can be directly applied to irregular data by transferring the principle of convolutions to geometric problems. They can extract information and relations about features of nodes from their spatial connections. An early version of GCNNs can be found in [35] and an adaptation of it in [36]. GCNNs are found in the context of MOR in [37], where a graph convolutional autoencoder using gcn2 convolutions [38] is compared to a classic fully connected AE for the creation of reduced order models. Furthermore, in [39] a spatial graph convolutional autoencoder is used to derive reduced order models, and [40] utilizes GCNNs for the approximation of time-dependent PDEs under geometric variability. In contrast to classic convolutions, graph convolutions themselves cannot automatically reduce the dimensionality of the data. To decrease the number of nodes that are processed in the layers, several pooling operations for irregular data have been developed [41]. A general overview of GCNNs can be found in [42] and a literature review focused on MOR in [39].
Graph Networks In addition to graph convolutional neural networks, many other exciting applications in geometrical deep learning have emerged, like graph networks [43]. Graph networks have been used to build physics-informed simulation models in which the graph network outputs state-time derivatives that are then used in ODE integrators for future time predictions [44, 45]. Moreover, symbolic representations of a learned model are discovered by applying symbolic regression to components of its message passing function [46, 47]. Graph networks are also used in generative tasks where graphs are built sequentially based on learned distributions [48]. Another recent development is attention-based graph transformer models [49], which have been used, for example, as neural operators to capture the solutions of PDEs [50].
Hierarchical Structures in Graph Convolutional Networks Including a hierarchical structure in GCNNs is a natural way of proceeding. In Graph U-Nets [51], pooling layers are used to form smaller graphs using a trainable projection vector. Moreover, a multiscale MeshGraphNet that operates on different resolutions is introduced in [52]. A coarse resolution is used to propagate information further and overcome the issue of slow message propagation occurring in fine resolutions. Moreover, [53] use two MeshGraphNets to create surrogate models for FE simulations of latticed structures. The first one captures the dynamics on a reduced graph representation of the structure and the second one maps these results onto the full-scale displacements.
Mesh reduction is also applied in [54], where information of a fine graph is encoded in a coarse subset of the nodes. By doing so, the latent dynamics can be evolved efficiently in time using an attention model. Note that hierarchical approaches are also incorporated in other architectures like variational autoencoders [55] and are not only used in the spatial domain but also for the evolution of dynamics in time, as explained in [56].
1.2 Main contributions
Due to the hierarchical structure of our proposed approach, global dynamics can be captured on the coarsest surrogate, whereas finer details are captured in the refined versions. Thus, the framework is naturally suited for multiscale problems, where macro- and microscale dynamics occur at the same time. Modeling such systems poses a particular challenge and is therefore often approached with dedicated methods, for example multigrid methods [57, 58], where a coarse-grained model is gradually refined in areas of high inaccuracy in order to achieve the required accuracy. Such methods have also been unified with convolutional neural networks [59].
While other hierarchical approaches use hierarchical structures only to foster the learning process for an approximation of the finely discretized high-fidelity solution, we use coarse representations that are physically and visually interpretable and consequently directly useful. Moreover, we never perform costly training of a surrogate on the fine data and can reduce the latent representation to a vector of the intrinsic dimension instead of a small, yet comparatively larger, graph. Additionally, we build the surrogates one after the other, enabling the learning process to be stopped if desired. In doing so, we still take advantage of already learned behavior by applying transfer learning from coarser to refined surrogates.
In addition, we consider a numerical example in the form of a simplified kart frontal impact simulation with a number of specific challenges: The considered scenarios encompass a multitude of parameter dependencies that determine the impact, and the model exhibits complex material properties. Furthermore, the scenarios are subject to high nonlinearities and contact with plastic deformation. In contrast to other structural dynamical systems often considered in the literature, transient dynamics without stationary system behavior must be approximated instead of the post-transient dynamics on an attracting submanifold. Moreover, the considered example is a finite element model and consequently shares the challenges of other similar systems, including (i) the sheer dimensionality of such systems, (ii) the inaccessibility of commercial software code, and (iii) the computationally intensive data generation. Other works considering crashworthiness or impact scenarios in the context of data-driven surrogate modeling can be found in [19, 60, 61].
The highlights of our work can be summarized as follows:
1. We propose a multi-hierarchical nonlinear model reduction scheme that
(a) creates adaptive models for different needs (resolution, memory, etc.) in visually interpretable domains
(b) leverages transfer learning to progressively refine surrogate models
2. The proposed surrogate architecture is especially suited for multiscale problems and complex discretized 3D structural dynamical problems
3. We provide accurate yet efficient data-driven surrogates for the deformations of a nonlinear finite element kart frame frontal impact simulation in a structured and efficient manner
1.3 Structure
The paper is structured as follows: The proposed multi-hierarchical surrogate modeling approach is explained in Sect. 2 along with the required theory, followed by the presentation of the considered numerical example in the form of a kart frontal impact simulation in Sect. 3. The surrogate modeling application as well as the results and discussion are presented in Sect. 4. The paper ends with a conclusion in Sect. 5.
2 Multi-hierarchical surrogate modeling approach
A detailed explanation of the general problem setup, the theory required to follow the explanations, and most importantly, the multi-hierarchical surrogate modeling approach itself is given in this section. The following explanations are not especially tailored to the kart example considered later but are kept general so that interested readers can transfer them to their individual problem classes more easily.
2.1 Problem setup
Consider a nonlinear dynamical system

$$\begin{aligned} \dot{\varvec{x}}(t) = \varvec{f}\left( t, \varvec{x}(t), \varvec{\mu }\right) , \quad \varvec{x}(t_0) = \varvec{x}_0, \end{aligned}$$ (1)
which is determined by the time \(t\in {\mathcal {T}}\subseteq \mathbb {R}^{+}\), the system state \(\varvec{x}\in {\mathcal {X}}\subseteq \mathbb {R}^{N}\), and (simulation) parameter \(\varvec{\mu }\in {\mathcal {M}}\subseteq \mathbb {R}^{\ell }\). Usually, time-stepping schemes are used to approximate the discrete-time flow map

$$\begin{aligned} \varvec{x}(t, \varvec{\mu }) = \varvec{F}\left( t, \varvec{\mu }, \varvec{x}_0\right) , \end{aligned}$$ (2)
i.e. the solution of system Eq. (1) at \(\eta \) discrete time points \(t\in \{t_0,...,t_{\eta -1}\}\). The flow map \(\varvec{F}:{\mathcal {T}}\times {\mathcal {M}}\times {\mathcal {X}}\rightarrow {\mathcal {X}}\) describes the mapping from the initial condition \(\varvec{x}_0\in \mathbb {R}^{N}\) and parameter \(\varvec{\mu }\) to the solution at a given time \(t\ge t_0\). In the course of this paper, our major goal is to find a surrogate model \(\varvec{\Sigma }\) that approximates the solution Eq. (2) of Eq. (1), i.e.

$$\begin{aligned} \varvec{\Sigma }(t, \varvec{\mu }) \approx \varvec{F}\left( t, \varvec{\mu }, \varvec{x}_0\right) , \end{aligned}$$ (3)
while significantly reducing computational requirements. This can often be expressed in terms of the computational time so that the computational time of the surrogate \(\Delta T_{\varvec{\Sigma }}\) is much shorter than that of the original system \(\Delta T_{\varvec{F}}\): \(\Delta T_{\varvec{\Sigma }} \ll \Delta T_{\varvec{F}}\).
If, however, not all states are of interest but only a subselection, which is the case when coarsening the original discretization of the system, the surrogate can operate on a downsampled state \(\varvec{x}_{\text {d}}(t, \varvec{\mu })= \mathcal {\varvec{D}}\varvec{x}\in {\mathcal {X}}_{\text {d}}\subseteq \mathbb {R}^{n_{\text {d}}}\), where \(\mathcal {\varvec{D}}\in \{0,1\}^{n_{\text {d}}\times N}: {\mathcal {X}}\rightarrow {\mathcal {X}}_\text {d}\) is a binary selection matrix with \(n_{\text {d}}< N\). Its entries are \(\mathcal {\varvec{D}}(p,q)=1\) if the \(q\)-th state of the full model is kept as the \(p\)-th state of the downsampled one, and \(\mathcal {\varvec{D}}(p,q)=0\) otherwise, so that a discarded \(q\)-th state corresponds to a zero \(q\)-th column. Consequently, the surrogate \(\varvec{\Sigma }_\text {d}\) only needs to approximate the selected states of the system’s solution

$$\begin{aligned} \varvec{\Sigma }_\text {d}(t, \varvec{\mu }) \approx \mathcal {\varvec{D}}\varvec{F}\left( t, \varvec{\mu }, \varvec{x}_0\right) . \end{aligned}$$
This can, on the one hand, ease the surrogate modeling process and is, on the other hand, especially useful in cases where the fine discretization of the model does not result from user requirements. In order to obtain a suitable downsampling operation, methods from the field of computer vision are useful.
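For illustration, the selection operator \(\mathcal {\varvec{D}}\) can be realized as a sparse matrix. The following minimal NumPy/SciPy sketch, in which the index list `keep` is a hypothetical input holding the retained state indices, assembles and applies it:

```python
import numpy as np
from scipy import sparse

def selection_matrix(keep, N):
    """Assemble the binary selection matrix D in {0,1}^(n_d x N).

    keep : indices of the n_d retained states
    N    : dimension of the full state vector
    """
    n_d = len(keep)
    # exactly one unit entry per row: D[p, keep[p]] = 1
    return sparse.csr_matrix(
        (np.ones(n_d), (np.arange(n_d), keep)), shape=(n_d, N)
    )

# usage: x_d = D @ x extracts the retained states
x = np.arange(10.0)
D = selection_matrix([0, 3, 7], N=10)
print(D @ x)  # -> [0. 3. 7.]
```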
2.2 Down- and upsampling
For the sampling operations, we rely on surface simplification using quadric error metrics [62], a method that produces coarse representations of a given mesh while maintaining its shape, i.e., its geometrical characteristics. The method does not necessarily preserve the topology of the mesh, as topological holes can be closed and unconnected regions can be joined. Classic FE mesh simplification approaches focus on maintaining the topology, but as we are interested in coarse representations of the original system that have a similar visualization, the mentioned method is preferable. The same sampling approach is used as a pooling operation in the context of graph convolutional autoencoders in CoMA [63].
Specifically, we assume that the considered model can be interpreted as an undirected graph \({\mathcal {G}}=(\varvec{{\mathcal {N}}}, {\mathcal {E}}, \varvec{A})\) with a set of vertices (nodes) \(\varvec{{\mathcal {N}}}\in \mathbb {R}^{n\times 3}\) and edges \({\mathcal {E}}\in \mathbb {R}^{n_{\text {e}}\times 2}\) describing the node connectivity defined by the adjacency matrix \(\varvec{A}\in \{0,1\}^{n\times n}\). Note that the adjacency can also be weighted. The high-fidelity model used in this elaboration is a finite element (FE) model so that the representation as a graph corresponds to the model formulation, as FE models are composed out of elements that contain nodes and define neighborhoods through their edges.
Downsampling Operation The downsampling operation of the nodes is defined by

$$\begin{aligned} \varvec{{\mathcal {N}}}_\text {d}= \mathcal {\varvec{D}}\varvec{{\mathcal {N}}} \end{aligned}$$
with the downsampling matrix defined as previously but with dimensions aligning to the number of nodes, \(\mathcal {\varvec{D}}\in \{0,1\}^{n_{\text {d}}\times n}:\varvec{{\mathcal {N}}}\rightarrow \varvec{{\mathcal {N}}}_\text {d}\) with \(n_{\text {d}}< n\) and \(\varvec{{\mathcal {N}}}_\text {d}\subset \varvec{{\mathcal {N}}}\). The selection of the nodes to keep follows [62] using iterative vertex pair contraction. In general, for a given pair of nodes \((\varvec{\nu }_p, \varvec{\nu }_q)\), a vertex pair contraction \((\varvec{\nu }_p, \varvec{\nu }_q) \rightarrow \varvec{\nu }\) moves node \(\varvec{\nu }_q\) to a new position \(\varvec{\nu }\), connects all edges incident to \(\varvec{\nu }_p\) to \(\varvec{\nu }_q\), deletes the node \(\varvec{\nu }_p\), and removes all degenerate edges and faces. As we only consider selected nodes instead of adjusted nodes, the position of the kept node is not changed, and the contraction results in \((\varvec{\nu }_p, \varvec{\nu }_q) \rightarrow \varvec{\nu }_q\). To introduce a measure which determines which nodes are kept, each node \(\varvec{\nu }=[\nu _x, \nu _y, \nu _z, 1]^\intercal \) is associated with a quadratic error

$$\begin{aligned} \Delta (\varvec{\nu }) = \varvec{\nu }^\intercal \varvec{Q}\varvec{\nu }, \end{aligned}$$
which is defined w.r.t. \(\varvec{Q}\in \mathbb {R}^{4 \times 4}\), a matrix describing the squared distance of a given point to the set of planes whose intersection defines the node’s position. The procedure can be summarized as follows (a code sketch is given after the list):
1. Select valid vertex pairs (either neighbors or nodes within a small distance)
2. Select the best \(\varvec{\nu }\) out of \(\{\varvec{\nu }_p, \varvec{\nu }_q\}\) for each valid pair based on the cost \(\varvec{\nu }^\intercal (\varvec{Q}_p+\varvec{Q}_q)\varvec{\nu }\)
3. Iteratively remove the node from the pair \((\varvec{\nu }_p, \varvec{\nu }_q)\) of least cost and update the costs
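The greedy selection above can be written down compactly. The following Python sketch is a simplified illustration, assuming each node already carries an accumulated quadric `Q` and that `pairs` holds the valid vertex pairs; heap-based cost updates and the face bookkeeping of [62] are omitted:

```python
import numpy as np

def pair_cost(v, Q_p, Q_q):
    """Quadric cost nu^T (Q_p + Q_q) nu of keeping a node at position v."""
    nu = np.append(v, 1.0)  # homogeneous coordinates [x, y, z, 1]
    return nu @ (Q_p + Q_q) @ nu

def contract_cheapest(nodes, Q, pairs):
    """One contraction step (nu_p, nu_q) -> nu_q on the pair of least cost.

    nodes : (n, 3) array of vertex positions
    Q     : list of per-node 4x4 quadrics
    pairs : iterable of (p, q) index tuples of valid vertex pairs
    Returns the (dropped, kept) node indices.
    """
    candidates = []
    for p, q in pairs:
        # as only existing nodes are kept, evaluate both possible survivors
        for keep, drop in ((q, p), (p, q)):
            candidates.append((drop, keep, pair_cost(nodes[keep], Q[p], Q[q])))
    drop, keep, _ = min(candidates, key=lambda c: c[2])
    Q[keep] = Q[drop] + Q[keep]  # accumulate the quadric on the kept node
    return drop, keep
```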
Upsampling Operation It is of interest to recover the original representation of the model from every coarse representation given. Unfortunately, a lossless reconstruction of the original mesh based on the simplified one is in general not possible. Consequently, we seek an upsampling matrix \(\mathcal {\varvec{U}}\in \mathbb {R}^{n\times n_{\text {d}}}\) with

$$\begin{aligned} \varvec{{\mathcal {N}}}\approx \mathcal {\varvec{U}}\varvec{{\mathcal {N}}}_\text {d} \end{aligned}$$ (4)
that approximates the original mesh. In this work, we follow the procedure of [63] and generate the upsampling matrix during the downsampling matrix creation process. A node \(\varvec{\nu }_q\) that is kept in the downsampling process will lead to an entry in the upsampling matrix that follows \(\mathcal {\varvec{U}}(p,q)=1\). A discarded node \(\varvec{\nu }_p\), on the contrary, is mapped onto the down-sampled mesh using barycentric coordinates, projecting it into the closest triangle (i, j, k) in the down-sampled mesh

$$\begin{aligned} \varvec{\nu }_p = {w}_i\varvec{\nu }_i + {w}_j\varvec{\nu }_j + {w}_k\varvec{\nu }_k \end{aligned}$$
with \(\varvec{\nu }_i,\varvec{\nu }_j,\varvec{\nu }_k \in \varvec{{\mathcal {N}}}_\text {d}\) and \({w}_i+{w}_j+{w}_k=1\). The upsampling matrix is then updated with the corresponding weighting factors so that \(\mathcal {\varvec{U}}(p,i)={w}_i, \ \mathcal {\varvec{U}}(p,j)={w}_j, \ \mathcal {\varvec{U}}(p,k)={w}_k\). Visual examples of the coarsened FE model are given in Fig. 2.
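Assembling \(\mathcal {\varvec{U}}\) then amounts to writing one row per original node: a unit entry for kept nodes and the three barycentric weights for discarded ones. A minimal sketch, assuming the hypothetical mappings `kept` and `projections` have been recorded during simplification:

```python
from scipy import sparse

def upsampling_matrix(n, n_d, kept, projections):
    """Assemble the sparse upsampling matrix U in R^(n x n_d).

    kept        : dict, kept original node index q -> coarse column index
    projections : dict, discarded node index p -> ((i, j, k), (wi, wj, wk)),
                  the enclosing coarse triangle and its barycentric weights
    """
    rows, cols, vals = [], [], []
    for q, col in kept.items():          # retained nodes: U[q, col] = 1
        rows.append(q); cols.append(col); vals.append(1.0)
    for p, ((i, j, k), (wi, wj, wk)) in projections.items():
        rows += [p, p, p]; cols += [i, j, k]; vals += [wi, wj, wk]
    return sparse.csr_matrix((vals, (rows, cols)), shape=(n, n_d))
```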
2.3 Surrogate modeling
Having the coarse representations of a model present enables the surrogate modeling process at the different levels. In this work, we rely on graph convolutional neural networks (GCNNs) to create a low-dimensional latent representation of the system state. Note that any other data-driven dimensionality reduction scheme, linear as well as nonlinear, can replace the GCNNs. Nevertheless, GCNNs profit from the proposed framework since the learning process is eased and accelerated, and they provide the best approximation quality among the tested methods.
2.3.1 Graph convolutional neural networks
Graph convolutional neural networks generalize convolutional neural networks to irregularly discretized domains as they are present in FE models. To understand the underlying principle of graph convolutions, recall a graph \({\mathcal {G}}=\{\varvec{{\mathcal {N}}},{\mathcal {E}},\varvec{A}\}\) as described in Sect. 2.2. A graph signal \(\varvec{x}\in \mathbb {R}^{n}\) is a feature vector of all \(n\) nodes in the graph. A convenient way to calculate a convolution between a filter \(\varvec{g}\in \mathbb {R}^{n}\) and a signal \(\varvec{x}\) is to exploit the fact that a convolution corresponds to a multiplication in Fourier space. To obtain a Fourier transform \(\hat{\varvec{x}}\) of the signal, a Fourier basis can be obtained from an eigenvalue factorization of the normalized Laplacian of a graph. The Laplacian is defined as \(\varvec{L}^*=\varvec{D}-\varvec{A}\) with the adjacency matrix \(\varvec{A}=\varvec{A}({\mathcal {G}})\) and the diagonal matrix of node degrees \(\varvec{D}\) with entries \(\varvec{D}_{i,i}=\sum _j \varvec{A}_{i,j}\). The normalized version of the Laplacian \(\varvec{L}=\varvec{I}_n-\varvec{D}^{-\frac{1}{2}}\varvec{A}\varvec{D}^{-\frac{1}{2}}\) is real, symmetric, and positive semidefinite. Hence, the factorization \(\varvec{L}=\varvec{U}\varvec{\Lambda }\varvec{U}^\intercal \) exists and the matrix \(\varvec{U}=\begin{bmatrix} \varvec{u}_1&\varvec{u}_2&\dots&\varvec{u}_n \end{bmatrix}\) represents the eigenvectors of the Laplacian ordered by their corresponding eigenvalues, which are stored in the diagonal matrix \(\varvec{\Lambda }\). The eigenvectors \(\varvec{u}\) are known as the Fourier modes of \({\mathcal {G}}\).
A Fourier transform of a signal \(\varvec{x}\) is then given by \(\hat{\varvec{x}}={\mathscr {F}}(\varvec{x})=\varvec{U}^\intercal \varvec{x}\) and the inverse Fourier transform by \(\varvec{x}={\mathscr {F}}^{-1}(\hat{\varvec{x}})=\varvec{U}\hat{\varvec{x}}\). Given those transformations, the graph convolution between the signal \(\varvec{x}\) and a filter \(\varvec{g}\) results in

$$\begin{aligned} \varvec{g}*_{{\mathcal {G}}}\varvec{x}= \varvec{U}\left( \left( \varvec{U}^\intercal \varvec{g}\right) \odot \left( \varvec{U}^\intercal \varvec{x}\right) \right) , \end{aligned}$$ (5)
where \(\odot \) represents the Hadamard/elementwise product.
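For concreteness, Eq. (5) can be evaluated directly from the eigendecomposition of the normalized Laplacian. A minimal NumPy sketch, assuming a dense symmetric adjacency matrix `A` without isolated nodes (this route is only viable for small graphs, which motivates the Chebyshev approximation discussed below):

```python
import numpy as np

def normalized_laplacian(A):
    """L = I - D^(-1/2) A D^(-1/2); assumes every node has degree > 0."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def spectral_conv(A, x, g):
    """Graph convolution g * x = U ((U^T g) o (U^T x)), cf. Eq. (5)."""
    _, U = np.linalg.eigh(normalized_laplacian(A))  # Fourier basis of the graph
    return U @ ((U.T @ g) * (U.T @ x))
```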
Denoting the filter as \(\varvec{g}_{\varvec{{w}}}=\text {diag}(\varvec{U}^\intercal \varvec{g})\) and using the conversion \(\varvec{a}\odot \varvec{b}=\text {diag}(\varvec{b})\varvec{a}\), Eq. (5) simplifies to

$$\begin{aligned} \varvec{g}*_{{\mathcal {G}}}\varvec{x}= \varvec{U}\varvec{g}_{\varvec{{w}}}\varvec{U}^\intercal \varvec{x}, \end{aligned}$$ (6)
which is a formulation all spectral-based GCNNs follow. The idea in spectral convolutional neural networks is that the filters \(\varvec{g}_{\varvec{{w}}}=\varvec{W}_{c_i,c_j}^{(l)}=\text {diag}(\varvec{{w}}_{c_i,c_j}^{(l)})\) are the learnable weights \(\varvec{W}_{c_i,c_j}^{(l)}\) in convolutional layers

$$\begin{aligned} \varvec{X}_{:,c_j}^{(l+1)} = \varvec{h}\left( \sum _{c_i=1}^{n_{c}^{(l)}} \varvec{U}\varvec{W}_{c_i,c_j}^{(l)}\varvec{U}^\intercal \varvec{X}_{:,c_i}^{(l)}\right) , \quad c_j = 1, \dots , n_{c}^{(l+1)}. \end{aligned}$$ (7)
Here, \(l\) denotes the layer index, \(c_i\) and \(c_j\) are the channel indices, \(n_{c}\) is the number of channels in the \(l\)-th layer, \(\varvec{h}\) is the activation function, \(\varvec{W}_{c_i,c_j}^{(l)}\) is a diagonal matrix with learnable weights of the \(l\)-th layer, and \(\varvec{X}^{(l)}_{:,c}\) is the \(c\)-th channel of \(\varvec{X}^{(l)}\in \mathbb {R}^{n\times n_{c}^{(l)}}\), where \(\varvec{X}^{(0)}=\varvec{X}\in \mathbb {R}^{n\times n_{c}^{(0)}}\) represents the original signal of the graph, in our case the \(n\) nodes of the FE model with the coordinates stored in three channels, i.e. \(n_{c}^{(0)}=3\). The filter formulation of Eq. (7) is not localized in space and entails a high learning complexity. Hence, the use of polynomial filters \(\varvec{g}_{\varvec{{w}}}=\sum _{k=0}^{K}{w}_k\varvec{\Lambda }^k\) is considered in [35]. As such filters still require costly matrix multiplications with the non-sparse Fourier basis \(\varvec{U}\), the authors propose to use polynomials that can be recursively calculated from the Laplacian \(\varvec{L}\), resulting in ChebNet [35].
Chebyshev Spectral Convolutional Neural Networks In Chebyshev spectral convolutional neural networks [35], the filter \(\varvec{g}_{\varvec{{w}}}\) is approximated by Chebyshev polynomials \(T\) of order \(K\). By doing so, the costly multiplications with the non-sparse Fourier basis are replaced by \(K\) multiplications with the sparse Laplacian. In detail, the filter is represented by the Chebyshev polynomials \(\varvec{g}_{\varvec{{w}}}(\check{\varvec{\Lambda }})= \sum _{k=0}^{K} \varvec{{w}}_kT_k(\check{\varvec{\Lambda }})\) of the eigenvalue matrix \(\check{\varvec{\Lambda }}\) of the scaled Laplacian \(\check{\varvec{L}}=2\varvec{L}/ \lambda _{\max } - \varvec{I}_n\). Here, \(\varvec{{w}}_k\) are learnable polynomial coefficients, and the scaling ensures that all eigenvalues are within \([-1,1]\).
Substituting this filter in Eq. (6) and exploiting the transformation \(\varvec{g}_{\varvec{{w}}}(\check{\varvec{L}})=\varvec{U}\varvec{g}_{\varvec{{w}}}(\check{\varvec{\Lambda }})\varvec{U}^\intercal \) results in a graph convolution

$$\begin{aligned} \varvec{g}_{\varvec{{w}}}*_{{\mathcal {G}}}\varvec{x}= \sum _{k=0}^{K} \varvec{{w}}_k T_k(\check{\varvec{L}})\varvec{x}\end{aligned}$$ (8)
that eliminates the multiplications with \(\varvec{U}\). The Chebyshev polynomials themselves can be calculated recursively as \(T_k(\varvec{a})=2\varvec{a}T_{k-1}(\varvec{a})-T_{k-2}(\varvec{a})\) with \(T_{1}(\varvec{a})=\varvec{a}\) and \(T_{0}(\varvec{a})=\varvec{I}\).
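A sketch of the resulting filter application, Eq. (8), using only sparse matrix-vector products; the coefficient vector `w` stands in for the learnable weights that a ChebNet layer would train:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import eigsh

def cheb_filter(L, x, w):
    """Apply the Chebyshev filter sum_k w_k T_k(L_check) x, cf. Eq. (8).

    L : sparse normalized graph Laplacian
    x : graph signal of shape (n,)
    w : coefficients w_0, ..., w_K (trainable in a ChebNet layer)
    """
    lam_max = eigsh(L, k=1, return_eigenvectors=False)[0]
    L_check = 2.0 * L / lam_max - sparse.identity(L.shape[0])  # spectrum in [-1, 1]
    T_prev, T_curr = x, L_check @ x       # T_0 x and T_1 x
    out = w[0] * T_prev + w[1] * T_curr
    for k in range(2, len(w)):            # recursion T_k = 2 L_check T_{k-1} - T_{k-2}
        T_prev, T_curr = T_curr, 2.0 * (L_check @ T_curr) - T_prev
        out = out + w[k] * T_curr
    return out
```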
In [36], the Graph Convolutional Network (GCN) is introduced, which represents a first-order approximation of ChebNet. GCNs often face overfitting and oversmoothing, which is mitigated in GCN2 [38], where skip connections are used to propagate information over multiple layers. This approach is used in the context of MOR in [37]. Nevertheless, ChebNet yields better results for our example and is consequently used in the following.
2.3.2 Network architecture: a graph convolutional autoencoder with a multilayer perceptron
The network architecture that we use to create a surrogate model on the lowest level is shown in Fig. 3. It consists of (i) a (graph convolutional) autoencoder which is used to learn a low-dimensional embedding \({\mathcal {Z}}\subseteq \mathbb {R}^{r}\) for the high-dimensional state space \({\mathcal {X}}\) and (ii) a multilayer perceptron (MLP) to capture the parameter- and time-dependencies in the identified low-dimensional latent manifold. The autoencoder consists of an encoder \(\varvec{\Psi }_{\text {enc}}: {\mathcal {X}}\rightarrow {\mathcal {Z}}\) with learnable weights \(\varvec{W}_{\text {enc}}\) mapping the high-dimensional state to a low-dimensional latent representation, i.e. \(\varvec{z}= \varvec{\Psi }_{\text {enc}}(\varvec{x}, \varvec{W}_{\text {enc}})\), and a decoder \(\varvec{\Psi }_{\text {dec}}: {\mathcal {Z}}\rightarrow {\mathcal {X}}\) with learnable weights \(\varvec{W}_{\text {dec}}\) reconstructing the high-dimensional state from the low-dimensional latent representation, i.e. \(\breve{\varvec{x}}= \varvec{\Psi }_{\text {dec}}(\varvec{z}, \varvec{W}_{\text {dec}})\). In the case of graph convolutional layers, \(\varvec{W}_{\text {enc}}\) and \(\varvec{W}_{\text {dec}}\) contain the trainable filters. The multilayer perceptron \(\varvec{\Phi }: {\mathcal {M}}\times {\mathcal {T}}\times {\mathcal {Z}}\rightarrow {\mathcal {Z}}\) maps the parameters, the time, and the encoded initial condition \(\varvec{z}_0\) to the corresponding latent state \(\tilde{\varvec{z}}= \varvec{\Phi }(\varvec{\mu }, t, \varvec{z}_0, \varvec{W}_{\text {mlp}})\) with trainable weights \(\varvec{W}_{\text {mlp}}\). As we only consider simulations starting from the same initial condition in our example, it is neglected in the following.
The complete autoencoder reconstructs a state following

$$\begin{aligned} \breve{\varvec{x}}= \varvec{\Psi }_{\text {dec}}\left( \varvec{\Psi }_{\text {enc}}\left( \varvec{x}\right) \right) \end{aligned}$$ (9)
and the surrogate model that captures the (parametric) system dynamics is a function composition of the MLP and the decoder

$$\begin{aligned} \tilde{\varvec{x}}(t, \varvec{\mu }) = \varvec{\Sigma }(t, \varvec{\mu }) = \varvec{\Psi }_{\text {dec}}\left( \varvec{\Phi }\left( \varvec{\mu }, t\right) \right) . \end{aligned}$$ (10)
To adjust the weights \(\varvec{W}=\{\varvec{W}_{\text {enc}}, \varvec{W}_{\text {dec}}, \varvec{W}_{\text {mlp}}\}\) of the networks given some data, we minimize the loss function

$$\begin{aligned} {\mathcal {L}}(\varvec{W}) = {\mathcal {L}}_{\text {sur}} + {\mathcal {L}}_{\text {rec}}, \end{aligned}$$ (11a)

$$\begin{aligned} {\mathcal {L}}_{\text {sur}} = \left\| \varvec{x}(t, \varvec{\mu }) - \varvec{\Psi }_{\text {dec}}\left( \varvec{\Phi }\left( \varvec{\mu }, t\right) \right) \right\| _2^2, \end{aligned}$$ (11b)

$$\begin{aligned} {\mathcal {L}}_{\text {rec}} = \left\| \varvec{x}(t, \varvec{\mu }) - \varvec{\Psi }_{\text {dec}}\left( \varvec{\Psi }_{\text {enc}}\left( \varvec{x}(t, \varvec{\mu })\right) \right) \right\| _2^2, \end{aligned}$$ (11c)

averaged over all training samples.
The first part of the loss Eq. (11b) ensures that the surrogate captures the system behavior for given parameters, while the second part of the loss Eq. (11c) ensures that the autoencoder is able to reconstruct the state from the latent space well.
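A minimal PyTorch sketch of the combined loss Eq. (11) over one batch; `encoder`, `decoder`, and `mlp` stand in for the modules described above, and the mean squared error is one possible choice of norm:

```python
import torch

def total_loss(encoder, decoder, mlp, x, t, mu):
    """Combined loss of Eq. (11) for a batch of snapshots.

    x  : states, shape (batch, n_features)
    t  : times, shape (batch, 1); mu : parameters, shape (batch, n_params)
    """
    z = encoder(x)                            # latent state from data
    x_rec = decoder(z)                        # autoencoder reconstruction
    z_pred = mlp(torch.cat([mu, t], dim=1))   # latent state from (mu, t)
    x_pred = decoder(z_pred)                  # full surrogate prediction
    loss_sur = torch.mean((x - x_pred) ** 2)  # Eq. (11b)
    loss_rec = torch.mean((x - x_rec) ** 2)   # Eq. (11c)
    return loss_sur + loss_rec                # Eq. (11a)
```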
2.4 Transfer learning
Once a surrogate is found on the coarsest level, the surrogate modeling process can be repeated on the next level of refinement. However, instead of learning everything from scratch, the finer surrogate uses the output of the already trained coarser surrogate. To transfer the knowledge from one level to another, we connect the finer and the coarser graph representations of the system via down- and upsampling matrices, fix the already trained coarse surrogate, and add its output to the latent space and to the reconstruction of the fine surrogate, as can be seen in Fig. 4.
Consequently, the fine model only needs to capture inaccuracies and non-captured system behavior of the coarse one. The general architecture of the finer decoder, encoder, and MLP follows the previous definitions. The following section explains how to construct the multi-hierarchical model.
2.4.1 Multi-hierarchical model
The starting points for creating the multi-hierarchical model are the differently resolved discretizations of the original system. The multi-hierarchical modeling approach starts with creating a surrogate \(\varvec{\Sigma }_{\ell }\) on the deepest, i.e. coarsest level \(\ell \) and just follows the definitions given in Eq. (9) and Eq. (10) but operates on a downsampled state \(\varvec{x}_{\ell }=\mathcal {\varvec{D}}_{\ell } \varvec{x}\) instead of the original state description. Once the coarse surrogate is created, the surrogate modeling process continues to the next finer level.
The weights \(\varvec{W}_\ell \) of the already trained coarse surrogate are fixed to train the finer surrogate \(\varvec{\Sigma }_{\ell \text {-}1}\). The first adjustment compared to the presented standard modeling scheme takes place in the encoding of the system state. Instead of having a single encoder \(\varvec{z}_{\ell } = \varvec{\Psi }_{\text {enc}, \ell }(\varvec{x}_{\ell })\) as previously, the latent state is computed as an addition of two encoders

$$\begin{aligned} \varvec{z}_{\ell \text {-}1} = \varvec{\Psi }_{\text {enc}, \ell }\left( \mathcal {\varvec{D}}_{\ell \text {-}1}^{\ell }\,\varvec{x}_{\ell \text {-}1}\right) + \varvec{\Psi }_{\text {enc}, \ell \text {-}1}^*\left( \varvec{x}_{\ell \text {-}1}\right) \end{aligned}$$ (12)
with \(\mathcal {\varvec{D}}_{\ell \text {-}1}^{\ell }\) being the downsampling matrix that maps a state from level \(\ell \text {-}1\) to \(\ell \). In this context \(\varvec{\Psi }_{\text {enc}, \ell }\) is the trained and fixed encoder from the coarse level and \(\varvec{\Psi }_{\text {enc}, \ell \text {-}1}^*\) is a new trainable encoder.
A similar approach is chosen to reconstruct the state in the physical space. Therefore, we rely on an addition of the already trained decoder \(\varvec{\Psi }_{\text {dec}, \ell }\) and a trainable new one \(\varvec{\Psi }_{\text {dec}, \ell \text {-}1}^{*}\), resulting in the refined decoder

$$\begin{aligned} \breve{\varvec{x}}_{\ell \text {-}1} = \mathcal {\varvec{U}}_{\ell }^{\ell \text {-}1}\,\varvec{\Psi }_{\text {dec}, \ell }\left( \varvec{z}_{\ell \text {-}1}\right) + \varvec{\Psi }_{\text {dec}, \ell \text {-}1}^{*}\left( \varvec{z}_{\ell \text {-}1}\right) , \end{aligned}$$ (13)
where \(\mathcal {\varvec{U}}_{\ell }^{\ell \text {-}1}\) describes the upsampling matrix from level \(\ell \) to \(\ell \text {-}1\). This static and error-prone upsampling matrix can be replaced with an adaptive learnable upsampling scheme to further minimize the error. In this work, we decided to use a simple linear fully-connected layer of the form

$$\begin{aligned} \varvec{\Psi }_{\text {up}, \ell \text {-}1}\left( \varvec{x}_{\ell }\right) = \varvec{W}_{\text {up}, \ell \text {-}1}^{\setminus \varvec{0}}\,\varvec{x}_{\ell } + \varvec{W}_{\text {up}, \ell \text {-}1}^{\varvec{0}}, \end{aligned}$$ (14)
which proved sufficient in experiments to significantly reduce the upsampling error while still maintaining limited computational effort. The trainable parameters consist of the weights \(\varvec{W}_{\text {up}, \ell \text {-}1}^{\setminus \varvec{0}}\) and the bias \(\varvec{W}_{\text {up}, \ell \text {-}1}^{\varvec{0}}\). Replacing the former upsampling matrix in Eq. (13) with Eq. (14) leads to the decoder formulation we are using in this work

$$\begin{aligned} \breve{\varvec{x}}_{\ell \text {-}1} = \varvec{\Psi }_{\text {up}, \ell \text {-}1}\left( \varvec{\Psi }_{\text {dec}, \ell }\left( \varvec{z}_{\ell \text {-}1}\right) \right) + \varvec{\Psi }_{\text {dec}, \ell \text {-}1}^{*}\left( \varvec{z}_{\ell \text {-}1}\right) . \end{aligned}$$ (15)
The refined multilayer perceptron, on the contrary, uses the previous one’s output as additional input

$$\begin{aligned} \tilde{\varvec{z}}_{\ell \text {-}1} = \varvec{\Phi }_{\ell \text {-}1}\left( \varvec{\mu }, t, \varvec{\Phi }_{\ell }\left( \varvec{\mu }, t\right) \right) . \end{aligned}$$ (16)
To create the next finer surrogate model \(\varvec{\Sigma }_{\ell \text {-}2}\), the same procedure is repeated but this time with \(\varvec{\Sigma }_{\ell \text {-}1}\) serving as coarse model.
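To make the recursion concrete, the following PyTorch sketch composes one refinement level out of the frozen coarse modules and new trainable residual networks; module names and constructor arguments are illustrative assumptions, not the authors' implementation:

```python
import torch

class RefinedLevel(torch.nn.Module):
    """Sketch of one refinement step, cf. Eqs. (12), (15), and (16).

    coarse_enc/dec/mlp are the trained modules of level l (frozen here),
    fine_enc/dec/mlp are the new trainable residual networks of level l-1,
    and D is the downsampling matrix from level l-1 to level l.
    """

    def __init__(self, coarse_enc, coarse_dec, coarse_mlp,
                 fine_enc, fine_dec, fine_mlp, D, n_coarse, n_fine):
        super().__init__()
        for module in (coarse_enc, coarse_dec, coarse_mlp):
            module.requires_grad_(False)      # fix the coarse weights
        self.coarse_enc, self.coarse_dec = coarse_enc, coarse_dec
        self.coarse_mlp = coarse_mlp
        self.fine_enc, self.fine_dec = fine_enc, fine_dec
        self.fine_mlp = fine_mlp
        self.register_buffer("D", D)          # (dense) selection matrix
        # learnable linear upsampling replacing the static matrix, Eq. (14)
        self.up = torch.nn.Linear(n_coarse, n_fine)

    def encode(self, x_fine):                 # Eq. (12)
        return self.coarse_enc(x_fine @ self.D.T) + self.fine_enc(x_fine)

    def decode(self, z):                      # Eq. (15)
        return self.up(self.coarse_dec(z)) + self.fine_dec(z)

    def predict_latent(self, mu, t):          # Eq. (16)
        z_coarse = self.coarse_mlp(torch.cat([mu, t], dim=1))
        return self.fine_mlp(torch.cat([mu, t, z_coarse], dim=1))
```

A forward pass for training would combine `predict_latent` and `decode`, while the frozen coarse modules contribute their already learned macroscale behavior.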
To enable a comparison among all levels, it is of interest to transform the approximations of the surrogate models back into the original finely discretized state space. For the upsampling of the coarse approximation to the original discretization, the static upsampling matrices \(\mathcal {\varvec{U}}_{l}^{0},\ 1\le l\le \ell \) are used, i.e.

$$\begin{aligned} \tilde{\varvec{x}}= \mathcal {\varvec{U}}_{l}^{0}\,\tilde{\varvec{x}}_{l}. \end{aligned}$$
The presented framework offers many adjustments to adapt it to one’s own needs, and the presented design choices represent only one suitable configuration. Some specific variations are mentioned in the following.
2.5 Alternative architectures
One point at which adjustments can be made to the presented architecture are the refined versions of the encoder Eq. (12), decoder Eq. (15), and MLP Eq. (16). Instead of having additive transfer learning for the encoder, the results from several layers could be concatenated, which would lead to a higher latent dimension. This drawback, along with the absence of a performance boost in numerical experiments, has led to the abandonment of this idea. Furthermore, adding the output of the coarse decoder to the input of the fine one, as we did for the MLP, would significantly increase the number of input dimensions and is generally difficult for graph convolutions.
Another design modification can be made in the adaptive upsampling Eq. (14), either by replacing the proposed upsampling mapping, e.g. with another type of layer, by optimizing the sparse upsampling matrix \(\mathcal {\varvec{U}}_{l}^{m}\), or by carefully selecting the nodes that require a refinement.
2.5.1 Adaptive refinement
The idea of adaptive refinement is that only areas of particular interest or high error are refined in the surrogate modeling process. That means that only the coarsest surrogate is trained on the complete precomputed discretization; all subsequent finer levels use the refined version only in areas where it is desired.
A possible data-based approach to select those areas is to calculate a suitable error of the coarse surrogate on a validation dataset and then choose those nodes which have the highest error or which exceed an error threshold. We refer to them as faulty nodes. For the selected nodes, all neighboring nodes in the next finer graph are added, see Fig. 5. In the next step, the adjacency matrix defining the resulting graph and suitable up- and downsampling matrices need to be computed.
While this approach gives special consideration to areas of interest, e.g., areas with high variability, and is appealing due to the smaller resulting models and the possibility to include error tolerances (for validation data), it also leads to a complicated framework. In addition, it introduces several design decisions, such as how to define the neighborhood of a refined node. Furthermore, no notable performance boost could be observed in our numerical experiments compared to our vanilla version, and the reduction in computational costs is minimal. Consequently, we only present the results of the more comprehensive vanilla approach in the following.
2.5.2 Unified latent representation
The multi-hierarchical representations of the original system not only enable multiple surrogate models to be trained one after another but can also be used for simultaneous training. In such an approach, the different latent representations for every level \(\varvec{z}_l, \ 1\le l\le \ell \) could be exchanged for one unified description so that the model distinction only takes place in the decoders. This advantage comes at the cost of no longer being able to stop the refinement at an arbitrary point and of losing the simple representation in which global behavior can be learned easily and quickly.
3 Numerical example of a racing kart
Insights from physics-based high-fidelity models, like explicit structural dynamical FE simulations, are crucial for scenario-based testing and computer-aided engineering applications. One domain where scenario-based testing is particularly relevant is integral vehicle safety. While we do not aim to conduct an industry-relevant crash simulation in this paper, we aim to follow an approach in which the general procedure is similar to the one that usually occurs in such a setting, i.e. under closed-source commercial software using explicit time stepping schemes and with limited data. Accordingly, we do not consider a full-scale vehicle model but a simplified frame of a racing kart that still offers aspects such as scenario variations with multiple parameter dependencies as well as nonlinear material and contact behavior with plastic deformations, while being computationally tractable and easy to comprehend. In contrast to full-scale vehicle models, it lacks different zones like crumple zones for energy absorption, a safety cage around the occupants, or the occupants themselves. The considered model consequently represents a more complex and application-oriented example than other frequently used ones in data-driven MOR and bridges the gap between classic academic examples and full-scale industrial models. Overall, the aim of this work is to develop a methodology that is capable of quickly and accurately approximating the dynamic behavior of a structural dynamical system in a hierarchical fashion. Please note that a complete investigation of crashworthiness or the creation of surrogate models for such an investigation is accordingly not the subject of this paper.
3.1 A racing kart frontal collision simulation
The high-fidelity model considered in the following experiments represents the frame of a racing kart, which is pictured in Fig. 6a. The frame itself is responsible for the essential dynamic behavior of a kart [64] and is therefore interesting for the investigation of crash behavior. The remaining parts of the kart, like its wheels, vehicle shell, engine, and driver, are replaced by point masses to render a more tractable model. Slight variations of this model have already been used in [19, 65]. The frame is realized as a finite element model in the commercial software tool LS-Dyna. It is constructed out of steel pipes which are modeled as thin-walled tubes using shell elements, resulting in \(n=9314\) nodes, each with \(n_{c}^{(0)}=3\) translational degrees of freedom and the same number of rotational ones.
For the task of creating a surrogate for the kart model, we are interested in approximating the kart’s behavior in a defined scenario parameterized by the simulation parameter \(\varvec{\mu }\) and the time \(t\). The considered scenario describes a frontal collision of the kart against a rigid wall under varying conditions. It should be noted that other scenarios, such as side- and rear-impact or grazing accidents, fall outside the scope of this work. However, they must be considered when conducting scenario-based testing and in-depth safety evaluations. Moreover, we limit the quantity of interest to the displacements as they define all occurring deformations and can serve as a starting point to generate other quantities of interest like stress (e.g. von Mises stress) using standard FEM tools as is done in [66]. A direct approximation of stress values through data-driven surrogate models is possible as well and has been investigated in a previous study for a continuum-mechanical musculoskeletal system [23]. However, this is outside the scope of the current study, just like the approximation of other quantities like decelerations, energy absorption, or forces acting on occupants that are of interest for a thorough crashworthiness investigation. The initial conditions are neglected in the modeling process of the surrogates as all simulations start from the same initial condition. The displacement of the \(p\)-th node of the \(s\)-th simulation at time \(t\) is denoted by \(\varvec{q}_{p}^{s}(t)=\left[ q_{p}^{s,x}(t), q_{p}^{s,y}(t), q_{p}^{s,z}(t)\right] \in \mathbb {R}^{3}\), where the superscripts \(x,\ y,\ z\) represent the corresponding coordinate direction.
The parameters defining the frontal collision are the impact speed \(\mu _1\in [5, 35]\,\hbox {m s}^{-1}\), impact angle \(\mu _2\in [-45, 45]\,^{\circ }\), and yield stress \(\mu _3\in [168, 758]\,\hbox {MPa}\). The impact angle describes the angle between the normal of the wall and the orientation of the kart, whereas the yield stress impacts the effective plastic stress–strain curve of the kart’s material. The course of the curve corresponds to that of a typical steel for which the initial value is determined by the individual yield stress \(\mu _3\) of each simulation, see Fig. 6c. Each crash simulation covers a simulation time of 30 ms and the simulation results are exported with a sampling time of 0.3 ms, resulting in \(\eta =101\) samples per simulation, while the internal step size during simulation is much smaller and adaptively chosen. In total, \(n_s=128\) quasi-random parameter combinations are sampled using Halton sequences. From the resulting high-fidelity simulation results, \(n_s^{\text {train}}=96\) are used for the generation of the surrogate models and \(n_s^{\text {test}}=32\) serve as test data. The simulation results are concatenated in the data matrix \(\varvec{X}\in \mathbb {R}^{N\times n_s^{\text {train}}\eta }\) consisting of the system states \(\varvec{x}\in {\mathcal {X}}\subseteq \mathbb {R}^{N}\) at different times and simulations. Two example simulations are showcased in Fig. 6b. All simulation results as well as the kart’s source files are published and freely available under [67]. The major goal of our paper is to derive a surrogate model that can reproduce the high-fidelity simulation results of the kart in multiple resolutions with high accuracy and low computational times. Conventional approaches reach their limits in doing so due to the model’s complexity.
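The quasi-random parameter sampling can be reproduced, for instance, with SciPy's Halton sequence generator; the split into the first 96 and the last 32 samples shown here is an illustrative choice:

```python
from scipy.stats import qmc

# bounds of impact speed [m/s], impact angle [deg], and yield stress [MPa]
lower, upper = [5.0, -45.0, 168.0], [35.0, 45.0, 758.0]

sampler = qmc.Halton(d=3, scramble=False)
unit_samples = sampler.random(n=128)      # quasi-random points in [0, 1)^3
mu = qmc.scale(unit_samples, lower, upper)

mu_train, mu_test = mu[:96], mu[96:]      # 96 training / 32 test scenarios
```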
Model complexity To showcase the complexity of the presented kart simulation model in the context of MOR, we consider the course of the normalized singular values of the high-fidelity simulation results \(\varvec{X}\) in Fig. 6d. The magnitude of each singular value reflects the importance of the corresponding reduced basis vector for describing the data. If a few singular values are dominant, the data can be described well with a linear combination of only a few reduced basis vectors. If not, a non-negligible error is introduced or more basis vectors must be used. Accordingly, the singular values can serve as an indicator for the Kolmogorov n-width [21, 68]

$$\begin{aligned} d_n\left( {\mathcal {S}}_{{\mathcal {M}}}\right) = \inf _{{\mathcal {X}}_{n}\subseteq {\mathcal {X}}} \ \sup _{\varvec{x}\in {\mathcal {S}}_{{\mathcal {M}}}} \ \inf _{\varvec{x}_{n}\in {\mathcal {X}}_{n}} \left\| \varvec{x}- \varvec{x}_{n}\right\| , \end{aligned}$$
which quantifies the optimal linear trial subspace by describing the largest distance between any point in the solution manifold \({\mathcal {S}}_{{\mathcal {M}}}\) for all parameters and all n-dimensional subspaces \({\mathcal {X}}_{n}\subseteq {\mathcal {X}}\). For the considered problem, the intrinsic dimension of the solution space is at most equal to the number of parameters plus one for the time, resulting in \(r=4\). However, since not only the first four but also the subsequent singular values make significant contributions, it can be assumed that linear reduction methods such as PCA lead to appreciable errors. Hence, we apply the proposed multi-hierarchical surrogate modeling scheme to the kart model, for which it needs to be represented as a graph.
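Such a singular value analysis of the snapshot matrix requires only a few lines; the threshold used here is illustrative:

```python
import numpy as np

def normalized_singular_values(X):
    """Singular values of the snapshot matrix X, normalized by the largest."""
    s = np.linalg.svd(X, compute_uv=False)
    return s / s[0]

# how many basis vectors carry a normalized contribution above a threshold
X = np.random.rand(300, 100)   # placeholder for the assembled simulation data
sigma = normalized_singular_values(X)
print(int(np.sum(sigma > 1e-4)))
```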
Representation as a Graph To represent the kart as a graph, we directly work with its FE formulation. The nodes of the FE model serve as vertices \(\varvec{{\mathcal {N}}}\) of the graph and the element definitions specify the node connectivity, i.e. the edges \({\mathcal {E}}\) of the graph and the adjacency matrix \(\varvec{A}\). The displacements serve as node features, i.e. they represent the system states \(\varvec{x}:=\varvec{q}\), for the graph convolutional-based surrogates. For the other surrogates that do not operate on graphs, the displacements are vectorized, i.e. \(\varvec{x}:=\left[ \varvec{q}_{1}^{s}, \dots , \varvec{q}_{n}^{s}\right] ^\intercal \in \mathbb {R}^{3n}\). Consequently, the dataset

$$\begin{aligned} {\mathcal {D}}= \left\{ \left( \left( t_i, \varvec{\mu }_s\right) , \varvec{x}\left( t_i, \varvec{\mu }_s\right) \right) \ \big \vert \ i=0,\dots ,\eta -1,\ s=1,\dots ,n_s^{\text {train}}\right\} \end{aligned}$$
contains the time \(t\) and the parameters \(\varvec{\mu }\) as input for the MLP and the corresponding displacements as target values.
4 Results & discussion
To highlight the performance of the proposed approach, we present numerical results for the aforementioned racing kart frontal collision simulation. The created surrogates are rated regarding their training phase, their approximation quality in the coarse and the original representation, and their computing time. All following results, with the exception of the finite element simulations, were produced on an Apple M1 Max with a 10-core CPU, 24-core GPU, and 64 GB of RAM. To compare the proposed framework with more classic approaches, we generate surrogate models that follow the description of Sect. 2.3.2 but operate directly on the original (not downsampled) data and use either proper orthogonal decomposition, a fully connected autoencoder, or a graph convolutional autoencoder for the reduction step. We refer to them as PODNN, AENN, and GAENN, whereas the surrogates using the multi-hierarchical approach with graph convolutional autoencoders on the different levels are referred to as MH1, MH2, and MH3 (from finest to coarsest surrogate). The chosen architectures are listed in Table 1.
The MH encoders consist of several graph convolutional layers with ELU activation functions. Each graph convolution maintains the signal dimension \(n\) but changes the number of channels \(n_{c}\). The graph convolutions are followed by a dense layer with linear activation function to map the input to the latent dimension. The decoder follows the same architecture in reverse order. All dimensionality reduction networks (POD, AE, GAE, MH1, MH2, MH3) are combined with similar MLPs to predict the latent state based on the simulation parameters. Each one consists of several fully-connected layers with ELU activation function and a final dense layer with linear activation function. The surrogates are trained for 1500 epochs, and the weights with the lowest total loss Eq. (11a) are then used for subsequent predictions. Another noteworthy aspect showcased in Table 1 is that the graph convolutional networks possess much fewer parameters than a comparable multi-layer fully-connected network architecture.
4.1 Training comparison between fine and coarse models
Before we evaluate the actual performance of the surrogate models, let us first take a look at the training phase. For an overview of the data on which the different surrogate models are trained and which variables are optimized in this process, please refer to Table 2. A drawback of the used graph convolution is the associated computational cost that, among others, arises from the recursive computation of the Chebyshev polynomials. Consequently, the time to train a graph convolutional surrogate model on the full model significantly exceeds the training time of a classic fully-connected autoencoder, as shown in Fig. 7a. If the surrogate is created using the multi-hierarchical approach, on the contrary, the tide turns. The training time is reduced to such an extent that the model operating on the coarsest representation trains even faster than the classical autoencoder on the full model. Even when adding up the training time of all three levels used for the kart example, the time is still in a comparable order of magnitude and is more than ten times faster than the GAENN on the full model.
Considering the computing time \(\Delta T\) required for one prediction of the surrogates, a similar picture emerges, as depicted in Fig. 7b. The GAENN requires by far the most time, but our approach can substantially mitigate this effect. A surrogate that just uses POD for the dimensionality reduction outperforms the other models as the computation of the reconstruction to the fine physical space only requires one matrix multiplication. Regarding the MH approach, the training and computational time logically increases with every level as the degree of resolution rises. Noteworthy in this context is that the time to receive a prediction for the fine original representation is not much higher than that of the coarse representations. This is owed to the fact that the upsampling follows Eq. (4) and consequently only requires a sparse matrix multiplication.
In addition to computation times, training the MH models provides additional insights. For an evaluation of the transfer learning, we consider the progression of the loss during training in Fig. 8. It becomes apparent that all losses drop significantly lower with each refinement of the models. Consequently, the saved information of the coarser models helps the finer ones to improve their performance and avoids having to learn already known structures twice. Especially the reconstruction benefits greatly from the transfer learning, see Fig. 8c, but the overall approximation gets better with each level as well, see Fig. 8b.
4.2 Evaluation on coarse levels
In the final comparison of the surrogate models, the performance metrics are always measured in the original discretization of the model so that a comparison between the MH models and models that are trained on the original data directly is possible. However, there are two reasons why it is worth considering the performance in the coarse discretizations prior to this final consideration: On the one hand, because the MH models are intended to be evaluated in the coarse representation and on the other hand because in this way the error induced by the upsampling into the original space does not appear in the evaluation. Furthermore, we can compare how the graph convolutional MH models perform in comparison to standard surrogate models on each level and thus justify their use.
For a comparison of the performance we utilize the averaged Euclidean distance between the nodes of the reference FE simulation and their approximation

$$\begin{aligned} e(t, s) = \frac{1}{n}\sum _{p=1}^{n} \left\| \varvec{q}_{p}^{s}(t) - \tilde{\varvec{q}}_{p}^{s}(t)\right\| _2 \end{aligned}$$
at time \(t\) of the \(s\)-th simulation as well as the maximum occurring Euclidean distance among all nodes

$$\begin{aligned} e_{\max }(t, s) = \max _{p\in \{1,\dots ,n\}} \left\| \varvec{q}_{p}^{s}(t) - \tilde{\varvec{q}}_{p}^{s}(t)\right\| _2. \end{aligned}$$
Moreover, the mean value over time and all test simulations

$$\begin{aligned} \bar{e} = \frac{1}{n_s^{\text {test}}\,\eta }\sum _{s=1}^{n_s^{\text {test}}}\sum _{i=0}^{\eta -1} e(t_i, s) \end{aligned}$$
as well as the corresponding mean maximum error

$$\begin{aligned} \bar{e}_{\max } = \frac{1}{n_s^{\text {test}}\,\eta }\sum _{s=1}^{n_s^{\text {test}}}\sum _{i=0}^{\eta -1} e_{\max }(t_i, s) \end{aligned}$$
are used to represent the approximation quality for the complete test data. Please note that capturing other quantities that are of interest in classic crashworthiness investigations is out of scope for this study, and the mean node error serves as an easy-to-comprehend measure to compare our proposed method to other surrogate modeling techniques.
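A sketch of how these metrics can be computed from arrays of reference and approximated node positions; the array shapes are assumptions for illustration:

```python
import numpy as np

def node_distance_errors(q_ref, q_pred):
    """Averaged and maximum Euclidean node distance per time step.

    q_ref, q_pred : arrays of shape (n_t, n_nodes, 3) for one simulation
    """
    dist = np.linalg.norm(q_ref - q_pred, axis=-1)  # (n_t, n_nodes)
    return dist.mean(axis=1), dist.max(axis=1)      # e(t, s), e_max(t, s)

def mean_errors(refs, preds):
    """Mean (and mean maximum) error over time and all test simulations."""
    pairs = [node_distance_errors(r, p) for r, p in zip(refs, preds)]
    e_bar = np.mean([e.mean() for e, _ in pairs])
    e_bar_max = np.mean([e_max.mean() for _, e_max in pairs])
    return e_bar, e_bar_max
```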
The first investigation on the coarse meshes is conducted to emphasize the hypothesis stated at the beginning: that the large number of degrees of freedom results from the necessity of the modeling method. Therefore, finite element models that only differ in their discretization are generated, and the same scenario is simulated for all of them. As shown in Fig. 9, the node distance between the coarse FE models and the reference positions of the corresponding selected nodes in the original mesh is large. The results reveal qualitatively different dynamic behavior and confirm that the finite element method requires a fine resolution. Note that the coarse FE models are only produced with the presented downsampling approach and not with a proper mesh simplification method for finite element models. Nevertheless, the results show how much conventional methods rely on a fine resolution.
Considering the node distance error of the MH models and PODNN surrogates on the different discretizations, two essential points are noteworthy. On the one hand, all MH models outperform the POD-based surrogate. Accordingly, the graph convolutional architecture works well on the coarsest representation, and its use is justified compared to other similarly applicable architectures. On the other hand, the error decreases with every additional level for the MH models. When the error is viewed in the original fine resolution of the kart, this is not surprising, since the upsampling error decreases with each level. On the coarse discretizations, however, this clearly indicates that transfer learning helps to lower the error at each level.
It is important to emphasize which dynamic effects are learned at which level. For a visual illustration, the learned behavior at each level for an example simulation is given in Fig. 10. It showcases each model’s approximation after subtracting the already existing prediction of the coarser levels, i.e.

$$\begin{aligned} \tilde{\varvec{x}}_{l}^{\Delta } = \tilde{\varvec{x}}_{l} - \mathcal {\varvec{U}}_{l+1}^{l}\,\tilde{\varvec{x}}_{l+1}. \end{aligned}$$
Clearly, the global dynamic behavior is already captured in the coarsest surrogate (Level 3), where a strong deflection of the front fork and a rotation of the entire kart occurs. In the finer ones (Level 2 and Level 1), minor deformations (especially in areas where the coarser levels lack degrees of freedom) are captured to compensate for local errors.
4.3 Approximation quality
In a final comparison, we validate the surrogates’ performance in the original model discretization. We refer to Fig. 11, where the mean as well as the maximum node distance error over time is shown for the different models. Interestingly, the graph convolutional autoencoder-based surrogate without multi-hierarchical structure fails to capture the dynamics and consequently has the largest error. The surrogate using linear reduction in form of the POD struggles to approximate the intervals of high dynamics, as \(r=4\) reduced basis vectors are not expressive enough to describe all complex deformations occurring in the simulations.
The AENN surrogate model relying on a classic autoencoder already shows promising results, indicating the benefits of a nonlinear dimensionality reduction. Nevertheless, even the coarsest MH model beats it on average, and the error decreases with each subsequent finer level considering the mean Euclidean distance, although the performance increase subsides. For the maximum error, these observations do indeed change. The coarsest model is not able to beat the AENN surrogate model, and only the subsequent finer models lead to a superior performance. Interestingly, the performance boost does not stagnate as more levels are added, but the maximum error continues to decrease significantly. This suggests that even if the overall performance does not increase significantly after a certain level of detail at finer resolutions, highly error-prone areas still benefit greatly from adding more detail. The most important performance indicators are provided in Table 3 to give the main results at a glance.
4.4 Discussion
Our results show that the proposed multi-hierarchical surrogate modeling scheme is suitable for creating various reduced-order models for the considered kart simulation model. We captured the transient dynamics, including massive plastic deformations of the kart's frame resulting from nonlinear contact, under multiple parameter dependencies. In particular, our method outperforms standard approaches in accuracy while maintaining competitive computational costs and requiring fewer parameters. Moreover, the MH surrogates can operate directly on the coarse (less memory-demanding) representations of the system, which remain visually interpretable, making them suitable for graphical applications in hardware-restricted use cases. Nevertheless, their predictions can be lifted into the original system description by a simple sparse matrix multiplication without adding much computational effort. We showed that the global dynamic behavior occurring in the investigated crash scenario is already captured by the coarsest surrogate and that the finer ones only need to learn microscale effects. Along with this, we found that the surrogates' accuracy increases with each refinement, even in the coarse domains. This effect is also reflected in the training loss, which dropped significantly faster for the finer models. These observations lead to the conclusion that transfer learning helps the models converge closer to the reference solution. Furthermore, the proposed architecture offers multiple points for adjustments and extensions, as stated in Sect. 2.5. However, these benefits come at the expense of a few disadvantages and limitations.
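The lifting step amounts to one sparse matrix-vector product per time step; a sketch with SciPy follows, where the interpolation weights are placeholders (in the actual scheme they come from the mesh simplification hierarchy).

```python
# Lifting a coarse prediction to the original mesh by sparse multiplication.
import numpy as np
import scipy.sparse as sp

n_fine, n_coarse = 10_000, 500                  # assumed node counts
rows = np.arange(n_fine)
cols = np.random.randint(0, n_coarse, n_fine)   # dummy coarse "parent" nodes
U = sp.csr_matrix((np.ones(n_fine), (rows, cols)),
                  shape=(n_fine, n_coarse))     # upsampling operator

q_coarse = np.random.randn(n_coarse, 3)  # coarse nodal displacements
q_fine = U @ q_coarse                    # lifted to the original resolution
```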
Our approach requires knowledge of the internal (geometrical) structure of a given system; consequently, data alone is not enough. Furthermore, the model simplification is performed based on the spatial properties of the system, so other quantities of interest might be lost or must be accounted for in the mesh simplification process. This process adds computational effort to the offline phase, which is, however, negligible compared to the training effort for the networks. Additionally, the multi-hierarchical architecture and the use of graph convolutions introduce many new design choices and hyperparameters. This complicates the surrogate modeling process compared to more straightforward approaches such as POD combined with neural networks. However, even without extensive fine-tuning, the MH models are able to beat the conventional methods.
As the proposed framework is substantially built upon the available high-fidelity data, the results depend heavily on its quality, and extrapolation can pose major challenges. To circumvent this issue, incorporating low-fidelity data to improve the surrogate models is an interesting future research direction. In our current approach, all hierarchical models are derived from the high-fidelity data only. However, it is conceivable to use low-fidelity FE models for parameters lying outside the considered training data. Such low-fidelity models could be obtained from the coarse meshes, and their results could be incorporated into the surrogate models, similar to multi-fidelity approaches that use cheap low-fidelity models to improve high-fidelity predictions [69, 70] and learn the resulting residual [71], for example.
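A hedged sketch of this residual-learning idea [71] is given below: a small network is trained to correct low-fidelity outputs toward high-fidelity targets. All names, dimensions, and the synthetic data are illustrative assumptions, not part of the presented framework.

```python
# Sketch: learn the residual between low- and high-fidelity predictions.
import torch
import torch.nn as nn

n_samples, n_param, n_out = 64, 3, 128
params = torch.randn(n_samples, n_param)            # simulation parameters
y_lf = torch.randn(n_samples, n_out)                # low-fidelity predictions
y_hf = y_lf + 0.1 * torch.randn(n_samples, n_out)   # high-fidelity targets

corrector = nn.Sequential(nn.Linear(n_param + n_out, 64), nn.Tanh(),
                          nn.Linear(64, n_out))
opt = torch.optim.Adam(corrector.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    res = corrector(torch.cat([params, y_lf], dim=1))
    loss = ((y_lf + res - y_hf) ** 2).mean()        # corrected vs. high-fidelity
    loss.backward()
    opt.step()
```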
Another decision worth discussing is the choice of graph convolutions for dimensionality reduction. The multi-hierarchical framework itself works with arbitrary data-driven reduction methods, so the GCNNs can be replaced with other methods as well. Nevertheless, since the mesh simplification already operates on graphs, exploiting this structure in the data is an obvious choice. As the results show, the graph convolution-based surrogate on the coarse mesh beats a linear reduction technique by far, even though no transfer learning takes place at this coarsest representation. Furthermore, graph convolutions benefit greatly from the multi-hierarchical approach: the computational time savings have a much larger impact because the convolutions are computationally expensive per se. Since the convolutions share parameters along the filters, networks using them require fewer trainable parameters yet represent the data more expressively within our framework. Interestingly, a graph convolution-based surrogate operating on the original fine mesh failed to capture the system adequately, which may be caused by oversmoothing issues [72, 73], the difficulty of transporting information between distant nodes in such a fine mesh [74], and the spectral bias [75]. Consequently, the MH approach not only facilitates a successful learning process but makes it possible in the first place.
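The parameter sharing can be seen in a minimal graph convolutional layer in the style of Kipf and Welling [36]: every node applies the same weight matrix, so the parameter count is independent of the mesh size. The sketch below uses plain PyTorch with a dense dummy adjacency; all shapes are assumptions.

```python
# Minimal graph convolutional layer: D^{-1/2} (A + I) D^{-1/2} X W.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # shared by all nodes

    def forward(self, X, A):
        A_hat = A + torch.eye(A.shape[0])                 # add self-loops
        d = A_hat.sum(dim=1)                              # node degrees
        A_norm = A_hat / torch.sqrt(d[:, None] * d[None, :])  # sym. normalization
        return torch.relu(A_norm @ self.W(X))             # aggregate + transform

layer = GCNLayer(3, 16)
X = torch.randn(500, 3)                        # coarse-mesh node features
A = (torch.rand(500, 500) < 0.01).float()
A = ((A + A.T) > 0).float()                    # symmetric dummy adjacency
# parameter count is 3 * 16 = 48, regardless of the number of nodes:
print(sum(p.numel() for p in layer.parameters()))
```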
5 Conclusion
In this paper, we derived a structured surrogate modeling scheme that produces efficient yet accurate models for a kart simulation model in a frontal collision scenario, despite its complexity and inaccessible source code. The surrogates require only as many parameters as state-of-the-art linear counterparts while outperforming even conventional nonlinear data-driven competitors. To achieve this, our scheme operates on several representations of the kart model at different resolutions instead of relying on a single high-resolution discretization. This naturally facilitates the approximation of multiscale effects, as global dynamics can be learned at coarse resolutions while microscale dynamics are captured at finer ones. In addition, we use low-resolution approximations to ease the learning process and to improve the accuracy of medium- and fine-resolution approximators by transferring knowledge across levels, so that finer models only need to capture residuals. Sparse matrix multiplications or adaptive upsampling networks are used to switch between resolutions.
The surrogates on a single level are built from graph convolutional autoencoders, which discover suitable low-dimensional representations of the data, and fully connected neural networks, which cover the parameter-dependent latent dynamics. The resulting surrogate models achieve satisfactory accuracy despite their comparably low number of parameters. The hierarchical approach also speeds up the learning process for the graph convolutional surrogates, as it eliminates the need to work with the original fine-resolution data, and it creates multiple models with varying memory and computational demands, all operating in visually and physically interpretable domains.
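The composition of a single-level surrogate can be sketched as follows; plain linear layers stand in for the graph convolutional encoder and decoder, and all dimensions, activations, and names are illustrative assumptions rather than the trained architecture.

```python
# Sketch of one level: autoencoder for the embedding, fully connected
# network for the parameter- and time-dependent latent dynamics.
import torch
import torch.nn as nn

n_nodes, latent_dim, n_param = 500, 8, 3

encoder = nn.Sequential(nn.Linear(3 * n_nodes, 128), nn.ELU(),
                        nn.Linear(128, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ELU(),
                        nn.Linear(128, 3 * n_nodes))
latent_dynamics = nn.Sequential(nn.Linear(n_param + 1, 64), nn.ELU(),
                                nn.Linear(64, latent_dim))

# online phase: evaluate the latent dynamics for parameters mu at time t,
# then decode to the (coarse) mesh
mu, t = torch.randn(1, n_param), torch.tensor([[0.01]])
z = latent_dynamics(torch.cat([mu, t], dim=1))
q = decoder(z).reshape(n_nodes, 3)
```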
However, the involved mesh simplification process is based on spatial criteria, and thus other information may be lost in the process. Moreover, similar to other nonlinear reduction techniques, the method shows its advantages especially when the system is reduced to its intrinsic size; for large latent spaces, conventional linear methods can still achieve competitive results. For a thorough investigation of crashworthiness, the suitability of the presented method for full-scale car crash simulation models must be investigated in the future, and attention must be paid to more holistic surrogates. This means that a wider range of scenarios as well as additional quantities, such as decelerations and forces, must be considered alongside the deformations. Another limitation the method shares with other data-driven reduced-order models is its poor extrapolation quality. To remedy this disadvantage, low-fidelity FE models, which may come directly from coarser discretizations, can be embedded in the future for parameter combinations outside the training data. This would eliminate the drawback that the current framework is based only on expensive high-fidelity data. Moreover, to continue this promising path, more recent graph convolutional architectures can be used, and all hierarchical models can be covered with a single latent variable.
Data availability
The data that support the findings of this study are openly available in DaRUS [67].
References
Kramer F, Franz U (2023) Integrale Sicherheit von Kraftfahrzeugen: Biomechanik—Unfallvermeidung—Insassenschutz—Sensorik—Sicherheit im Entwicklungsprozess. Springer Fachmedien Wiesbaden
Noack BR, Morzynski M, Tadmor G (2011) Reduced-order modelling for flow control. vol. 528. Springer Science & Business Media
Benner P, Gugercin S, Willcox K (2015) A survey of projection-based model reduction methods for parametric dynamical systems. SIAM Rev 57(4):483–531
Taira K, Brunton SL, Dawson S, Rowley CW, Colonius T, McKeon BJ et al (2017) Modal analysis of fluid flows: an overview. AIAA J 55(12):4013–4041
Taira K, Hemati MS, Brunton SL, Sun Y, Duraisamy K, Bagheri S et al (2020) Modal analysis of fluid flows: applications and outlook. AIAA J 58(3):998–1022
Brunton SL, Noack BR, Koumoutsakos P (2020) Machine learning for fluid mechanics. Annual Rev Fluid Mech 52:477–508
Brunton SL, Kutz JN (2022) Data-driven science and engineering: machine learning, dynamical systems, and control. 2nd ed. Cambridge University Press
Volkwein S (2022) Proper orthogonal decomposition: theory and reduced-order modelling. http://www.math.uni-konstanz.de/numerik/personen/volkwein/teaching/POD-Book.pdf. Accessed 04. August 2022
Chen PY, Xiang J, Cho DH, Chang Y, Pershing GA, Maia HT, et al (2022) CROM: continuous reduced-order modeling of PDEs using implicit neural representations. arXiv:2206.02607
Rodriguez SN, Iliopoulos AP, Carlberg KT, Brunton SL, Steuben JC, Michopoulos JG (2022) Projection-tree reduced-order modeling for fast N-body computations. J Comput Phys 459:111141
Li Z, Kovachki N, Azizzadenesheli K, Liu B, Bhattacharya K, Stuart A, et al (2020) Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895
Li Z, Kovachki N, Azizzadenesheli K, Liu B, Bhattacharya K, Stuart A, et al (2020) Neural operator: graph kernel network for partial differential equations. arXiv preprint arXiv:2003.03485
Zhou J, Cui G, Hu S, Zhang Z, Yang C, Liu Z et al (2020) Graph neural networks: a review of methods and applications. AI Open 1:57–81. https://doi.org/10.1016/j.aiopen.2021.01.001
Liu Y, Ponce C, Brunton SL, Kutz JN (2023) Multiresolution convolutional autoencoders. J Comput Phys 2(474):111801. https://doi.org/10.1016/j.jcp.2022.111801
Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18. Springer; p. 234–241
Hesthaven JS, Ubbiali S (2018) Non-intrusive reduced order modeling of nonlinear problems using neural networks. J Comput Phys 6(363):55–78. https://doi.org/10.1016/j.jcp.2018.02.037
Wang Q, Hesthaven JS, Ray D (2019) Non-intrusive reduced order modeling of unsteady flows using artificial neural networks with application to a combustion problem. J Comput Phys 5(384):289–307. https://doi.org/10.1016/j.jcp.2019.01.031
Guo M, Hesthaven JS (2019) Data-driven reduced order modeling for time-dependent problems. Comput Methods Appl Mech Eng 345:75–99. https://doi.org/10.1016/j.cma.2018.10.029
Kneifl J, Grunert D, Fehr J (2021) A nonintrusive nonlinear model reduction method for structural dynamical problems based on machine learning. Int J Numer Methods Eng 122(17):4774–4786. https://doi.org/10.1002/nme.6712
Kneifl J, Hay J, Fehr J (2022) Real-time human response prediction using a non-intrusive data-driven model reduction scheme. IFAC-PapersOnLine 55(20):283–288. https://doi.org/10.1016/j.ifacol.2022.09.109
Lee K, Carlberg KT (2020) Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders. J Comput Phys 404:108973. https://doi.org/10.1016/j.jcp.2019.108973
Fresca S, Dede’ L, Manzoni A (2021) A comprehensive deep learning-based approach to reduced order modeling of nonlinear time-dependent parametrized PDEs. J Sci Comput. https://doi.org/10.1007/s10915-021-01462-7
Kneifl J, Rosin D, Avci O, Röhrle O, Fehr J (2023) Low-dimensional data-based surrogate model of a continuum-mechanical musculoskeletal system based on non-intrusive model order reduction. Arch Appl Mech 93(9):3637–3663. https://doi.org/10.1007/s00419-023-02458-5
Peherstorfer B (2022) Breaking the Kolmogorov barrier with nonlinear model reduction. Notices Am Math Soc 69(05):1. https://doi.org/10.1090/noti2475
Fresca S, Manzoni A (2022) POD-DL-ROM: enhancing deep learning-based reduced order models for nonlinear parametrized PDEs by proper orthogonal decomposition. Comput Methods Appl Mech Eng 1(388):114181. https://doi.org/10.1016/j.cma.2021.114181
Gonzalez FJ, Balajewicz M (2018) Deep convolutional recurrent autoencoders for learning low-dimensional feature dynamics of fluid systems. arXiv:1808.01346
Maulik R, Lusch B, Balaprakash P (2021) Reduced-order modeling of advection-dominated systems with recurrent neural networks and convolutional autoencoders. Physics of Fluids. https://doi.org/10.1063/5.0039986
Stoffel M, Bamer F, Markert B (2020) Deep convolutional neural networks in structural dynamics under consideration of viscoplastic material behaviour. Mech Res Commun 108:103565. https://doi.org/10.1016/j.mechrescom.2020.103565
Bamer F, Thaler D, Stoffel M, Markert B (2021) A Monte Carlo simulation approach in non-linear structural dynamics using convolutional neural networks. Front Built Environ. https://doi.org/10.3389/fbuil.2021.679488
Meneghetti L, Demo N, Rozza G (2023) A dimensionality reduction approach for convolutional neural networks. Appl Intell 53(19):22818–22833. https://doi.org/10.1007/s10489-023-04730-1
Bronstein MM, Bruna J, Cohen T, Veličković P (2021) Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges. arXiv:2104.13478
Gao H, Sun L, Wang JX (2021) PhyGeoNet: physics-informed geometry-adaptive convolutional neural networks for solving parameterized steady-state PDEs on irregular domain. J Comput Phys 3(428):110079. https://doi.org/10.1016/j.jcp.2020.110079
Wang Y, Sun Y, Liu Z, Sarma SE, Bronstein MM, Solomon JM (2019) Dynamic graph CNN for learning on point clouds. ACM Trans Graph. 38(5):1–12. https://doi.org/10.1145/3326362
Monti F, Boscaini D, Masci J, Rodola E, Svoboda J, Bronstein MM (2017) Geometric deep learning on graphs and manifolds using mixture model CNNs. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
Defferrard M, Bresson X, Vandergheynst P (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In: Lee D, Sugiyama M, Luxburg U, Guyon I, Garnett R (eds) Advances in neural information processing systems, vol 29. Curran Associates, Inc. Available from: https://proceedings.neurips.cc/paper_files/paper/2016/file/04df4d434d481c5bb723be1b6df1ee65-Paper.pdf
Kipf TN, Welling M (2016) Semi-Supervised Classification with Graph Convolutional Networks. arXiv:1609.02907
Gruber A, Gunzburger M, Ju L, Wang Z (2022) A comparison of neural network architectures for data-driven reduced-order modeling. Comput Methods Appl Mech Eng 4(393):114764. https://doi.org/10.1016/j.cma.2022.114764
Chen M, Wei Z, Huang Z, Ding B, Li Y (2020) Simple and deep graph convolutional networks. In: III HD, Singh A (eds) Proceedings of the 37th international conference on machine learning. vol. 119 of proceedings of machine learning research. PMLR; p. 1725–1735. Available from: https://proceedings.mlr.press/v119/chen20v.html
Pichi F, Moya B, Hesthaven JS (2024) A graph convolutional autoencoder approach to model order reduction for parametrized PDEs. J Comput Phys. https://doi.org/10.1016/j.jcp.2024.112762
Franco NR, Fresca S, Tombari F, Manzoni A (2023) Deep learning-based surrogate models for parametrized PDEs: handling geometric variability through graph neural networks. arXiv:2308.01602
Grattarola D, Zambon D, Bianchi FM, Alippi C (2021) Understanding Pooling in Graph Neural Networks. arXiv:2110.05292. [cs.LG]
Wu Z, Pan S, Chen F, Long G, Zhang C, Yu PS (2021) A Comprehensive Survey on Graph Neural Networks. IEEE Trans Neural Netw Learn Syst 32(1):4–24. https://doi.org/10.1109/tnnls.2020.2978386
Battaglia PW, Hamrick JB, Bapst V, Sanchez-Gonzalez A, Zambaldi V, Malinowski M, et al (2018) Relational inductive biases, deep learning, and graph networks. arXiv:1806.01261. [cs.LG]
Pfaff T, Fortunato M, Sanchez-Gonzalez A, Battaglia PW (2020) Learning Mesh-Based Simulation with Graph Networks. International Conference on Learning Representations (ICLR), 2021. arXiv:2010.03409. [cs.LG]
Sanchez-Gonzalez A, Godwin J, Pfaff T, Ying R, Leskovec J, Battaglia P (2020) Learning to simulate complex physics with graph networks. In: International conference on machine learning. PMLR; p. 8459–8468
Cranmer MD, Xu R, Battaglia P, Ho S (2019) Learning symbolic physics with graph networks. arXiv:1909.05862. [cs.LG]
Cranmer M, Sanchez Gonzalez A, Battaglia P, Xu R, Cranmer K, Spergel D et al (2020) Discovering symbolic models from deep learning with inductive biases. In: Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H (eds) Advances in neural information processing systems, vol 33. Inc, Curran Associates, pp 17429–17442
Li Y, Vinyals O, Dyer C, Pascanu R, Battaglia P (2018) Learning deep generative models of graphs. arXiv:1803.03324. [cs.LG]
Min E, Chen R, Bian Y, Xu T, Zhao K, Huang W, et al (2022) Transformer for graphs: an overview from architecture perspective. arXiv:2202.08455. [cs.LG]
Bryutkin A, Huang J, Deng Z, Yang G, Schönlieb CB, Aviles-Rivero A (2024) HAMLET: graph transformer neural operator for partial differential equations. arXiv:2402.03541. [cs.LG]
Gao H, Ji S (2021) Graph U-Nets. IEEE Trans Patt Anal Mach Intell. https://doi.org/10.1109/tpami.2021.3081010
Fortunato M, Pfaff T, Wirnsberger P, Pritzel A, Battaglia P (2022) MultiScale MeshGraphNets. 2nd AI4Science workshop at the 39th international conference on machine learning (ICML). arXiv:2210.00612. [cs.LG]
Jain A, Haghighat E, Nelaturi S (2024) LatticeGraphNet: A two-scale graph neural operator for simulating lattice structures. arXiv:2402.01045. [cs.LG]
Han X, Gao H, Pfaff T, Wang JX, Liu LP (2022) Predicting physics in mesh-reduced space with temporal attention. arXiv:2201.09113. [cs.LG]
Lee S, Lee S, Jang K, Cho H, Shin S (2023) Data-driven Nonlinear Parametric Model Order Reduction Framework using Deep Hierarchical Variational Autoencoder. arXiv:2307.06816. [cs.LG]
Liu Y, Kutz JN, Brunton SL (2022) Hierarchical deep learning of multiscale differential equation time-steppers. In: Philosophical transactions of the royal society a: mathematical, physical and engineering sciences. https://doi.org/10.1098/rsta.2021.0200
McCormick SF (ed) (1987) Appendix 2: Multigrid bibliography. In: Multigrid methods. SIAM, pp 187–278. Available from: https://epubs.siam.org/doi/abs/10.1137/1.9781611971057.appb
Trottenberg U, Oosterlee CW, Schüller A (2000) Multigrid. Elsevier
He J, Xu J (2019) MgNet: a unified framework of multigrid and convolutional neural network. Sci China Math. 62(7):1331–1354. https://doi.org/10.1007/s11425-019-9547-2
Czech C, Lesjak M, Bach C, Duddeck F (2022) Data-driven models for crashworthiness optimisation: intrusive and non-intrusive model order reduction techniques. Struct Multidiscip Optim. https://doi.org/10.1007/s00158-022-03282-1
Lesjak M, Duddeck F (2023) Dimensional reduction for parametric projection-based reduced-order models in crash. PAMM. https://doi.org/10.1002/pamm.202300063
Garland M, Heckbert PS (1997) Surface simplification using quadric error metrics. In: Proceedings of the 24th annual conference on Computer graphics and interactive techniques—SIGGRAPH ’97. ACM Press
Ranjan A, Bolkart T, Sanyal S, Black MJ (2018) Generating 3D faces using convolutional mesh autoencoders. In: Proceedings of the European conference on computer vision (ECCV) p. 704–720
Shiiba T, Fehr J, Eberhard P (2012) Flexible multibody simulation of automotive systems with non-modal model reduction techniques. Vehicle Syst Dyn 50(12):1905–1922. https://doi.org/10.1080/00423114.2012.700403
Fehr J, Holzwarth P, Eberhard P (2016) Interface and model reduction for efficient explicit simulations—a case study with nonlinear vehicle crash models. Math Comput Modell Dyn Syst 22(4):380–396. https://doi.org/10.1080/13873954.2016.1198385
Fresca S, Gobat G, Fedeli P, Frangi A, Manzoni A (2022) Deep learning-based reduced order models for the real-time simulation of the nonlinear dynamics of microstructures. Int J Numer Meth Eng 123(20):4749–4777. https://doi.org/10.1002/nme.7054
Kneifl J, Fehr J (2024) Crash simulation of the structural frame of a racing kart colliding against a rigid wall. DaRUS [data set]
Unger B, Gugercin S (2019) Kolmogorov n-widths for linear dynamical systems. Adv Comput Math 45(5):2273–2286. https://doi.org/10.1007/s10444-019-09701-0
Kast M, Guo M, Hesthaven JS (2020) A non-intrusive multifidelity method for the reduced order modeling of nonlinear problems. Comput Methods Appl Mech Eng 364:112947. https://doi.org/10.1016/j.cma.2020.112947
Conti P, Guo M, Manzoni A, Frangi A, Brunton SL, Kutz JN (2023) Multi-fidelity reduced-order surrogate modeling. arXiv:2309.00325. [cs.LG]
Demo N, Tezzele M, Rozza G (2023) A DeepONet multi-fidelity approach for residual learning in reduced order modeling. Adv Model Simul Eng Sci. https://doi.org/10.1186/s40323-023-00249-9
Chen D, Lin Y, Li W, Li P, Zhou J, Sun X (2020) Measuring and relieving the over-smoothing problem for graph neural networks from the topological view. In: Proceedings of the AAAI conference on artificial intelligence. 34(04):3438–3445. https://doi.org/10.1609/aaai.v34i04.5747
Rusch TK, Bronstein MM, Mishra S (2023) A Survey on Oversmoothing in Graph Neural Networks. arXiv:2303.10993. [cs.LG]
Alon U, Yahav E (2020) On the Bottleneck of Graph Neural Networks and its Practical Implications. arXiv:2006.05205. [cs.LG]
Rahaman N, Baratin A, Arpit D, Draxler F, Lin M, Hamprecht F, et al (2019) On the Spectral Bias of Neural Networks. In: Chaudhuri K, Salakhutdinov R (eds) Proceedings of the 36th international conference on machine learning. vol. 97 of proceedings of machine learning research. PMLR. p. 5301–5310. Available from: https://proceedings.mlr.press/v97/rahaman19a.html
Acknowledgements
Funded by Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy - EXC 2075 - 390740016. We acknowledge the support by the Stuttgart Center for Simulation Science (SimTech). Furthermore, the authors would like to thank the Ministry of Science, Research and Arts of the Federal State of Baden-Württemberg for the financial support within the InnovationsCampus Future Mobility. We also wish to acknowledge the support of the National Science Foundation AI Institute in Dynamic Systems grant 2112085.
Author information
Contributions
J.K.: conceptualization, data curation, investigation, methodology, software, validation, visualization, writing–original draft, writing-review and editing. J.F.: conceptualization, funding acquisition, methodology, project administration, supervision, writing-review and editing. J.N.K.: conceptualization, funding acquisition, methodology, project administration, supervision, writing-review and editing. S.L.B.: conceptualization, funding acquisition, methodology, project administration, supervision, writing-review and editing.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.