
Bayesian Tomography Using Polynomial Chaos Expansion and Deep Generative Networks


Geophysical Journal International (Royal Astronomical Society)

Geophys. J. Int. (2024) 237, 31-48, https://doi.org/10.1093/gji/ggae026
Advance Access publication 2024 January 18
GJI General Geophysical Methods

Bayesian tomography using polynomial chaos expansion and deep generative networks

Giovanni Angelo Meles,1 Macarena Amaya,1 Shiran Levy,1 Stefano Marelli2 and Niklas Linde1
1 Institute of Earth Sciences, Department of Applied and Environmental Geophysics, University of Lausanne, 1015 Lausanne, Switzerland. E-mail: Giovanni.Meles@unil.ch
2 Department of Civil, Environmental and Geomatic Engineering, ETH Zurich, Institute of Structural Engineering, 8093 Zurich, Switzerland

Accepted 2024 January 17. Received 2023 December 19; in original form 2023 October 10

Downloaded from https://academic.oup.com/gji/article/237/1/31/7577609 by guest on 02 April 2024


SUMMARY
Implementations of Markov chain Monte Carlo (MCMC) methods need to confront two fundamental challenges: accurate representation of prior information and efficient evaluation of likelihood functions. The definition and sampling of the prior distribution can often be facilitated by standard dimensionality-reduction techniques such as Principal Component Analysis (PCA). Additionally, PCA-based decompositions can enable the implementation of accurate surrogate models, for instance, based on polynomial chaos expansion (PCE). However, intricate geological priors with sharp contrasts may demand advanced dimensionality-reduction techniques, such as deep generative models (DGMs). Although suitable for prior sampling, these DGMs pose challenges for surrogate modelling. In this contribution, we present an MCMC strategy that combines the high reconstruction performance of a DGM in the form of a variational autoencoder with the accuracy of PCA-PCE surrogate modelling. Additionally, we introduce a physics-informed PCA decomposition to improve accuracy and reduce the computational burden associated with surrogate modelling. Our methodology is exemplified in the context of Bayesian ground-penetrating radar traveltime tomography using channelized subsurface structures, providing accurate reconstructions and significant speed-ups, particularly when the computation of the full-physics forward model is costly.

Key words: Ground penetrating radar; Machine Learning; Numerical modelling; Probability distributions; Tomography.

1 INTRODUCTION

Bayesian inversion methods can account for data and modelling uncertainties as well as prior knowledge, thus representing an attractive approach for tomography and its uncertainty quantification. Nevertheless, the difficulties in specifying appropriate prior distributions and the high computational burden associated with repeated forward model evaluations often hinder proper implementations of Bayesian tomography (Chipman et al. 2001). In geophysical settings, prior distributions have traditionally been specified by assuming the subsurface to be represented by a Gaussian random field. More advanced options are made possible by relying on the information content of training images (TI), that is, large gridded 2-D or 3-D unconditional representations of the expected target spatial field that can be either continuous or categorical (Mariethoz & Caers 2014; Laloy et al. 2017, 2018). Bayesian inversion needs not only a parametrization of the prior that accurately represents the prior information, but also one that is easy to manipulate and one with an underlying distribution for which small perturbations lead to comparatively small changes in the data response. In many practical implementations, it is advantageous to rely on another parametrization than the one used for physics-based modelling and visualization, that is, typically a pixel-based parametrization (Asher et al. 2015; Alizadeh et al. 2020). Physical media are typically associated with points in R^N, where N is the number of elements in the corresponding pixel-based representation. While allowing easy implementation of forward modelling schemes [e.g. Finite Difference (FD) based on partial differential equations], pixel-based N-dimensional parametrizations are often not suitable to effectively parametrize the prior distribution, as N can be very large. When prior knowledge suggests constrained spatial patterns, such as covariance or connected spatial structures, the prior-compatible models populate manifolds embedded in R^N. If this manifold can be locally assimilated to a subset of R^M, with M ≪ N, the actual prior distribution reduces to a function of M variables only, which effectively implies an M-dimensional forward/inverse problem. Various approaches can be used to achieve manifold identification through dimensionality reduction. Among these techniques,


© The Author(s) 2024. Published by Oxford University Press on behalf of The Royal Astronomical Society. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
principal component analysis (PCA) and related methods are the most common linear dimensionality reduction methods (Boutsidis et al. 2008; Jolliffe & Cadima 2016). Based on a set of prior model realizations and the eigenvalues/eigenvectors of the corresponding covariance matrix, PCA provides optimal M-dimensional representations in terms of uncorrelated variables that retain as much of the sample variance as possible. PCA has found widespread application in geophysics, using both deterministic and stochastic inversion algorithms, with recent advancements offering the potential to reconstruct even discontinuous structures (Reynolds et al. 1996; Giannakis et al. 2021; Thibaut et al. 2021; Meles et al. 2022). In the context of complex geological media with discrete interfaces (Strebelle 2002; Zahner et al. 2016; Laloy et al. 2018) investigated in this study, PCA-based methods are ineffective in encoding the prior knowledge. Promising alternatives are offered by deep generative models (DGMs) that learn the underlying input distribution and generate synthetic samples that closely resemble the statistics in a provided data set based on TIs. For instance, Variational Autoencoders (VAEs) encode data patterns into a compact latent space for both data reconstruction and generation; Generative Adversarial Networks (GANs) use adversarial training to create synthetic output that closely resembles real reference data; Diffusion Models are parametrized Markov chains, trained using variational inference, to generate samples that converge to match the underlying data distribution (Kingma & Welling 2013; Jetchev et al. 2016; Goodfellow et al. 2020; Ho et al. 2020). In the context of Bayesian inversion, DGMs possess a crucial property: they generate realizations that exhibit patterns consistent with the TI when sampling an uncorrelated standard normal or uniformly distributed latent space (Laloy et al. 2017). The incorporation of such a low-dimensional parametrization representing the prior distribution simplifies the sampling process, thus making DGMs well suited for MCMC schemes.

Using an effective parametrization for the prior (e.g. via PCA or DGMs) does not alleviate the computational cost associated with repeated forward modelling, which can often limit practical application of MCMC schemes. However, it opens up the possibility of using surrogate modelling. Various classes of surrogate models, such as those based on Gaussian process models or Kriging (Sacks et al. 1989; Rasmussen 2003) and polynomial chaos expansions (PCEs; Xiu & Karniadakis 2002; Blatman & Sudret 2011), can be used to effectively solve low-dimensional Bayesian inverse problems (Marzouk et al. 2007; Marzouk & Xiu 2009; Higdon et al. 2015; Nagel 2019; Wagner et al. 2020, 2021a).

Recently, Meles et al. (2022) presented a Bayesian framework that uses a shared PCA-based parametrization for characterizing prior information and modelling travel times through PCE. In this paper, we show that the direct application of that methodology produces suboptimal results when the input is parametrized in terms of latent variables associated with a DGM. Indeed, a prerequisite in this framework is that the parametrization chosen to describe and explore the prior allows for accurate surrogate modelling of the physical data used in the inversion. However, the relationship between the parametrization provided by DGMs and the corresponding physical output is typically highly non-linear, and training a surrogate capturing such a complex relationship can be extremely challenging or infeasible.

We present a strategy that allows the use of a DGM parametrization to define the prior distribution and explore the posterior distribution while still enabling accurate surrogate modelling. We achieve this goal by using surrogate modelling with either global or local PCA decompositions of the input generated by the underlying DGM during the MCMC process. Through the decoupling of the MCMC and surrogate-modelling parametrizations, we preserve the advantageous prior representation capabilities of DGMs while simultaneously achieving precise PCA-based surrogate modelling, thereby significantly speeding up forward calculations.

2 METHODOLOGY

2.1 Bayesian inversion

Forward models are mathematical tools that quantitatively evaluate the outcome of physical experiments. We refer to the relationship between input parameters and output values as the 'forward problem':

F(u) = y + ε.  (1)

Here, F, u, y and ε stand for the physical law or forward operator, the input parameters (typically representing local media properties), the n-dimensional output of a physical experiment and a noise term, respectively. Given the manuscript's significant emphasis on modelling, we deviate from the standard formalism and position the forward operator on the left-hand side in this and related subsequent equations, while placing the observed data and the noise term on the right-hand side. The goal of the 'inverse problem' is to infer properties of u conditioned by the data y while taking into account any available prior information about u. For a constant model dimension, a general solution to this problem can be expressed in terms of the unnormalized posterior distribution:

P(u | y) ∝ P(y | u) P(u) = L(u) P(u).  (2)

Here, P(u | y) is the posterior distribution of the input parameter u given the data y, P(y | u) (also indicated as L(u) and known as 'the likelihood') is the probability of observing the data y given the input parameter u, while P(u) is the prior distribution in the input parameter domain. In the case of Gaussian noise, the likelihood takes the following form:

L(u) = (1/(2π))^(n/2) |C_d|^(-1/2) exp( -(1/2) (F(u) - y)^T C_d^(-1) (F(u) - y) ),  (3)

where the covariance matrix C_d accounts for data uncertainty. Note that throughout this paper, we adhere to the common formalism used in geophysics, utilizing the same symbol to denote both individual instances of a random variable and the random variable itself (Tarantola 2005; Aster et al. 2018). To draw samples from the unnormalized posterior in eq. (2), common practice is to use Markov chain Monte Carlo (MCMC) methods to draw samples proportionally from P(u | y) (Hastings 1970). However, computing L(u) requires the solution of a forward problem, which can be demanding in Bayesian inversions as this evaluation needs to be repeated many times. In the following sections we discuss how this problem can be approached by using a low-dimensional latent representation and surrogate modelling to evaluate P(u) and approximate L(u), respectively.

2.2 Bayesian inversion in latent spaces

Parametrizations are well suited for surrogate modelling when they encode the prior distribution and effectively simplify the input-output relationship within the problem under investigation. Meles et al. (2022) used variables defined in terms of Principal Components to (i) represent the prior distribution related to a random
Gaussian field on a low-dimensional manifold and (ii) learn an accurate surrogate to compute the forward problem. However, it is not generally granted that a parametrization can achieve both (i) and (ii). For representing the prior distribution, we can utilize manifold identification using DGMs. This involves utilizing a DGM to characterize a latent space using a set of coordinates (here indicated as z) with a statistical distribution defined prior to the training. The DGM allows mapping between this latent space and the physical space through a decoder/generator, denoted here as G_DGM. For a given random realization z, the decoder operation G_DGM(z) produces an output u in the physical space that adheres to the characteristics of the prior distribution. The use of this new set of coordinates z casts the inverse problem on the latent manifold as:

P(z | y) ∝ P(y | z) P(z).  (4)

While formally identical to eq. (2), eq. (4) involves significant advantages. Not only is z typically low-dimensional (consisting typically of a few tens of variables instead of many thousands of variables) but we can also design the corresponding statistical prior distribution P(z) as desired. Here, we impose during training that P(z) is a multivariate standard Gaussian distribution.

2.3 Decoupling of inversion and forward modelling domains in MCMC inversions

We now rewrite the forward problem in eq. (1) using the new coordinates:

M_DGM(z) = y + ε,  (5)

where M_DGM = F ∘ G_DGM, ∘ stands for function composition, and we assume no error induced by the DGM dimensionality reduction, which in turn implies that the corresponding likelihood is the same as that in eq. (3). The complexity and non-linearity of G_DGM imply that the forward operator M_DGM exhibits considerable irregularity, making it difficult to learn a surrogate model due to typical issues of neural networks, such as those associated with limited data availability and overfitting (Bejani & Ghatee 2021). Consequently, we investigate alternative approaches that avoid using the latent parametrization for surrogate modelling while retaining it for the prior representation. Building upon Meles et al. (2022), we explore surrogate modelling based on PCA inputs spanning the Global spatial extent of the input. Without any loss of generality, we consider a complete set of Global Principal Components (in the following, GPCs) for realizations of the DGM (implemented via G_GPC(x_GPC^full) = G_DGM(z) = u) and rewrite eq. (1) as:

M_GPC(x_GPC^full) = y + ε,  (6)

where G_GPC and M_GPC are the physical distribution and the model associated with the GPCs, and therefore M_GPC = F ∘ G_GPC. We will show that the linear relationship G_GPC(x_GPC^full) = u helps in implementing a surrogate of M_GPC, provided that the input and the model can be faithfully represented as operating on an effective M-dimensional truncated subset x_GPC of the new coordinates x_GPC^full, that is:

G_GPC(x_GPC) ≈ u  ⇒  M_GPC(x_GPC) = y + ε̂,  (7)

where ε̂ is a term including both observational noise and modelling errors related to the projection on the subset represented by x_GPC. Due to the error introduced by the truncation, the likelihood associated with the GPC parametrization deviates from the one of eq. (3). However, when a substantial number of GPCs are utilized, the extent of such difference tends to be minimal.

The formulation given above relies on the weak hypothesis that Global proximity in the input domain u leads to proximity in the output domain (i.e. the observed quantity under investigation). Critically, when the output functionally depends mainly on a subset L of the entire domain of u, proximity in the output domain can be attained by approximating the input within this Local region. Based on this strong physically informed assumption, we can achieve this goal by means of a Local PCA decomposition restricted to L:

G_LPC(x_LPC)|_L ≈ u|_L  ⇒  M_LPC(x_LPC) = y + ε̂,  (8)

where LPC refers to Local Principal Components (in the following, LPCs) and |_L restricts the validity of these relationships to the subset L. However, because the area spanned by LPCs is smaller than that of GPCs, we expect to need fewer LPCs than GPCs to achieve a satisfactory approximation in the output domain. Note that this change of coordinates is not invertible even if a complete set of LPCs is used. Note also that for the LPC parametrization, the corresponding likelihood differs from that in eq. (3), but using a proper number of components can render this misfit negligible.

Generally speaking, a surrogate model is a function that seeks to emulate the behaviour of an expensive forward model at negligible computational cost per run. Clearly, the function a forward solver has to model depends on the set of coordinates used to represent the input. For simplicity, we discuss here a surrogate for M_GPC(x_GPC), namely M̃_GPC satisfying:

M̃_GPC(x_GPC) ≈ M_GPC(x_GPC).  (9)

Once available, surrogate models can be used for likelihood evaluation in MCMC inversions, with a modified covariance operator C_D = C_d + C_Tapp comprising the covariance matrices C_d and C_Tapp accounting for data uncertainty and modelling error, respectively. In such cases, the likelihood not only shows a mild dependence on the parametrization but also on the surrogate model. This relationship is expressed as follows:

L̃(M̃_GPC(x_GPC)) = (1/(2π))^(n/2) |C_D|^(-1/2) exp( -(1/2) (M̃_GPC(x_GPC) - y)^T C_D^(-1) (M̃_GPC(x_GPC) - y) ),  (10)

where |C_D| is the determinant of the covariance matrix C_D (Hansen et al. 2014). Similar formulas for the likelihood can be derived for any other parametrization of the input space, such as those associated with DGMs or LPCs. The substantial differences in likelihoods associated with the DGM, GPC and LPC parametrizations can potentially give rise to significantly different estimations of the posterior distribution.

Surrogate models all adhere to a fundamental principle: the more complex the input-output relationship, the higher the computational demands needed to build the surrogate model (e.g. in terms of training set). Furthermore, the efficiency of constructing a surrogate is significantly affected by the number of input parameters, and it can become unfeasible when the input dimensionality exceeds several tens of parameters; thus surrogate modelling often relies on some kind of dimensionality reduction. The dimensionality reduction step does not necessarily need to be invertible, since what holds significance is the supervised performance, specifically the minimization of modelling error (Lataniotis et al. 2020). Surrogate models exhibit their peak potential when dealing with low-dimensional input spaces, provided that such simplicity does not entail a complex input-output relationship.

Based on these general considerations, we can qualitatively anticipate different surrogate performances when operating on different
latent variables (i.e. M̃_DGM(z)), GPCs (i.e. M̃_GPC(x_GPC)) and LPCs (i.e. M̃_LPC(x_LPC)). We can expect that using the latent variables of a DGM involves a lower dimensionality, albeit at the cost of a complex input-output relationship. On the other hand, using GPCs likely results in a considerably simpler input-output relationship but necessitates a larger number of variables, as the entire input domain needs to be approximated. In cases where the underlying forward model permits, the LPC projection is likely to offer similarly straightforward input-output characteristics as the GPC approach but with fewer parameters, thus facilitating the training of a surrogate.

To effectively combine MCMC methods with surrogate modelling, it is essential to ensure a high level of accuracy in the output domain. To achieve this goal, we propose decoupling the parametrizations used for inversion and surrogate modelling. The latent representation provided by the DGM is used to evaluate P(z) and explore the posterior. Once a prior has been defined and a surrogate modelling strategy devised, a Metropolis-Hastings MCMC algorithm can be used to sample the posterior distribution P(z | y) (Hastings 1970). A sample of the posterior in the physical space, P(G_DGM(z) | y), is then also available via mere application of G_DGM to draws from P(z | y). For each step in the MCMC process, we consider three strategies with surrogates operating on latent variables, on GPCs or on LPCs associated with the field G_DGM(z). We depict these three strategies for a traveltime tomography problem in Fig. 1.

3 APPLICATION TO GPR CROSSHOLE TRAVELTIME TOMOGRAPHY

In the previous section, we briefly covered the basic principles of Bayesian methods and emphasized the significance of dimensionality reduction and surrogate modelling for their implementation. Now, we integrate these concepts to tackle GPR crosshole traveltime tomography with MCMC. GPR wave propagation is governed by the distribution of dielectric permittivity (ε) and electric conductivity (σ) in the subsurface. The wave propagation velocity mainly depends on permittivity, which is in turn related to porosity and water content through petrophysical relationships (Gloaguen et al. 2005). GPR data can be collected in a variety of configurations, with crosshole designs being particularly well suited for groundwater investigations (LaBrecque et al. 2002; Annan 2005).

3.1 Experimental setup

We target lossless media represented by binary images with two facies of homogeneous GPR velocity (6 × 10^7 and 8 × 10^7 m s^-1) resembling river channels (Strebelle 2002; Zahner et al. 2016; Laloy et al. 2018). For the representation of the prior and posterior exploration, we consider coordinates induced by a VAE (in the following, the subscript VAE refers to this specific DGM parametrization), as recent research suggests that the lower degree of non-linearity in the corresponding networks compared with GANs makes them more amenable for modelling and inversion (Lopez-Alvis et al. 2021; Levy et al. 2023). The details of the VAE utilized in this study can be found in Laloy et al. (2017) and Lopez-Alvis et al. (2021), with the dimension of the latent space being 20. As for the output, we consider arrival times associated with the crosshole configuration displayed in Figs 2(a)-(e), with 12 evenly spaced sources and receivers located in two vertically oriented boreholes. The distance between the boreholes is 4.6 m, while the spacing between sources/receivers is 0.9 m, leading to 144 traveltimes. We use both an eikonal and a 2-D finite difference time domain (FDTD) solver to simulate noise-free propagation of GPR waves (Irving & Knight 2006; Hansen et al. 2013). For the FDTD code, each source is characterized by a 100 MHz Blackman-Harris function, perfectly matched layers surrounding the propagation domain are used to prevent spurious reflections from contaminating the data, and appropriate space-time grids are used to avoid instability and dispersion artefacts. Traveltimes are picked automatically based on a threshold associated with the relative maximum amplitude of each source-receiver pair.

3.2 PCE modelling within MCMC

For surrogate-based modelling, we rely on PCE modelling due to its efficiency, flexibility and ease of deployment (Xiu & Karniadakis 2002; Blatman & Sudret 2011; Métivier et al. 2020; Lüthen et al. 2021). We here summarize the most relevant aspects of PCE modelling with a surrogate M̃_GPC(x_GPC), but similar considerations apply to M_DGM, F or M_LPC, albeit with relevant caveats or advantages that will be discussed below. PCEs approximate functions in terms of linear combinations of orthonormal multivariate polynomials Ψ_α:

M̃_GPC(x_GPC) = Σ_{α ∈ A} a_α Ψ_α(x_GPC),  (11)

where M is the dimension of x_GPC and A is a subset of N^M implementing a truncation scheme to be set based on accuracy requirements and available computational resources (Xiu & Karniadakis 2002). The training of the coefficients a_α is computationally unfeasible when the input domain is high-dimensional (the case for a surrogate F̃(u) of F(u)). Moreover, when the imposed truncation pattern cannot fully account for the degree of non-linearity of the underlying model [the case for a surrogate M̃_VAE(z) of M_VAE(z)], the still unbiased PCE predictions are inevitably affected even if the input domain is low-dimensional. On the other hand, using a tailored PCA decomposition, as required by the surrogate M̃_LPC(x_LPC) of M_LPC(x_LPC), could decrease the computational burden and increase accuracy. In any case, the surrogate forward modelling predictor can be evaluated at a negligible cost by direct computation of eq. (11), and its accuracy can be estimated using a validation set or cross-validation techniques (Blatman & Sudret 2011; Marelli et al. 2021).

We here test the three strategies discussed in Section 2.3 using PCE for surrogate modelling in conjunction with the VAE parametrization for prior sampling. For each strategy we build a corresponding PCE to model traveltime arrivals using the Matlab package UQLab (Marelli & Sudret 2014; Marelli et al. 2021). To offer a fair comparison, we use the same training and validation data sets for all proposed schemes.

3.2.1 VAE-PCE strategy

In the first strategy, referred to as VAE-PCE, the inputs for the PCE modelling are the 20-D z vectors mapping the latent space into the physical one, that is: G_VAE(z) = u (Lopez-Alvis et al. 2021). This choice amounts to applying the strategy by Meles et al. (2022), as the same parametrization is used for both prior characterization and surrogate modelling.
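The construction in eq. (11) can be illustrated with a minimal sketch. The paper builds its PCEs with the Matlab package UQLab; the Python code below (with a total-degree truncation, an ordinary least-squares fit and a toy two-variable target standing in for the traveltime model) is an illustrative assumption, not the authors' implementation:

```python
import itertools
from math import factorial, sqrt

import numpy as np
from numpy.polynomial.hermite_e import hermeval

def herme_orthonormal(xi, n):
    """Probabilists' Hermite polynomial He_n, rescaled to be orthonormal
    under the standard Gaussian weight (||He_n||^2 = n!)."""
    c = np.zeros(n + 1)
    c[n] = 1.0
    return hermeval(xi, c) / sqrt(factorial(n))

def design_matrix(X, alphas):
    """Psi_alpha(x) = prod_i He_{alpha_i}(x_i), evaluated for every sample
    (rows of X) and every multi-index alpha."""
    Psi = np.ones((X.shape[0], len(alphas)))
    for j, alpha in enumerate(alphas):
        for i, n in enumerate(alpha):
            if n > 0:
                Psi[:, j] *= herme_orthonormal(X[:, i], n)
    return Psi

def fit_pce(X, y, degree):
    """Total-degree truncation A = {alpha : |alpha| <= degree}; the
    coefficients a_alpha are estimated by ordinary least squares."""
    M = X.shape[1]
    alphas = [a for a in itertools.product(range(degree + 1), repeat=M)
              if sum(a) <= degree]
    Psi = design_matrix(X, alphas)
    coeffs, *_ = np.linalg.lstsq(Psi, y, rcond=None)
    return alphas, coeffs

def eval_pce(X, alphas, coeffs):
    return design_matrix(X, alphas) @ coeffs

# Toy target: a smooth function of two standard-normal inputs.
rng = np.random.default_rng(0)
X_train = rng.standard_normal((200, 2))
y_train = X_train[:, 0] ** 2 + 3.0 * X_train[:, 1]

alphas, coeffs = fit_pce(X_train, y_train, degree=2)
X_val = rng.standard_normal((50, 2))
y_val = X_val[:, 0] ** 2 + 3.0 * X_val[:, 1]
rmse = np.sqrt(np.mean((eval_pce(X_val, alphas, coeffs) - y_val) ** 2))
```

Because the toy target lies in the span of the degree-2 basis, the validation rmse is at machine-precision level; for the actual traveltime operator, the truncation set and training-set size control the modelling error that enters C_Tapp.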
Figure 1. Graphical schematic of the three strategies discussed in this manuscript. The prior input distribution is always parametrized by the latent space of the DGM, with P(z) = N(0, I_dim(z)), while the input to the reference/surrogate model depends on the chosen strategy. When the DGM parametrization is chosen, draws of the latent variables are considered (left column). With the PCA strategies, Globally or Locally defined G/LPCs are used for the Global and Local approach, respectively (central and right column). The Global approach uses a set x_GPC of PCs defined across the entire domain, illustrated by the corresponding velocity field that extends throughout the entire medium. Conversely, the Local approach utilizes a set x_LPC of PCs that cover only specific portions of the domain chosen based on physical considerations. This is exemplified by the corresponding velocity field, which is confined to a limited portion of the medium. In this illustration, as we consider traveltime modelling, the Local domain is defined in terms of fat-ray sensitivity kernels (for more details, please refer to Section 3.2).
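The decoupled workflow sketched above (latent-space prior and proposals, PCA-based surrogate likelihood) can be condensed into a short Metropolis-Hastings loop. In this sketch, a random linear map stands in for the VAE decoder and another for the trained PCE surrogate; both are illustrative assumptions chosen so that the resulting posterior is Gaussian and the chain can be checked against the analytic mean:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-ins (assumptions, not the paper's networks): a linear "decoder"
# z -> u and a linear "surrogate" acting on the PCA scores of u.
dim_z, dim_u, n_pc, n_data = 4, 50, 3, 10
A = rng.standard_normal((dim_u, dim_z)) / np.sqrt(dim_z)  # decoder G_DGM
V = np.linalg.qr(rng.standard_normal((dim_u, n_pc)))[0]   # orthonormal PCA basis
B = rng.standard_normal((n_data, n_pc))                   # surrogate on PC scores
sigma = 0.5                                               # data noise std

def forward(z):
    u = A @ z      # decode the latent sample to the physical field
    x = V.T @ u    # project the field onto the retained PCs
    return B @ x   # surrogate prediction of the data

z_true = rng.standard_normal(dim_z)
y_obs = forward(z_true) + sigma * rng.standard_normal(n_data)

def log_post(z):
    """log N(z; 0, I) prior + Gaussian log-likelihood, up to constants."""
    r = forward(z) - y_obs
    return -0.5 * (z @ z) - 0.5 * (r @ r) / sigma**2

# Random-walk Metropolis in the latent space
z, lp = np.zeros(dim_z), log_post(np.zeros(dim_z))
samples, accepted, n_iter = [], 0, 20000
for it in range(n_iter):
    z_prop = z + 0.3 * rng.standard_normal(dim_z)
    lp_prop = log_post(z_prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        z, lp = z_prop, lp_prop
        accepted += 1
    if it >= 5000:  # discard burn-in
        samples.append(z.copy())
samples = np.asarray(samples)

# With linear/Gaussian stand-ins the posterior mean is available in
# closed form, which allows a sanity check of the chain.
H = B @ V.T @ A
mean_exact = np.linalg.solve(np.eye(dim_z) + H.T @ H / sigma**2,
                             H.T @ y_obs / sigma**2)
```

Posterior samples in the physical space are then obtained by simply decoding the retained z draws, mirroring the application of G_DGM to draws from P(z | y) described in Section 2.3.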

3.2.2 Global-PCA-PCE strategy

The second strategy, in the following Global-PCA-PCE, uses inputs of the PCE modelling defined in terms of projections on prior-informed PCA components spanning the entire domain. More specifically, in the Global-PCA-PCE approach we randomly create a total of 1000 slowness realizations G_VAE(z) from the prior and compute the corresponding principal components (see Fig. 3). The inputs for PCE in the Global-PCA-PCE approach are then the projections of G_VAE(z) on a to-be-chosen number of such GPCs. Following Meles et al. (2022), all PCA processes are defined in terms of slowness.

The effective dimensionality of the input with respect to M_GPC, that is, the number of GPCs representing the input, is not defined a priori. Following a similar approach to Meles et al. (2022), the effective dimensionality is here assessed by comparison with the reference solution in the output domain with respect to the noise level. In Figs 4(a) and (e), two velocity distributions are shown next to the approximate representations (Figs 4b-d and f-h) obtained by projecting them on 30, 60 and 90 GPCs, respectively. As expected, the reconstruction quality improves as more principal components are included.

To quantify the faithfulness of the various reduced parametrizations in terms of the output, we consider 100 realizations of the generative model and compute the resulting histograms of the traveltime residuals using the reference forward solver. The root-mean-square errors (in the following, rmse) of the misfit between the data associated with the original distribution and its projections on 30, 60 and 90 principal components, shown in Figs 4(i)-(k), are 1.60, 0.85 and 0.55 ns, respectively, which are to be compared to the expected level of GPR data noise of 1 ns for 100 MHz data (Arcone et al. 1998). The number of principal components (i.e. 90 PCs) required to approximate the forward solver below the expected noise level is larger than for the example considered by Meles et al. (2022) (i.e. 50 PCs). Building a PCE on such a large basis is challenging in terms of computational requirements and efficiency, and could lead to poor accuracy if a small training set is used. To address this, one approach is to either reduce the number of components, which introduces larger modelling errors, or explore alternative parametrizations that offer improved computational efficiency and accuracy. In this study, the Global-PCA-PCE approach utilizes 60 GPCs, while an alternative strategy based on physical considerations is considered below.
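The Global PCA construction itself reduces to an SVD of the centered ensemble of prior realizations, followed by projection onto the leading components. The sketch below uses a small synthetic ensemble of smooth fields as a stand-in for the 1000 VAE slowness realizations (an illustrative assumption); the truncated-reconstruction error it reports plays the role that the traveltime rmse of Fig. 4 plays in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in prior ensemble (assumption): random combinations of a few
# sinusoidal patterns, in place of 1000 VAE slowness realizations.
n_real, n_cell = 1000, 400
t = np.linspace(0.0, 1.0, n_cell)
patterns = np.stack([np.sin(2 * np.pi * k * t) for k in range(1, 9)])
U = rng.standard_normal((n_real, 8)) @ patterns  # one field per row

# Global PCs: right singular vectors of the centered data matrix, i.e.
# eigenvectors of the ensemble covariance.
u_mean = U.mean(axis=0)
_, s, Vt = np.linalg.svd(U - u_mean, full_matrices=False)

def project_reconstruct(u, n_pc):
    """Truncate to the leading n_pc GPCs and map back to the full field."""
    basis = Vt[:n_pc]              # rows are principal components
    scores = basis @ (u - u_mean)  # x_GPC: this is the surrogate's input
    return u_mean + basis.T @ scores

u = U[0]
errors = [np.linalg.norm(u - project_reconstruct(u, m)) for m in (2, 4, 8)]
```

Since the projections are onto nested subspaces, the reconstruction error can only decrease as components are added, and here it vanishes once the (rank-8) ensemble is fully spanned; in the paper the analogous curve is monitored in the output domain and compared against the 1 ns noise level.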
Figure 2. (a)-(e) Representative velocity fields generated by the decoder of the used VAE; crosses and circles stand for sources and receivers, respectively. (f)-(h) Corresponding exemplary traveltime gathers.

Figure 3. (a)-(j) The first five GPCs in the input domain corresponding to entire slowness fields. Crosses and circles stand for sources and receivers, respectively.

3.2.3 Local-PCA-PCE strategy

As mentioned in Section 2.3, an improved parametrization for surrogate modelling can sometimes be found by considering the forward problem's specific characteristics. The GPCs in the Global-PCA-PCE approach refer to the input field in its entirety. However, the actual first-arrival time for a given source-receiver combination depends only on a subdomain of the entire slowness distribution. This leads us to suggest a Local approach, in the following referred to as Local-PCA-PCE. Instead of using principal components describing the entire slowness field, we aim to use output-specific LPCs that characterize only the sub-domains impacting the physics of the problem (Jensen et al. 2000; Husen & Kissling 2001). We then expect that fewer LPCs are needed than GPCs to achieve the desired input-output accuracy. In practice, our construction of LPCs involves utilizing fat-ray sensitivity kernels, which capture the sensitivity of arrival times to small perturbations in the subsurface model, thus providing valuable insights into the regions [corresponding to the L subset in eq. (8)] that have the most significant impact on the observed data (Husen & Kissling 2001). For a given source-receiver pair, the corresponding sensitivity kernel depends on the slowness field itself and its patterns can vary significantly (see Figs 5a-j). The sought Local decomposition needs to properly represent any possible slowness field within the prior; thus it is reasonable to define it based on a representative sample of input realizations. To reduce
Bayesian tomography via PCE and DGMs 37



Figure 4. (a) and (e) Random velocity distributions and the corresponding representations using the first (b) and (f) 30, (c) and (g) 60, and (d) and (h) 90 GPCs
defined with the Global approach; (i–k) corresponding histograms of traveltime residuals based on simulations on the true field and partial reconstructions for
100 velocity distributions. Crosses and circles stand for sources and receivers, respectively.

the overall PCE computational cost, it is also convenient to limit the number of output-driven decompositions used. To achieve these goals, we assume that the prior model is stationary with respect to translation. Instead of focusing on each specific source–receiver pair (a total of 144), we can then consider each source–receiver angle (a total of 23). We then use a total of 1000 slowness realizations GVAE(z) from the prior and build the corresponding fat-ray sensitivity kernels using the reference eikonal solver for each of the 23 possible angles (Hansen et al. 2013). For any given angle, we consider a cumulative kernel consisting of the sum of the absolute values of each kernel (green areas in Figs 5k–t). Such a cumulative kernel covers an area larger than each individual kernel but is still considerably smaller than the entire slowness field. For any possible additional input model, the corresponding sensitivity kernel is then very likely to be geometrically included in the area covered by the cumulative kernel (see Figs 5k–t). Based on this insight, we define principal components spanning only the area covered by such cumulative kernels or relevant parts thereof (e.g. a threshold can be taken into consideration to reduce the size of these subdomains). For the practical definition of the components, the cumulative kernels are either set to 0 or 1 depending on whether the corresponding value is below or larger than the threshold, respectively. We then multiply, point by point, the slowness distributions with the cumulative kernels, and consider the principal components of such products.

The first five LPCs are shown for given source–receiver pairs (Figs 6a–e and f–j). Note that the pattern variability is confined within the cumulative kernels, while in the complementary areas the values are 0. Note also that, compared to the five principal components in Fig. 3, higher resolution details can be identified. Given the


Figure 5. (a)–(j) For the velocity fields in Fig. 2, the sensitivity-based kernels for two arbitrarily selected source–receiver pairs are shown superimposed on the
corresponding velocity distributions. (k)–(t) The same sensitivity kernels as in (a)–(j), but superimposed on the cumulative kernels (green shaded areas) used
to define the support of the sensitivity-based LPCs used in the Local-PCA-PCE approach.
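The mask construction described in the text can be sketched as follows; kernel supports, field statistics, grid size and the threshold value are synthetic stand-ins for illustration, not the fat-ray kernels or VAE realizations of the study:

```python
import numpy as np

# Sketch of the cumulative-kernel masking for one source-receiver angle.
# Synthetic stand-ins: kernels supported in the upper half of the grid
# play the role of fat-ray sensitivity kernels; `fields` stands in for
# prior slowness realizations. Shapes and threshold are illustrative.
rng = np.random.default_rng(1)
n_fields, ny, nx = 200, 32, 32

kernels = np.zeros((n_fields, ny, nx))
kernels[:, : ny // 2, :] = rng.random((n_fields, ny // 2, nx))  # localized support
fields = rng.random((n_fields, ny, nx))

# Cumulative kernel: sum of absolute kernel values over all realizations.
cumulative = np.abs(kernels).sum(axis=0)

# Binarize: 1 where the cumulative sensitivity exceeds the threshold, else 0.
threshold = 0.05 * cumulative.max()
mask = (cumulative > threshold).astype(float)

# Point-by-point product of slowness fields and mask, then PCA via SVD.
masked = (fields * mask).reshape(n_fields, -1)
mean = masked.mean(axis=0)
_, _, vt = np.linalg.svd(masked - mean, full_matrices=False)
lpcs = vt[:30]  # first 30 local principal components (LPCs)
```

By construction, the resulting LPCs vanish outside the cumulative kernel, which is the property noted for Fig. 6.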

same number of principal components, we can then expect the input to be better represented in the physically relevant subdomain when the Local-PCA-PCE rather than the Global-PCA-PCE approach is followed. For all source–receiver pairs corresponding to the same altitude angle, the same kernels and principal components are used, provided they are shifted to cover the appropriate geometry.

In Figs 7(a) and (g), the two slowness distributions from Fig. 4 are shown next to the representations obtained by projecting them on



Figure 6. The first five LPCs in the input domain corresponding to the cumulative kernels associated with the source–receiver pairs considered in Figs 5(k)–(t).

Figure 7. (a) and (g) Velocity fields with (b)–(f) and (h)–(l) the corresponding representations used for surrogate modelling based on the first 30 LPCs used in
the Local-PCA-PCE approach. Different kernels are used for each source–receiver angle.

30 LPCs. In the areas complementary to the sensitivity kernels, the speed is set to 0.07 m ns⁻¹. Input reconstructions are remarkably improved with respect to when using the same number of GPCs [compare Figs 7(b)–(f) and (h)–(l) to Figs 4(b) and (f)] of the entire slowness field. More importantly, the modelling error provided by using just 30 sensitivity-based LPCs is lower than what was previously provided by 90 standard components (i.e. rms ≈0.45 ns). By incorporating these tailored LPCs, we can attain enhanced output fidelity when utilizing truncated representations of the input. This enhanced fidelity proves particularly advantageous for the implementation of PCE, allowing for more precise and efficient modelling. Consequently, this approach holds substantial promise in achieving superior accuracy and computational efficiency in PCE-based analyses.

In summary, we have introduced three different parametrizations to be used as input for PCE. We consider coordinates inherited from the VAE, and principal components derived by considering either entire slowness fields or sensitivity-based subdomains. We refer to these three parametrizations in what follows as VAE-PCE, Global-PCA-PCE and Local-PCA-PCE, respectively.

3.3 PCE performance

We here analyse the PCE performance of the different input parametrizations for surrogate modelling introduced in Section 3.2. In agreement with Meles et al. (2022), we consider for each surrogate a total of 900 training sets consisting of noise-free traveltimes calculated by the eikonal solver and a polynomial degree p of five.

3.3.1 VAE-PCE performance

When the VAE parametrization is used as input, the PCE performance is rather poor, with an rmse of 2.01 ns, which is well beyond the expected noise level in the data and is consequently considered unsatisfactory (Fig. 8a). The poor performance is due to the highly non-linear map MVAE. In such a scenario, PCE does not provide accurate results even if the physical input of the validation set is exactly defined by GVAE(z) = u. In this scheme, note that the evaluation of GVAE(z) is not required to compute the corresponding PCE. Given the low accuracy of the PCE performance for this parametrization, we can expect the corresponding likelihood to differ significantly from that involving exact modelling.

3.3.2 Global-PCA-PCE performance

Despite the comparatively poor reconstruction of the input (e.g. GG/LPC(x) ≈ u) provided by the PCA approaches, the corresponding parametrizations perform well when used as input to build PCE surrogates, with the Global-PCA-PCE approach being outperformed by the Local-PCA-PCE scheme in terms of accuracy [rmse of 1.31 and 0.68 ns, respectively; see Figs 8(b) and (c) for the corresponding histograms]. In both of these cases, the PCE operates on more variables (i.e. 60 and 30 for the Global-PCA-PCE and Local-PCA-PCE parametrizations, respectively, versus 20 for the VAE-PCE scheme). Moreover, the inputs for the Global-PCA-PCE and Local-PCA-PCE schemes are projections, on G/LPCs, of images that require the evaluation of GVAE(z). As such, evaluation of the corresponding PCEs is computationally more expensive for the PCA-based approaches than for the VAE-PCE case. Given the good accuracy of the PCE performance for this parametrization, we can expect the corresponding likelihood to mildly differ from that involving no truncation in the GPC projection and exact modelling.

3.3.3 Local-PCA-PCE performance

In the Local-PCA-PCE approach, for each of the 23 angles considered, training involves randomly chosen source–receiver pair data associated with identical angles, while the final rmse is computed on the standard 144 traveltime gathers. For a more comprehensive evaluation of the Local-PCA-PCE scheme, we incorporate the results of FDTD data processing. In this analysis, training and validation rely on the FDTD data, while the PCE implementation remains consistent with that of the eikonal data. Results are similar and well below the noise, with an rmse of 0.65 ns (the corresponding histogram is displayed in Fig. 8d). All PCE results are unbiased and the model errors have Gaussian-like distributions. Given the excellent accuracy of the PCE performance for this parametrization, we can expect the corresponding likelihood to only slightly differ from that involving no truncation in the LPC projection and exact modelling.

Depending on the input parametrization, PCEs approximate the eikonal and FDTD solvers to different degrees. Fig. 9 represents the corresponding covariance matrices accounting for the modelling error of each surrogate model.

We now discuss the computational burden of each strategy when running on a workstation equipped with 16 GB DDR4 of RAM and powered by a 3.5 GHz Quad-Core processor running Matlab and Python on Linux. We emphasize that our goal with the present manuscript is to propose novel methods to conjugate VAE and PCE rather than offer optimized implementations.

There are up to three relevant computational steps in the execution of a forward run for the VAE-PCE, Global-PCA-PCE, Local-PCA-PCE, eikonal and FDTD simulations, namely the evaluation of the physical input GVAE(z), its decomposition on either GPCs or LPCs, and the actual evaluation of the forward or surrogate model. Not all methods require each of these steps. The VAE-PCE model, for example, is not a function of GVAE(z) but depends on z only. Evaluation of the VAE-PCE model is extremely fast both when involving one or 35 (as in the MCMC inversion discussed below that is based on 35 chains) simultaneous model runs, taking on average ≈0.06 s and ≈0.08 s, respectively. Evaluation of the Python-based decoder GVAE(z), required for all forward models except the VAE-PCE, is the bottleneck of the Matlab algorithms used herein, requiring ≈1.35 s and ≈1.43 s, respectively, when operating on one or 35 inputs when considering the eikonal solver. Nevertheless, this cost could be reduced by either implementing the decoder and the PCE within the same Python environment or by optimizing the calls to the Matlab/Python scripts in our code for greater efficiency. When in its native environment, evaluation of GVAE(z) is actually very fast, taking only ≈0.005 s and ≈0.08 s when operating on one or 35 inputs, respectively. Still, note that this cost is overall negligible even in our non-optimized setting when considering expensive physics-based forward solvers such as FDTD. Only the Global-PCA-PCE and Local-PCA-PCE strategies require PCA decompositions. The Global-PCA-PCE approach is faster, requiring only up to ≈0.002 s and ≈0.05 s when applied to one and 35 input elements, respectively, while the Local-PCA-PCE method is slower, taking up to ≈0.06 s and ≈0.23 s in the same situation. For the Global-PCA-PCE method the cost of a single forward run is ≈0.52 s, which is significantly more than for VAE-PCE. The difference between these two PCE strategies can be attributed to the significantly larger number of input parameters of the Global-PCA-PCE with respect

Figure 8. (a)–(c) Histograms of the model error with respect to the PCE prediction when using the VAE (20 input parameters), Global (60 input parameters)
and Local (30 input parameters per angle) parametrizations of the input in the PCE-based surrogate modelling, respectively, using the eikonal solver to compute
the training set. (d) Histogram of the model error using Local parameters and FDTD reference data.
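The model-error covariance matrices of Fig. 9 can be estimated empirically from surrogate residuals on a validation set; the sketch below uses synthetic stand-ins for the reference and surrogate traveltimes, not the actual solvers of the study (the resulting matrix plays the role of the PCE-derived C_T^app entering the likelihood):

```python
import numpy as np

# Illustrative estimate of a model-error covariance matrix from surrogate
# residuals: differences between reference-solver and surrogate traveltimes
# over a validation set. Synthetic stand-ins; 144 outputs as in the text.
rng = np.random.default_rng(2)
n_val, n_out = 500, 144
t_reference = rng.standard_normal((n_val, n_out))
t_surrogate = t_reference + 0.1 * rng.standard_normal((n_val, n_out))

residuals = t_reference - t_surrogate
bias = residuals.mean(axis=0)  # ≈ 0 for an unbiased surrogate
# Empirical covariance of the modelling error (one row per output).
c_app = np.cov(residuals, rowvar=False)
```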



Figure 9. Model error covariance matrices associated with PCE-based surrogate modelling based on: (a) VAE-PCE, (b) Global-PCA-PCE and (c) Local-PCA-
PCE schemes trained with 900 eikonal data sets. (d) Model error covariance matrix for the Local-PCA-PCE scheme trained with FDTD data.

to the VAE-PCE scheme (i.e. 60 versus 20). Note that the PCE model evaluations are vectorized and, therefore, the cost is almost the same when applied to 35 inputs (≈0.57 s). Moreover, the computational cost of the Global-PCA-PCE method could be reduced by applying a PCA decomposition of the output, akin to what is proposed in Meles et al. (2022). Despite involving fewer variables than the Global-PCA-PCE approach, the Local-PCA-PCE method is slightly more computationally demanding, with a cost of ≈0.64 s and ≈0.65 s when operating on one or 35 inputs, respectively. The increase in cost compared to the Global-PCA-PCE method depends on the fact that for each source–receiver angle (θ), the Local-PCA-PCE utilizes its own polynomial basis, denoted as Ψθ in the following.

In comparison with the PCE methods, the cost of the reference eikonal solver is basically a linear function of the number of input distributions it operates on. A single run requires ≈0.05 s, while for 35 velocity distributions the cost increases up to ≈1.67 s. As such, its cost is either significantly smaller or slightly larger than what is required by the Global/Local-PCA-PCE approaches. Finally, the cost required by the reference FDTD code is ≈120 s and ≈4200 s if operating on either one or 35 velocity distributions, which is orders of magnitude longer than for the eikonal or PCE models. These results are summarized in Table 1, where we estimate the performance of an ideally-optimized Local-PCA-PCE method benefiting from (a) evaluating GVAE(z) in its native environment and (b) using a single family of polynomials Ψθ for all angles. In numerical results not presented herein, we find that choosing any one of the Ψθ families for all models provides nearly identical fidelity to what is achieved by using specifically tailored polynomials for each angle, at the considerably smaller cost of ≈0.06 s and 0.16 s when applied to either one or 35 inputs, respectively. While such a result cannot be generalized, it is always possible to test the corresponding PCEs' accuracy with a representative evaluation set. The option of relying on a single family of polynomials for the Local-PCA-PCE method is certainly to be taken into account when optimising the approach.

3.4 Inversion results

We now explore the performance of the different input parametrizations used for PCE-based surrogate modelling, namely VAE-PCE, Global-PCA-PCE and Local-PCA-PCE, when performing probabilistic MCMC inversion. The inversions were carried out using the UQLAB Matlab-based framework (Marelli & Sudret 2014; Wagner et al. 2021b). As reference model, consider the field shown in Fig. 10, which is used to generate a total of 144 traveltimes using the reference eikonal and FDTD solvers. Note that this field is not used to train the PCEs. Uncorrelated Gaussian noise characterized by σ² = 1 ns² was added to the data used for inversion.

We use a Metropolis–Hastings algorithm, and run 35 non-interacting Markov chains in parallel for 4 × 10⁵ iterations per chain. During burn-in, determined according to the Geweke method, the scaling factor of the proposal distribution is tuned such that an acceptance rate of about 30 per cent is achieved for each experiment (Geweke 1992; Brunetti et al. 2019). Finally, outlier chains with respect to the interquartile range statistics discussed in Vrugt et al. (2009) are considered aberrant trajectories and are ignored in the analysis of the results.

We first present the results for training data generated by an eikonal solver. We compare VAE-PCE, Global-PCA-PCE and

Table 1. Summary of the computational cost of the various steps for a single realization or a batch of 35 inputs of the proposed algorithms.
In addition to the strategies used in the MCMC examples discussed in the manuscript and summarized in the first four columns
(i.e. the VAE-PCE, Global- and Local-PCA-PCE and eikonal schemes), we also consider the cost of the FDTD approach using the
reference code (fifth column) and an optimized Local-PCA-PCE approach ideally benefiting from executing the VAE decoder in the
same environment as the PCE model and based on a single polynomial family for all angles (sixth column).
Step       VAE-PCE        Global-PCA-PCE   Local-PCA-PCE   Eikonal        FDTD           Optimized Local-PCA-PCE
GVAE(z)    0              ≈1.35/1.43 s     ≈1.35/1.43 s    ≈1.35/1.43 s   ≈1.35/1.43 s   ≈0.005/0.08 s
PCA        0              ≈0.002/0.05 s    ≈0.06/0.23 s    0              0              ≈0.06/0.23 s
Forward    ≈0.06/0.08 s   ≈0.52/0.57 s     ≈0.64/0.65 s    ≈0.05/1.67 s   ≈120/4200 s    ≈0.06/0.16 s
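The three steps timed in Table 1 (decoding, PCA projection, surrogate evaluation) compose into a single forward operator; in the sketch below every callable is an illustrative random linear stand-in, not the actual VAE decoder or PCE of the study:

```python
import numpy as np

# Schematic of the three computational steps timed in Table 1 for a
# PCA-PCE forward run: (1) decode latent variables into a field,
# (2) project the field onto principal components, (3) evaluate the
# surrogate. All operators are random linear stand-ins for illustration.
rng = np.random.default_rng(3)
n_latent, n_pixels, n_pcs, n_out = 20, 40 * 40, 60, 144

w_dec = rng.standard_normal((n_pixels, n_latent))  # "decoder" stand-in
pcs = np.linalg.qr(rng.standard_normal((n_pixels, n_pcs)))[0].T  # orthonormal PCs
w_pce = rng.standard_normal((n_out, n_pcs))        # "surrogate" stand-in

def forward(z):
    field = w_dec @ z       # step 1: evaluate the physical input GVAE(z)
    coeff = pcs @ field     # step 2: decomposition on the principal components
    return w_pce @ coeff    # step 3: surrogate (PCE) evaluation

traveltimes = forward(rng.standard_normal(n_latent))  # 144 outputs
```

As discussed in the text, the VAE-PCE scheme skips steps 1 and 2 entirely (it depends on z only), which is why its per-run cost in Table 1 is the smallest.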

Figure 10. The reference velocity distribution used in the numerical inversion experiments.

Local-PCA-PCE inversion results to those achieved by using the eikonal solver, which represents the reference solution since the full-physics solver is used in the entire MCMC process. This test provides an ideal benchmark, as we compare the proposed PCE strategies with results derived from exact modelling. Note that such a comparison is not feasible for the FDTD scenario due to the computational burden associated with the corresponding MCMC process. Inversion results in terms of mean and standard deviations incorporating the model error [i.e. using the PCE-derived C_T^app in eq. (10)] are shown in Figs 11(a)–(e). The mean of the posterior obtained using the VAE-PCE poorly resembles the reference velocity field, with relevant topological differences between the two (compare Fig. 10 to Fig. 11a). Note that the data misfit is particularly large (i.e. 3.49 ns). This poor performance is also obtained when considering other test models (see the Appendix). The mean of the posterior provided by the Global-PCA-PCE approach shares many features with the reference distribution, but differences are apparent in the lower and upper channelized structures. The similarity between the posterior mean and the true distribution increases significantly when the Local-PCA-PCE is used (compare Fig. 10 to Fig. 11c). These results also show close proximity with the posterior mean solution obtained by the eikonal solver (see Fig. 11d), that is, without any surrogate modelling. Also, when the FDTD Local-PCA-PCE is used, that is, when training and inversion are performed based on FDTD data, an almost identical solution to what is achieved using the Local-PCA-PCE is obtained (see Fig. 11e). For this alternative data set, the use of the FDTD solver in the inversion algorithm would be extremely expensive and is not considered feasible to run for comparison purposes. The quality of the solution offered by the surrogate on FDTD data can be heuristically appreciated by noting its similarity to results obtained by the Local-PCA-PCE based on eikonal data, which in turn produces results close to those of the eikonal solver on eikonal data. Although not strictly consequential, it is to be expected that the results offered by the surrogate based on FD data would also be similar to those that would have been obtained if using the FDTD solver on FDTD data.

High standard deviations of the posterior distribution are distributed in wide domains of the image when the VAE-PCE is used (see Fig. 11f). In contrast, when the Global-PCA-PCE (Fig. 11g), Local-PCA-PCE (Fig. 11h) and eikonal (Fig. 11i) solvers are used, high standard deviation values are found mainly along channel boundaries, in agreement with other studies (Zahner et al. 2016). Convergence is assessed using the potential-scale reduction factor R̂, which compares the variance of the individual Markov chains with the variance of all the Markov chains merged together (Gelman & Rubin 1992), calculated from the second half of the chains. Convergence is usually assumed if R̂ < 1.2 for all parameters. In our experiments, full convergence for all of the 20 parameters is achieved when the VAE-PCE, the Global-PCA-PCE and the Local-PCA-PCE approaches are used. Six parameters do not converge when the eikonal solver is used, but the values of R̂ are nevertheless close to 1.2 (see Fig. 12). Further quantitative assessments can be achieved by comparing the reference input and the corresponding inversion solutions in terms of input-domain rmse, structural similarity (in the following, SSIM) and rmse in the output domain (Gneiting & Raftery 2007; Levy et al. 2022). Among these metrics, SSIM specifically evaluates the structural similarity between images, emphasizing their underlying patterns and details. It assigns a value between −1 and 1, with −1 indicating a notable dissimilarity and 1 denoting perfect similarity. Again, we see that the VAE-PCE performs poorly, with a low SSIM value. Better results are provided by the Global-PCA-PCE strategy. The Local-PCA-PCE scheme results are the closest to the reference solutions achieved using the eikonal solver. It is found that the FDTD Local-PCA-PCE performs similarly to the Local-PCA-PCE strategy. The results for all considered metrics are summarized in Table 2.

We also consider histograms of SSIM values in the corresponding posterior distributions (Fig. 13). The VAE-PCE posterior has low similarity with the reference model, with the maximum SSIM value being below 0.5. Closer proximity is found among samples obtained using the Global-PCA-PCE approach, a trend that is further improved when considering the Local-PCA-PCE scheme, which shows some overlap with the results of the reference eikonal inversion. Note again that the statistics of the FDTD Local-PCA-PCE algorithm are similar to those of the Local-PCA-PCE scheme. These findings can be further appreciated by considering random posterior realizations for each of the inversion strategies (see Fig. 14).
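The potential-scale reduction factor used above for the convergence check can be sketched as follows; this is a minimal single-parameter version of the Gelman & Rubin (1992) diagnostic on synthetic chains, not the exact implementation used in the study:

```python
import numpy as np

# Minimal potential-scale reduction factor (Gelman & Rubin 1992) for one
# parameter traced by several chains.
def gelman_rubin(chains):
    """chains: array of shape (m_chains, n_samples) for a single parameter."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    w = chains.var(axis=1, ddof=1).mean()   # mean within-chain variance
    b = n * chain_means.var(ddof=1)         # between-chain variance
    var_hat = (n - 1) / n * w + b / n       # pooled variance estimate
    return np.sqrt(var_hat / w)

rng = np.random.default_rng(4)
mixed = rng.standard_normal((35, 1000))     # 35 well-mixed chains
r_hat = gelman_rubin(mixed)                 # ≈ 1 for converged chains
```

For well-mixed chains R̂ is close to 1, while chains stuck around different values yield R̂ well above the 1.2 threshold quoted in the text.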



Figure 11. Posterior mean and standard deviation respectively for the (a, f) VAE-PCE inversion scheme, (b, g) Global-PCA-PCE, (c, h) Local-PCA-PCE, (d,
i) eikonal and (e, j) FDTD Local-PCA-PCE inversion strategies.

Figure 12. Gelman–Rubin statistics for the various inversion strategies using 35 chains after 4 × 105 iterations per chain.

4 DISCUSSION

Deep generative models offer a flexible framework for encoding complex spatial priors, enabling the description of intricate input distributions. However, proper parametrization of prior distributions alone does not ensure efficient estimation of posterior distributions when using MCMC methods. In many practical situations, the use of surrogate models becomes beneficial or even essential for evaluating the likelihood functions effectively. Surrogate modelling with

Table 2. Assessment of the inversion results in the input and output domains for the various surrogate
modelling strategies.
Model              Rmse mean velocity    SSIM mean velocity   Rmse mean output
VAE-PCE            8.01 × 10⁶ m s⁻¹      0.30                 3.49 ns
Global-PCA-PCE     5.38 × 10⁶ m s⁻¹      0.54                 1.49 ns
Local-PCA-PCE      2.67 × 10⁶ m s⁻¹      0.73                 1.15 ns
Eikonal            1.57 × 10⁶ m s⁻¹      0.87                 1.01 ns
FD Local-PCA-PCE   3.06 × 10⁶ m s⁻¹      0.71                 1.15 ns
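The SSIM values reported in Table 2 follow the standard structural-similarity index; a simplified, global (single-window) version of the index is sketched below for illustration — published SSIM implementations (e.g. skimage.metrics.structural_similarity) compute it over local windows, so the numbers differ from a full implementation:

```python
import numpy as np

# Global (single-window) SSIM sketch for comparing a posterior-mean image
# with the reference model. Constants k1, k2 follow the common defaults;
# the index lies in [-1, 1], with 1 denoting perfect similarity.
def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )

rng = np.random.default_rng(5)
reference = rng.random((40, 40))
identical = ssim_global(reference, reference)       # perfect similarity
degraded = ssim_global(reference, 1.0 - reference)  # strongly dissimilar
```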



Figure 13. Histograms of SSIM values across the posterior distributions for the various inversion strategies.

PCE has become widespread in many disciplines. The massive decrease of the computational costs associated with PCE is achieved by approximating demanding computational forward models with simple and easy-to-evaluate functions. A key requirement is that the number of input variables describing the computational model is relatively small (i.e. up to a few tens) and that the target model can be approximated by truncated series of low-degree multivariate polynomials. The number of coefficients defining the PCE model grows polynomially in both the size of the input and the maximum degree of the expansion. When the reference full-physics model response is highly non-linear in its input parameters, the problem is typically non-tractable (Torre et al. 2019). Since the high-fidelity mapping of complex prior distributions provided by DGMs is based on highly non-linear relationships between latent variables and physical domains, replicating the performance of such networks and/or composite functions thereof (e.g. MVAE = F ∘ GVAE in eq. 5) using PCE is problematic. To circumvent this challenge, we have explored two PCA-based decompositions that facilitated the straightforward implementation of PCE. One decomposition was designed to encompass the entire input domain, while the other specifically focused on subdomains of particular physical relevance. Although this latter concept is investigated here in the context of traveltime tomography, the integration of problem-related PCs operating synergistically with DGM-defined latent parametrizations has the potential for broader applications.

Whatever the choice of input coordinates, for example, based on PCA or local properties of the input, the determining criterion for evaluating the quality of the corresponding PCE should always be performance on a representative validation set. In the case of PCA, the lower bound of the prediction misfit rmse can be estimated a priori by comparing the accuracy of the reference model acting on the full and the compressed domains, that is, MG/LPC(x_full) and MG/LPC(x). In our case, such a lower bound for the Global-PCA-PCE approach operating with 60 components is 0.85 ns. Using the Local-PCA-PCE scheme with 30 components only, the rmse drops to 0.55 ns. However, the accuracy of the corresponding Global/Local-PCA-PCE is worse, with rmse of 1.31 and 0.67 ns, respectively, mainly due to the small size of the training set. Note that while the lower bound of the PCA decomposition decreases when more PCs are taken into account, the corresponding accuracy of PCE is primarily limited by the size of the training set. Increasing the number of components can actually worsen PCE performance if the training set is insufficient to determine the polynomial coefficients, which, as mentioned, grow significantly with input size. In our case, using 90 components would imply an rmse of 1.39 ns, which is worse than what was obtained by the 60-component PCE.

Both the VAE-PCE and Global-PCA-PCE consist of 144 (i.e. the total number of source–receiver combinations) different PCE models operating, for each traveltime simulation, on an identical input, that is, the latent variables of the DGM or the 60 PCs characterizing the entire physical domain. On the other hand, the Local-PCA-PCE scheme consists of 23 (i.e. the total number of source–receiver altitude angles) different PCE models operating, for each traveltime simulation, on 144 different inputs, that is, the 30 LPCs characterizing each local subdomain. Since each of the 23 models operates on specific LPCs, the corresponding families of orthonormal polynomials Ψθ are different. This is in contrast with the Global-PCA-PCE method, for which each model operates via a single family of polynomials. Thus, the Local-PCA-PCE scheme is computationally more demanding than the Global-PCA-PCE (see Table 1). However, the use of a single family of polynomials can also be considered for the Local-PCA-PCE method, resulting in shorter run times. When considering computational performance, an optimal implementation of GVAE(z) should also be sought.

In this study, to determine the minimum number of G/LPCs for constructing an accurate PCE, we assess the lower bound of the output prediction misfit rmse as a function of the number of G/LPCs used. We project the input onto subsets of G/LPCs, typically ranging in the tens. This process generates non-binary images, which are then utilized to compute the output using the reference forward modelling. Alternatively, we could consider rebinarizing the reconstructed images as done in Thibaut et al. (2021). This approach would bring the projected input back into the prior, but this property is not necessarily relevant for the determination of PCE accuracy. Irrespective of the chosen reconstruction algorithm, the Local approach maintains a significant advantage over the Global method. When using


Figure 14. Random samples from the posterior obtained by using the (a) VAE-PCE, (b) Global-PCA-PCE, (c) Local-PCA-PCE, (d) eikonal and (e) FD
Local-PCA-PCE strategies. The results in (c)–(e) are visually similar.

an equal number of components, LPCs in fact consistently yield superior approximations of the relevant input compared to GPCs.

We have seen that once a DGM-based latent parametrization has been found to reduce the effective dimensionality of the input domain and, based on PCA decompositions, high-fidelity PCEs have been trained, MCMC inversion can be efficient. Relying on PCE rather than advanced deep learning methods for surrogate modelling can be advantageous in terms of ease of implementation, as potentially complex training of a neural network is not needed. Many effective sampling methods, such as Adaptive Metropolis, Hamiltonian Monte Carlo, or the Affine Invariant Ensemble Algorithm (Duane et al. 1987; Haario et al. 2001; Goodman & Weare

2010) could be considered in our workflow instead of the current use of a standard Metropolis–Hastings sampling algorithm.

Adaptation of the Global-PCA-PCE strategy presented here could easily be used for other imaging problems such as active or passive seismic tomography at different scales (Bodin & Sambridge 2009; Galetti et al. 2017). On the other hand, implementation of the Local-PCA-PCE schemes would depend on the properties of the corresponding sensitivity kernels, which would require more careful evaluation and problem-specific design.

5 CONCLUSIONS

Low-dimensional latent variables associated with deep generative models, such as VAE, conform well to complex priors, and provide an attractive basis to explore posterior model distributions using MCMC strategies. MCMC methods can also benefit greatly from surrogate modelling based on PCE, provided the forward model can be approximated by low-degree multivariate polynomials. However, this type of latent variable model tends to have a highly non-linear relation to data, and is thus poorly approximated by low-degree PCEs. As such, performing PCE-accelerated MCMC inversion based on a latent VAE-based parametrization for both inversion and as input to surrogate modelling leads to large posterior uncertainty due to the need to account for important modelling errors in the likelihood function. In the context of GPR traveltime tomography, for instance, PCE-based surrogate modelling using VAE latent variables as input results in modelling errors that are well beyond the expected noise level in the data. By separating the parametrization used for inversion and the one used as input for surrogate modelling, we can circumvent this problem and perform MCMC in a latent space defined by deep generative models while surrogate modelling is approximated by PCEs operating on globally or locally defined principal components. We find that these two

REFERENCES
Alizadeh, R., Allen, J.K. & Mistree, F., 2020. Managing computational complexity using surrogate models: a critical review, Res. Eng. Des., 31, 275–298.
Annan, A.P., 2005. GPR methods for hydrogeological studies, in Hydrogeophysics, Water Science and Technology Library, Vol. 50, pp. 185–213, eds Rubin, Y. & Hubbard, S.S., Springer.
Arcone, S.A., Lawson, D.E., Delaney, A.J., Strasser, J.C. & Strasser, J.D., 1998. Ground-penetrating radar reflection profiling of groundwater and bedrock in an area of discontinuous permafrost, Geophysics, 63(5), 1573–1584.
Asher, M.J., Croke, B.F., Jakeman, A.J. & Peeters, L.J., 2015. A review of surrogate models and their application to groundwater modelling, Water Resour. Res., 51(8), 5957–5973.
Aster, R.C., Borchers, B. & Thurber, C.H., 2018. Parameter Estimation and Inverse Problems, Elsevier.
Bejani, M.M. & Ghatee, M., 2021. A systematic review on overfitting control in shallow and deep neural networks, Artif. Intell. Rev., 54, 6391–6438.
Blatman, G. & Sudret, B., 2011. Adaptive sparse polynomial chaos expansion based on least angle regression, J. Comput. Phys., 230(6), 2345–2367.
Bodin, T. & Sambridge, M., 2009. Seismic tomography with the reversible jump algorithm, Geophys. J. Int., 178(3), 1411–1436.
Boutsidis, C., Mahoney, M.W. & Drineas, P., 2008. Unsupervised feature selection for principal components analysis, in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 61–69.
Brunetti, C., Bianchi, M., Pirot, G. & Linde, N., 2019. Hydrogeological model selection among complex spatial priors, Water Resour. Res., 55(8), 6729–6753.
Chipman, H., George, E.I., McCulloch, R.E., Clyde, M., Foster, D.P. & Stine, R.A., 2001. The practical implementation of Bayesian model selection, Lecture Notes-Monogr. Ser., 38, 65–134.
Duane, S., Kennedy, A.D., Pendleton, B.J. & Roweth, D., 1987. Hybrid Monte Carlo, Phys. Lett. B, 195(2), 216–222.
Galetti, E., Curtis, A., Baptie, B., Jenkins, D. & Nicolson, H., 2017. Transdi-
approaches largely outperform surrogate modelling based on VAE mensional Love-wave tomography of the British Isles and shear-velocity
structure of the East Irish Sea Basin from ambient-noise interferometry,
input parametrizations. For the channelized structures considered,
Geophys. J. Int., 208(1), 36–58.
modelling errors are comparable to the typical observational errors
Gelman, A. & Rubin, D.B., 1992. Inference from iterative simulation using
when PCE are based on globally defined principal components and multiple sequences, Stat. Sci., 7(4), 457–472.
significantly lower when locally defined principal components are Geweke, J., 1992. Evaluating the accuracy of sampling-based approaches to
considered. Generally speaking, using PCE significantly reduces the the calculations of posterior moments, Bayesian Stat., 4, 641–649.
computational burden of MCMC, but it can be successfully used Giannakis, I., Giannopoulos, A., Warren, C. & Sofroniou, A., 2021. Fractal-
to perform non-linear MCMC inversion only if the corresponding constrained crosshole/borehole-to-surface full-waveform inversion for
modelling error is comparatively low. In this manuscript we have hydrogeological applications using ground-penetrating radar, IEEE Trans.
shown how PCE based on VAE parametrizations performs poorly Geosci. Remote Sens., 60, doi:10.1109/TGRS.2021.3054173.
in MCMC inversion, whereas PCE based on globally and locally Gloaguen, E., Marcotte, D., Chouteau, M. & Perroud, H., 2005. Borehole
radar velocity inversion using cokriging and cosimulation, J. Appl. Geo-
defined principal components produce results comparable or close
phys., 57(4), 242–259.
to those obtained using full-physics forward solvers. The methods
Gneiting, T. & Raftery, A.E., 2007. Strictly proper scoring rules, prediction,
presented herein are extendable to other problems involving wave- and estimation, J. Am. Stat. Assoc., 102(477), 359–378.
based physics of similar complexity. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D.,
Ozair, S., Courville, A. & Bengio, Y., 2020. Generative adversarial net-
AC K N OW L E D G M E N T S works, Commun. ACM, 63(11), 139–144.
Goodman, J. & Weare, J., 2010. Ensemble samplers with affine invariance,
We thank Associate Editor Prof Andrew Valentine (Durham Uni- Commun. Appl. Math. Comput. Sci., 5(1), 65–80.
versity), Prof Thomas Mejer Hansen (Aarhus University), Dr Robin Haario, H., Saksman, E. & Tamminen, J., 2001. An adaptive Metropolis
Thibaut (Ghent University) and the anonymous reviewers whose algorithm, Bernoulli, 7(2), 223–242.
comments helped us improve the paper. Niklas Linde, Macarena Hansen, T.M., Cordua, K.S., Jacobsen, B.H. & Mosegaard, K., 2014.
Amaya and Shiran Levy acknowledge support by the Swiss Na- Accounting for imperfect forward modeling in geophysical inverse
tional Science Foundation (project number: 184574). problems—exemplified for crosshole tomography, Geophysics, 79(3),
H1–H21.
Hansen, T.M., Cordua, K.S., Looms, M.C. & Mosegaard, K., 2013. Sippi: A
D ATA AVA I L A B I L I T Y matlab toolbox for sampling the solution to inverse problems with com-
The data underlying this paper are available upon request to the plex prior information: Part 2—application to crosshole GPR tomography,
Comput. Geosci., 52, 481–492.
corresponding author.
Hastings, W.K., 1970. Monte Carlo sampling methods using Markov chains and their applications, Biometrika, 57(1), 97–109.
Higdon, D., McDonnell, J.D., Schunck, N., Sarich, J. & Wild, S.M., 2015. A Bayesian approach for parameter estimation and prediction using a computationally intensive model, J. Phys. G: Nucl. Part. Phys., 42(3), doi:10.1088/0954-3899/42/3/034009.
Ho, J., Jain, A. & Abbeel, P., 2020. Denoising diffusion probabilistic models, Adv. Neural Inform. Process. Syst., 33, 6840–6851.
Husen, S. & Kissling, E., 2001. Local earthquake tomography between rays and waves: fat ray tomography, Phys. Earth planet. Inter., 123(2–4), 127–147.
Irving, J. & Knight, R., 2006. Numerical modeling of ground-penetrating radar in 2-D using MATLAB, Comput. Geosci., 32(9), 1247–1258.
Jensen, J.M., Jacobsen, B.H. & Christensen-Dalsgaard, J., 2000. Sensitivity kernels for time-distance inversion, Solar Phys., 192(1), 231–239.
Jetchev, N., Bergmann, U. & Vollgraf, R., 2016. Texture synthesis with spatial generative adversarial networks, preprint (arXiv:1611.08207).
Jolliffe, I.T. & Cadima, J., 2016. Principal component analysis: a review and recent developments, Phil. Trans. R. Soc., A, 374(2065), doi:10.1098/rsta.2015.0202.
Kingma, D.P. & Welling, M., 2013. Auto-encoding variational Bayes, preprint (arXiv:1312.6114).
LaBrecque, D., Alumbaugh, D.L., Yang, X., Paprocki, L. & Brainard, J., 2002. Three-dimensional monitoring of vadose zone infiltration using electrical resistivity tomography and cross-borehole ground-penetrating radar, in Methods in Geochemistry and Geophysics, Vol. 35, pp. 259–272, Elsevier.
Laloy, E., Hérault, R., Jacques, D. & Linde, N., 2018. Training-image based geostatistical inversion using a spatial generative adversarial neural network, Water Resour. Res., 54(1), 381–406.
Laloy, E., Hérault, R., Lee, J., Jacques, D. & Linde, N., 2017. Inversion using a new low-dimensional representation of complex binary geological media based on a deep neural network, Adv. Water Resour., 110, 387–405.
Lataniotis, C., Marelli, S. & Sudret, B., 2020. Extending classical surrogate modeling to high dimensions through supervised dimensionality reduction: a data-driven approach, Int. J. Uncertain. Quantif., 10(1), 55–82.
Levy, S., Hunziker, J., Laloy, E., Irving, J. & Linde, N., 2022. Using deep generative neural networks to account for model errors in Markov chain Monte Carlo inversion, Geophys. J. Int., 228(2), 1098–1118.
Levy, S., Laloy, E. & Linde, N., 2023. Variational Bayesian inference with complex geostatistical priors using inverse autoregressive flows, Comput. Geosci., 171(C), doi:10.1016/j.cageo.2022.105263.
Lopez-Alvis, J., Laloy, E., Nguyen, F. & Hermans, T., 2021. Deep generative models in inversion: the impact of the generator's nonlinearity and development of a new approach based on a variational autoencoder, Comput. Geosci., 152, doi:10.1016/j.cageo.2021.104762.
Lüthen, N., Marelli, S. & Sudret, B., 2021. Sparse polynomial chaos expansions: literature survey and benchmark, SIAM/ASA J. Uncertain. Quant., 9(2), 593–649.
Marelli, S., Lüthen, N. & Sudret, B., 2021. UQLab user manual – polynomial chaos expansions, Technical report, Chair of Risk, Safety and Uncertainty Quantification, ETH Zurich, Switzerland, Report # UQLab-V1.4-104.
Marelli, S. & Sudret, B., 2014. UQLab: a framework for uncertainty quantification in Matlab, in Proceedings of the 2nd International Conference on Vulnerability and Risk Analysis and Management (ICVRAM 2014), Liverpool, United Kingdom, 13–16 July 2014, pp. 2554–2563.
Mariethoz, G. & Caers, J., 2014. Multiple-Point Geostatistics: Stochastic Modeling With Training Images, John Wiley & Sons.
Marzouk, Y. & Xiu, D., 2009. A stochastic collocation approach to Bayesian inference in inverse problems, Commun. Comput. Phys., 6(4), 826–847.
Marzouk, Y.M., Najm, H.N. & Rahn, L.A., 2007. Stochastic spectral methods for efficient Bayesian solution of inverse problems, J. Comput. Phys., 224(2), 560–586.
Meles, G.A., Linde, N. & Marelli, S., 2022. Bayesian tomography with prior-knowledge-based parametrization and surrogate modeling, Geophys. J. Int., 231(1), 673–691.
Métivier, D., Vuffray, M. & Misra, S., 2020. Efficient polynomial chaos expansion for uncertainty quantification in power systems, Electr. Power Syst. Res., 189, doi:10.1016/j.epsr.2020.106791.
Nagel, J.B., 2019. Bayesian Techniques for Inverse Uncertainty Quantification, pp. 504, IBK Bericht.
Rasmussen, C.E., 2003. Gaussian processes in machine learning, in Summer School on Machine Learning, pp. 63–71, Springer.
Reynolds, A.C., He, N., Chu, L. & Oliver, D.S., 1996. Reparameterization techniques for generating reservoir descriptions conditioned to variograms and well-test pressure data, SPE J., 1(04), 413–426.
Sacks, J., Schiller, S.B. & Welch, W.J., 1989. Designs for computer experiments, Technometrics, 31(1), 41–47.
Strebelle, S., 2002. Conditional simulation of complex geological structures using multiple-point statistics, Math. Geol., 34(1), 1–21.
Tarantola, A., 2005. Inverse Problem Theory and Methods for Model Parameter Estimation, SIAM.
Thibaut, R., Laloy, E. & Hermans, T., 2021. A new framework for experimental design using Bayesian evidential learning: the case of wellhead protection area, J. Hydrol., 603, doi:10.1016/j.jhydrol.2021.126903.
Torre, E., Marelli, S., Embrechts, P. & Sudret, B., 2019. Data-driven polynomial chaos expansion for machine learning regression, J. Comput. Phys., 388, 601–623.
Vrugt, J.A., Ter Braak, C., Diks, C., Robinson, B.A., Hyman, J.M. & Higdon, D., 2009. Accelerating Markov chain Monte Carlo simulation by differential evolution with self-adaptive randomized subspace sampling, Int. J. Nonlin. Sci. Numer. Simulat., 10(3), 273–290.
Wagner, P.-R., Fahrni, R., Klippel, M., Frangi, A. & Sudret, B., 2020. Bayesian calibration and sensitivity analysis of heat transfer models for fire insulation panels, Eng. Struct., 205, doi:10.1016/j.engstruct.2019.110063.
Wagner, P.-R., Marelli, S. & Sudret, B., 2021a. Bayesian model inversion using stochastic spectral embedding, J. Comput. Phys., 436, doi:10.1016/j.jcp.2021.110141.
Wagner, P.-R., Nagel, J., Marelli, S. & Sudret, B., 2021b. UQLab user manual – Bayesian inversion for model calibration and validation, Technical report, Report UQLab-V1.4-113, Chair of Risk, Safety and Uncertainty Quantification, ETH Zurich, Switzerland.
Xiu, D. & Karniadakis, G.E., 2002. The Wiener–Askey polynomial chaos for stochastic differential equations, SIAM J. Sci. Comput., 24(2), 619–644.
Zahner, T., Lochbühler, T., Mariethoz, G. & Linde, N., 2016. Image synthesis with graph cuts: a fast model proposal mechanism in probabilistic inversion, Geophys. J. Int., 204(2), 1179–1190.

APPENDIX: OVERVIEW OF THE LIMITATIONS OF THE VAE–PCE APPROACH

For the configuration considered in this manuscript, PCEs based on VAE parameters provide poor accuracy in predicting traveltimes. When calculated on a representative validation set, an aggregate rmse of 2.01 ns is observed for the misfit between reference and predicted data. For the velocity distribution in Fig. 10, the traveltime prediction is particularly poor, with an rmse of 3.1 ns. In Fig. A1, we consider six additional reference velocity fields and the corresponding posterior mean images for MCMC inversion when using VAE–PCE surrogate modelling. In some cases, the posterior mean resembles the reference velocity field well [compare Fig. A1(a) to A1(g), or Fig. A1(f) to A1(l)]. However, large differences can arise between the reference and the VAE–PCE posterior mean [e.g. compare Fig. A1(b) to A1(h), or Fig. A1(e) to A1(k)]. Even if the corresponding modelling error is accounted for in the inversion, implying that the posterior mean models should be unbiased, we find that the modelling error has severe impacts by increasing the posterior model uncertainty.
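The aggregate rmse values quoted above follow the standard root-mean-square definition over all validation models and source–receiver pairs. As a minimal, hedged sketch (the array shapes and variable names are illustrative, not the paper's actual validation set):

```python
import numpy as np

def aggregate_rmse(t_ref, t_pred):
    """Root-mean-square misfit between reference and surrogate-predicted
    traveltimes, aggregated over all models and source-receiver pairs.

    t_ref, t_pred: arrays of shape (n_models, n_traveltimes), in ns.
    """
    residuals = np.asarray(t_ref) - np.asarray(t_pred)
    return float(np.sqrt(np.mean(residuals ** 2)))

# A perfect surrogate gives zero misfit; a constant 1 ns bias gives rmse = 1 ns:
print(aggregate_rmse(np.zeros((2, 3)), np.zeros((2, 3))))  # → 0.0
print(aggregate_rmse(np.zeros((2, 3)), np.ones((2, 3))))   # → 1.0
```

Comparing such an aggregate rmse against the expected observational noise level, as done above, is what flags the VAE–PCE surrogate (2.01 ns) as unsuitable for likelihood evaluations.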
Figure A1. Reference velocity fields (a)–(f) and the corresponding posterior mean images (g)–(l) for MCMC inversion based on VAE–PCE surrogate modelling.

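The alternative evaluation chain advocated in the main text — propose in the VAE latent space, decode to a velocity field, project onto globally defined principal components, and evaluate the PCE on those scores — can be sketched as follows. All operators here (the decoder weights, the PCA basis, the linear "PCE" map) are random stand-ins for the trained networks and expansions used in the paper, chosen only to make the data flow concrete:

```python
import numpy as np

# Illustrative dimensions: latent space, image pixels, retained PCs, traveltimes
n_latent, n_pixels, n_pc, n_data = 20, 3000, 30, 276

rng = np.random.default_rng(0)
W_dec = rng.normal(size=(n_pixels, n_latent))               # stand-in VAE decoder weights
V_pca = np.linalg.qr(rng.normal(size=(n_pixels, n_pc)))[0]  # orthonormal "PCA" basis
m_mean = np.zeros(n_pixels)                                 # PCA mean model
C_pce = rng.normal(size=(n_data, n_pc))                     # stand-in (linear) surrogate map

def decode(z):
    """Stand-in VAE decoder: latent vector -> flattened velocity field.
    In the paper this is a trained deep generative network."""
    return np.tanh(W_dec @ z)

def pca_scores(m):
    """Project a velocity field onto the globally defined principal components."""
    return V_pca.T @ (m - m_mean)

def surrogate_traveltimes(z):
    """MCMC proposals live in the VAE latent space, but the surrogate operates
    on PCA scores of the decoded model rather than on z itself."""
    return C_pce @ pca_scores(decode(z))

z = rng.normal(size=n_latent)       # a latent-space proposal
t = surrogate_traveltimes(z)
print(t.shape)  # → (276,)
```

Inside a Metropolis–Hastings chain, only `surrogate_traveltimes` enters the likelihood, so the full-physics solver is never called during sampling; in practice the stand-ins would be replaced by the trained decoder, a PCA basis fitted on prior realizations, and a PCE built with a tool such as UQLab.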

C The Author(s) 2024. Published by Oxford University Press on behalf of The Royal Astronomical Society. This is an Open Access
article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which
permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
