Introduction

Today, there is a growing awareness of the importance of early brain development on health later in life1,2,3,4,5,6,7,8,9 as brain maturation involves complex and interconnected structural and functional processes that can be altered by various genetic and environmental factors. Magnetic resonance imaging (MRI) may be required during pregnancy to investigate equivocal situations as a support for diagnosis and prognosis, but also for postnatal management planning10. In clinical routine, T2-weighted (T2w) fast spin echo (FSE) sequences are used to scan multiple 2D thick slices that provide information on the whole brain volume with a good signal-to-noise ratio (SNR) while minimizing the effects of random fetal motion during acquisition11. However, stochastic movements of the fetus in the womb cause various artefacts in the images, including drops in signal intensity. Post-processing approaches built on motion estimation and correction can compensate for such artefacts. Especially, super-resolution (SR) reconstruction techniques take advantage of the redundancy between low-resolution (LR) series acquired in orthogonal orientations to reconstruct an isotropic high-resolution (HR) volume of the fetal brain with reduced intensity artefacts and motion sensitivity12,13,14,15,16,17. Navigating through the resulting SR volume provides valuable information on the developing brain anatomy, including consistent biometric measurements18,19,20. Besides, fetal brain tissue segmentation is critical for further investigation of brain development, especially for volumetric evaluation21,22,23,24,25.

Manual segmentation is a cumbersome and time-consuming task. Therefore, supervised deep learning approaches that rely on annotated data have emerged as accurate techniques for automated delineation of the fetal brain22,23,24,26. The development and validation of such advanced image processing and analysis methods require access to large-scale data to account for the subject variability, but the number of good quality, exploitable MR acquisitions available in this sensitive cohort remains relatively scarce.

Numerical simulations can mitigate these limitations by providing a controlled environment with a known ground truth for accurate, robust and reproducible research27,28. MR developments often rely on computer simulations that enable pulse sequence design, accurate prototyping and evaluation of new advanced acquisition schemes as well as validation of reconstruction techniques in a controlled setting27,28,29,30,31. Such platforms are also valuable educational tools for physicists and technologists29,30. In this sense, MR simulations can efficiently complement or even replace the design and use of sophisticated experimental phantoms, as well as experiments on animal models or even on human volunteers30. MR simulators can be designed to address multiple challenges, such as system imperfections, multichannel transmission, correction/suppression of image artefacts, and optimization of specific absorption rate (SAR)31,32.

Motion is a major hurdle in various MRI applications, from cardiovascular MR to functional MRI analysis and fetal imaging, as it is responsible for artefacts in the images and can lead to erroneous data analysis and interpretation27,28,30,32. Whereas periodic movements can be directly related to physiological processes such as breathing or a heartbeat, and may therefore be compensated during post-processing, stochastic fetal motion impedes the repeatability of measurements28 and thus hinders retrospective motion correction. The difficulty of estimating such unpredictable movements results in the lack of any ground truth, yet necessary for the validation of new methods32. Numerical phantoms are an interesting alternative that offers a fully scalable and flexible environment where any image-acquisition, -reconstruction, or -processing technique can be evaluated, optimized, and validated from a collection of synthetic, yet realistic data that simulate multiple controlled conditions. Furthermore, the results obtained by these various strategies can be quantitatively compared to each other through simulated reference data based on full-reference image quality assessment metrics such as the mean squared error (MSE), the peak signal-to-noise ratio (PSNR), or even the more perceptual structural similarity index (SSIM)12,33,34.

Paradoxically, the diversity of MRI simulators, each with its own advantages and limitations, jeopardizes the comparison from one setup to another27. In contrast to a simplified analytical description of the MR signal arising from proton isochromats, advanced developments in the field of MRI require more realistic numerical simulations31. Two main approaches have been investigated: on one hand, (i) the analytical numerical formalism is based on a mathematical description of both the anatomy and the MR experiment as in the Shepp–Logan head phantom35, and on the other hand, (ii) voxel-based phantoms are usually derived from segmented clinical acquisitions relevant to the targeted application27. The correspondence between the image and the corresponding k-space is governed by the continuous Fourier transform in analytical numerical models which enables an accurate representation of k-space, whereas it is approximated by its discrete version in their voxel-based equivalents. Analytical numerical phantoms are powerful tools to study k-space truncation artefacts while voxel-based simulations can be used to also model physiological processes and motion that may alter the acquisitions27,28. However, spatial and temporal resolutions depend on the original images from which voxel-based phantoms are derived, which may limit their use in reproducibility studies.

Hybrid phantoms have been developed to leverage both approaches and overcome their respective limitations, resulting in versatile and realistic models27,28. Depending on the targeted application, advanced phantoms can be built on these models to include more features as in the case of the fetal extended Cardiac-Torso (XCAT) cardiovascular magnetic resonance imaging (Fetal XCMR) phantom that combines two independent XCAT models of both the anatomy and physiology of a mother and her baby with a simulation framework for 2D cardiovascular magnetic resonance (CMR) acquisitions28. Of note, realistic MRI simulations are hampered by the high computational burden associated with the 3D representation of the simulated object and the introduction of motion during acquisition30,31. Although the parallelization of calculations on computer clusters allows to speed up the simulations, such advanced infrastructures are not always available.

Various advanced MRI simulation platforms31,36,37 are based on the numerical resolution of Bloch equations which may be demanding due to the need to compute a numerical solution for each resonant frequency within a single voxel. The extended phase graph (EPG) concept38,39,40 is a surrogate for Bloch equations to describe the magnetization response to various MR pulse sequences, including complex acquisition schemes that involve multiple radiofrequency (RF) pulses and gradients. Modeling the evolution of spin magnetization depending on tissue properties allows to gain insight into the obtained MR signal and to evaluate its behavior. Originally suggested to assess signal intensities in multi-echo experiments with variable flip angles41, the EPG algorithm has aroused growing interest in recent years, especially for precise characterization of echoes42,43,44 and diffusion effects45, evaluation of physics-constrained reconstruction methods46, sequence pulse design38,47,48 and quantitative MRI techniques49,50,51. The EPG formalism relies on the Fourier representation of the evolution of spin magnetization within a voxel after application of various RF pulses and gradients. It assumes that a single set of relaxation parameters can characterize a given tissue40. The Fourier series are used to account for the multiple resonant frequencies that may arise from local magnetic field inhomogeneities within a voxel. Thus, the EPG algorithm is able to provide fast and accurate simulations.

To our knowledge, there is no simulation framework for fetal brain MRI. In this work, we present FaBiAN, an open-source Fetal Brain magnetic resonance Acquisition Numerical phantom that simulates T2w images of the in utero developing brain built on segmented anatomical images from a previously published normative spatiotemporal HR MRI atlas of the fetal brain10. FaBiAN relies on the EPG formalism38,39,40 of the signal formation to simulate FSE acquisitions of the fetal brain, in this case Half-Fourier Acquisition Single-shot Turbo spin Echo (HASTE, Siemens Healthineers) and Single-Shot Fast Spin Echo (SS-FSE, GE Healthcare) sequences, based on a flexible and realistic setup that accounts for intensity non-uniformity fields and stochastic fetal motion. We investigate the capabilities of the developed framework and provide a proof of concept of its practical value in two key application examples. First, we generate LR images with multiple levels of motion to fine-tune a SR reconstruction algorithm16,52. Then, we explore the potential of using multiple fetal brain images simulated at different gestational ages (GA) with several motion amplitudes and a variety of acquisition parameters for data augmentation in fetal brain tissue segmentation.

Methods

Numerical implementation of FSE sequences

Figure 1 provides an overview of the workflow implemented in MATLAB (MathWorks, R2019a) to simulate T2w fetal brain images acquired using clinical FSE sequences. (i) High-resolution anatomical images from a normative spatiotemporal MRI atlas10 are used as a model of normal fetal brain. (ii) Segmented brain tissues are organized into gray matter, white matter and cerebrospinal fluid as shown in Table 1, and (iii) are assigned relaxometry properties from the literature at 1.5 T53,54,55,56,57 or 3 T58,59,60,61,62 accordingly. (iv) The T2 decay over time is computed in every voxel of the HR anatomical images from the sequence parameters using the EPG formalism39,40. (v) The Fourier domain, or k-space, of the simulated images is sampled from the T2 decay matrix to reflect the process of FSE acquisition in the presence of random rigid motion. (vi) The final simulated images are recovered by a 2D inverse Fourier transform.

The clinical implementation of FSE sequences can vary from one MR vendor to another, but also according to the technical characteristics and hardware limitations of the MR scanner. For instance, the gradients are characterized by their slew rate that directly impacts the echo time. Especially in the context of fetal imaging, different SAR limits may be enforced, resulting in various flip angles. Furthermore, a specific clinical indication may require adjusting standard acquisition parameters, for example by increasing the effective echo time to enhance the contrast between the subplate and surrounding structures, and investigate pathologies of the cerebrospinal fluid. Therefore, we have developed this open-source numerical phantom with the idea of keeping the framework as general as possible to enable users a large flexibility in the type of images simulated. As such, multiple acquisition parameters can be set up with respect to the MR contrast (effective echo time, excitation/refocusing pulse flip angles, echo spacing, echo train length), the geometry (number of 2D slices, slice orientation, slice thickness, slice gap, slice position, phase oversampling), the resolution (field-of-view, matrix size), the resort to any acceleration technique (acceleration factor, number of reference lines) or to scanner interpolation, as well as other settings related to the age of the fetus, the RF transmit field inhomogeneities, the amplitude of random fetal motion in the three main directions, and the SNR. Table 2 provides the ranges of values used in this study to simulate T2w images of the fetal brain as they are acquired in clinical routine at two sites: Lausanne University Hospital (CHUV, HASTE sequence) and University Children’s Hospital Zurich (Kispi, SS-FSE sequence). Additionally, Supplementary Table S1 reports the ranges of values that are relevant to an application in fetal brain MRI, thus consistent with common clinical protocols, and that were tested during the development of this first prototype. FaBiAN allows to transpose the main differences between the HASTE and SS-FSE protocols related to timing (effective echo time and echo spacing), geometry (slice thickness, slice gap, and phase oversampling), and spatial resolution (field-of-view and acquisition/reconstruction matrix). Irrespective of the MR vendor, the slice profile is modeled by a 2D Gaussian function with the full width at half maximum equal to the slice thickness in the slice-selection direction16. The sampling trajectory is assumed to be the same in both protocols.

The entire simulation pipeline is described in detail in the following.

Figure 1
figure 1

Workflow for simulating images of the fetal brain acquired by a fast spin echo (FSE) sequence (i) from segmented HR anatomical MR images10, illustrated for a fetus of 30 weeks of GA. (ii) Brain tissues are classified into gray matter, white matter and cerebrospinal fluid. (iii) Anatomical structures are converted to the corresponding MR contrast to obtain reference T1 and T2 maps of the fetal brain at either 1.5 or 3 T. (iv) The EPG algorithm allows to accurately simulate the T2 decay over time in every brain voxel by accounting for the effects of the stimulated echoes, as highlighted by the enlargement of the beginning of the curve. This spatiotemporal information is subsequently used (v) to sample the Fourier domain of the simulated images of the moving fetus. After the addition of noise to match the SNR of real clinical acquisitions, (vi) FSE images of the fetal brain are eventually recovered by 2D inverse Fourier transform (2D \(FT^{-1}\)).

Table 1 Classification of segmented brain tissues10 as gray matter, white matter and cerebrospinal fluid.
Table 2 The flexibility of FaBiAN is illustrated by the number of sequence parameters and settings available to the user.

Fetal brain model and MR properties

Our numerical phantom is based on segmented 0.8-mm-isotropic anatomical images (Fig. 1-(i)) from the normative spatiotemporal MRI atlas of the developing brain built by Gholipour and colleagues from normal fetuses scanned between 19 and 39 weeks of gestation10. Due to the lack for ground truth relaxometry measurements in the fetal brain, all thirty-four segmented tissues are merged into three classes according to medical experts: gray matter, white matter and cerebrospinal fluid (Fig. 1-(ii) and Table 1). Corresponding T1 and T2 relaxation times at 1.5 T53,54,55,56,57 are assigned to these anatomical structures to obtain reference T1 and T2 maps, respectively (Fig. 1-(iii)). The value of these relaxometry properties at 3 T is estimated from the literature58,59,60,61,62, assuming that doubling the magnetic field strength from 1.5 to 3 T increases the T1 relaxation time by approximately 25% in gray matter and 10% in both white matter and cerebrospinal fluid, while the T2 properties remain unchanged.

Intensity non-uniformity (INU) fields

Non-linear slowly-varying INU fields due to transmit field inhomogeneities (B1+) are based on BrainWeb estimations from real scans to simulate T2w images63,64,65,66,67. The available 20% INU version is resized to fit the dimensions of the atlas images and normalized by 1.2 to provide multiplicative fields in the range of 0.8 to 1.2 over the brain area. It is subsampled to a 0.1-mm resolution with linear interpolation in the slice thickness orientation in order to account for B1 bias field variations across the slice profile.

EPG formalism

The evolution of thousands of spin systems (i.e. isochromats) over time is commonly studied using Bloch equations to characterize echoes. However, this method is computationally expensive, because the equations must be solved for each isochromat, and not fully accurate to determine the echo intensity. Therefore, and since FSE sequences involve multiple successive RF pulses, the EPG concept seems particularly relevant to our application. Indeed, it combines the configuration state and partition state approaches to efficiently account for the evolution of a significant ensemble of isochromats, and thus accurately characterize echoes (type, intensity, timing)39,68 (see Supplementary Method S1, Supplementary Fig. S1 for further details on the EPG formalism).

The EPG algorithm39 simulates the T2 decay in every voxel of the anatomical images over each echo train (Fig. 1-(iv)) based on the FSE sequence pulse design. Along with reference T1 and T2 maps of the fetal brain, the following sequence parameters are fed into the EPG estimation algorithm: the effective echo time, the excitation and refocusing pulse flip angles that modulate the realistic INU fields described above, resulting in spatially varying flip angles, and the echo train length, namely the number of repetitions of RF refocusing flip angles. The resulting 4D matrix that combines information about both the anatomy and the magnetic relaxation properties of the fetal brain is hereafter referred to as the T2 decay matrix.

K-space sampling and image formation

The T2 decay matrix is Fourier-transformed and subsequently used for k-space sampling of the simulated images. For a given echo time (TE), at most one line from the associated Fourier domain of the T2 decay matrix is used, with the central line corresponding to the effective TE. If an acceleration technique such as GRAPPA interpolation is required to decrease the scanning time as in the HASTE sequence acquired in clinical routine at CHUV, multiple reference lines are consecutively sampled around the center of k-space. Beyond, one line out of two is actually needed to simulate an acceleration factor of two. As a first approximation, these sampled lines are copied to replace the missing lines. According to partial Fourier imaging techniques, the properties of Hermitian symmetry in the frequency domain are used to fill the entire k-space. While intra-slice motion can be neglected in FSE sequences, inter-slice random 3D translation and rotation of the fetal brain are implemented during k-space sampling (Fig. 1-(v)). If needed, zero-interpolation filling (ZIP) is performed by filling the edges of the simulated k-space with zeros in order to reach the desired reconstruction matrix size, as for the high in-plane resolution SS-FSE images acquired at Kispi. Data in k-space are previously processed using a Fermi low-pass filter with a radius of 0.85 and a width of 1/23 to avoid Gibbs ringing artefacts69,70. Complex Gaussian noise (mean, 0; standard deviation, 0.15 for the HASTE, 0.01 for the SS-FSE implementation respectively) is added to simulate thermal noise generated during the acquisition process and qualitatively match the SNR of clinical data. The simulated images are eventually recovered by 2D inverse Fourier transform (Fig. 1-(vi)).

With the aim of replicating the clinical protocol for fetal brain MRI, FSE acquisitions are simulated in the three orthogonal orientations. Besides, the position of the field-of-view is slightly shifted by \(\pm\, 1.6\) mm in the slice thickness orientation to produce additional partially-overlapping datasets in each orientation.

Fetal motion

The amplitude of typical fetal movements is estimated from clinical data71,72. Three levels are defined accordingly for little, moderate and strong motion of the fetus, with a maximum of 5% corrupted slices over the fetal brain volume. They are characterized by a uniform distribution of respectively \([-1,1]\) mm, \([-3,3]\) mm and \([-4,4]\) mm for independent translation in every direction and \([-2,2]\)° , \([-5,5]\)° and \([-8,8]\)° for 3D rotation (Fig. 1-(v)).

Computational specifications

Since the addition of 3D motion during k-space sampling is expensive in computing memory, the simulations are run on 16 CPU workers in parallel with 20 GB of RAM each. After investigation, the amount of resources allocated to run such simulations can be reduced to 13 CPU workers in parallel with 20 GB of RAM each.

Simulating data from clinical MR acquisitions

Clinical datasets

The data used in this study were acquired in earlier studies in accordance with the relevant guidelines and regulations, under the supervision of Ethics Boards composed of representatives at different levels (hospitals, cantons, and federal state). Mothers of all fetuses included in the current work provided written informed consent for the re-use of their data for research purposes. Clinical cases are used as typical acquisition examples to generate realistic synthetic images of the fetal brain throughout development and to visually compare the quality of the simulated images.

Thirteen healthy subjects in the GA range of 21 to 33 weeks (\(27.0 \pm\, 3.85\) weeks) were scanned at CHUV as part of a larger institutional research protocol approved by the ethical committee of the Canton of Vaud, Switzerland (CER-VD, decision number: 2021-00124). In particular, these clinical cases are used to showcase the implemented pipeline and the realistic appearance of the corresponding synthetic HASTE images, as well as to explore the potential of FaBiAN in optimizing SR fetal brain MRI. For this purpose, clinical acquisitions were reconstructed offline using the docker version of the MIAL Super-Resolution Toolkit16,52.

Fifteen subjects (thirteen neurotypical subjects and two subjects with light ventriculomegaly) from the Fetal Tissue Annotation Dataset (FeTA)25 in the GA range of 21 to 34.6 weeks (\(27.5 \pm\, 4.46\) weeks) were scanned at Kispi. Their inclusion in research studies was approved by the ethical committee of the Canton of Zurich, Switzerland (KEK, decision number: 2016-01019). The original LR series of each subject were combined with each other a posteriori and offline to form a SR volume of the fetal brain using a simplified version of the Image Registration Toolkit (SIMPLE IRTK)14 under Licence from Ixico Ltd. The resulting SR reconstructions were manually segmented according to the FeTA annotation guidelines25. In particular, the clinical cases from Kispi allow to extend FaBiAN to generate multiple SS-FSE images of the fetal brain with various settings. We investigate if these simulated images are realistic enough to replace part of the original data in the training phase of a deep learning network for fetal brain tissue segmentation, and to complement a clinical dataset to improve the performance of the segmentation algorithm.

Clinical MRI protocol

Typical fetal brain HASTE images are acquired on patients at 1.5 T (MAGNETOM Aera, Siemens Healthcare, Erlangen, Germany) with an 18-channel body coil and a 32-channel spine coil at CHUV. At least three T2w series (3–11 series, \(6.69 \pm\, 2.14\) series) of 2D thick slices are acquired in three orthogonal orientations (axial, coronal and sagittal) with respect to the fetal brain using an ultra-fast multi-slice HASTE sequence (TR/TE, 1200 ms/90 ms; flip angle, 90°; echo train length, 224; echo spacing, 4.08 ms; field-of-view, 360 \(\times\) 360 mm\(^2\); voxel size, 1.13 \(\times\) 1.13 \(\times\) 3.00 mm\(^3\); inter-slice gap, 10%)20,73. Twenty-two to thirty slices are needed to cover the whole fetal brain depending on the GA (between 21 and 33 weeks) and size of the fetus, which corresponds to an acquisition time of between 26 to 36 seconds.

Fetal brain SS-FSE images are acquired on patients on either a 1.5-T or 3-T clinical GE whole-body scanner (Signa Discovery MR450 or MR750), either using an 8-channel cardiac coil or body coil, at Kispi. At least three series (4–6 series, \(5.53 \pm\, 0.64\) series) of 2D thick slices are acquired in three orthogonal orientations (axial, coronal and sagittal) with respect to the fetal brain using a T2w SS-FSE sequence (TR/TE, 3000–3200 ms/116.032–124.08 ms; flip angle, 90°; echo train length, 224; echo spacing, 3–3.2 ms; field-of-view, from \(240 \times 240\, \text {mm}^2\) to \(300 \times 300 \,\text {mm}^2\); slice thickness, 3.00–4.00 mm; acquisition matrix, 1.5 T: \(256 \times 224\, \text{voxels}^2\), 3 T: \(320 \times 224\, \text{voxels}^2\); reconstruction matrix, \(512 \times 512 \,\text{voxels}^2\); isotropic in-plane resolution, from \(0.47 \times 0.47 \,\text{mm}^2\) to \(0.59 \times 0.59\, \text{mm}^2\))25. Twenty-four to forty-three slices are needed to cover the whole fetal brain depending on the GA (between 21 and 34.6 weeks) and size of the fetus, which corresponds to an acquisition time of between one to two minutes.

The position of the field-of-view is slightly shifted in the slice thickness orientation to acquire additional data with some redundancy. In clinical practice, a total of six partially-overlapping LR series are commonly acquired in the three orthogonal orientations for subsequent SR reconstruction of the fetal brain.

Simulated datasets

The general framework presented in this paper makes it possible to simulate the clinical acquisition schemes described above for different MR vendors, at various magnetic field strengths, and with realistic SNR and amplitude of fetal movements.

The clinical cases from CHUV are used as representative examples of fetal brain HASTE acquisitions: the corresponding sequence parameters are replicated to simulate HASTE images of the fetal brain at various GA in the GA range of 21 to 33 weeks (\(27.8 \pm\, 3.74\) weeks). The amplitude of fetal movements in clinical acquisitions is assessed by an engineer expert in MR image analysis to ensure a similar level of motion in the simulated images. Besides, a 3D HR 1.1-mm-isotropic HASTE image of the fetal brain is simulated without noise or motion to serve as a reference for the quantitative evaluation of SR reconstructions from simulated LR 1.1-mm-in-plane HASTE images using the docker version of the MIAL Super-Resolution Toolkit16,52.

SS-FSE images of the fetal brain are simulated for fifteen subjects in the GA range of 21 to 35 weeks (\(28.0 \pm\, 4.47\) weeks) at either 1.5 T or 3 T. We reproduce the same acquisition parameters and geometry as in the clinical dataset from Kispi. For every subject, three partially-overlapping series are simulated in each of the three orthogonal orientations, two with little motion and one with moderate motion.

Table 3 reports the main characteristics of the original clinical acquisitions and the simulated data according to the MR vendor and the main magnetic field strength.

Table 3 Number of subjects, either scanned or simulated, considered throughout this study and distribution of gestational age (GA) according to the MR vendor and the main magnetic field strength.

Motion index

An engineer in biomedical imaging visually inspected all the clinical data retrospectively. 46% of the MR images from CHUV were rated as corrupted by little motion, 38% by moderate motion, and the remaining 16% by strong motion. In the case of MR images from Kispi, 53% were rated as corrupted by little motion, 35% by moderate motion, and the remaining 12% by strong motion. A motion index is defined to support the assessment of the level of fetal movements in clinical acquisitions and simulations based on binary masks that are drawn on each LR series to cover the whole fetal brain volume using ITK-SNAP74. The motion index is estimated from tracking the displacement of the centroids of the 2D brain masks. It is computed as the sum of the variances of the 2D brain mask centroid coordinates of adjacent slices over the central third of the 3D brain mask, normalized by the number of slices considered (see Supplementary Method S2). An index less than 0.5 mm stands for little motion, in the range of [0.5,1]mm for moderate motion, and larger than 1 mm for strong motion. This motion index correlates well with the assessment of motion in multiple MR images from CHUV by an engineer expert in MR image analysis. It is used in the following to estimate the amplitude of fetal movements in clinical acquisitions and ensure a similar level of motion in the simulated images.

Qualitative assessment

A neuroradiologist and a pediatric (neuro)radiologist provided independent qualitative assessment of the HASTE images of the fetal brain simulated in the GA range of 21 to 33 weeks, in the three orthogonal orientations and with various levels of motion. Special attention was paid to the MR contrast between brain tissues, the SNR, the consistency of the brain anatomy, the delineation and sharpness of the structures of diagnostic interest (i.e., corpus callosum, vermis, brain stem, lateral ventricles, cortex, white matter, pituitary gland, etc.) that are analyzed in clinical routine, as well as characteristic motion artefacts. In a second step, six months later, the same neuroradiologist (Rater 1) and pediatric (neuro)radiologist (Rater 2) went through most of the simulated images, including SS-FSE images, and evaluated their realistic appearance and their similarity to MR images acquired in clinical routine based on a quality index between 0 and 2: 0 corresponds to data that do not look realistic, 1 to quite realistic data, meaning they have similarities with clinical acquisitions, and 2 to highly realistic data.

To facilitate visualization and qualitative comparison of images throughout this manuscript, simulated brain images were co-registered with clinical acquisitions of fetuses at the same GA and with equivalent level of motion. For this purpose, at least five landmarks were manually defined over the fetal brain volume using the landmark registration of 3D Slicer75.

Application 1: Super-resolution reconstruction

Implementation of SR reconstruction

Orthogonal T2w LR HASTE series contain redundant information that enables the subsequent reconstruction of a 3D HR volume. Clinical acquisitions, respectively simulated images, are combined into a motion-free 3D image \({\hat{\mathbf{X}}}\) using the Total Variation (TV) SR reconstruction algorithm16,52 which solves:

$$\begin{aligned} \begin{aligned} {\hat{\mathbf {X}}} = \arg \min _{{\mathbf {X}}} \ \frac{\lambda }{2} \sum _{kl} \Vert \underbrace{{\mathbf {D}}_{kl}{\mathbf {B}}_{kl}{\mathbf {M}}_{kl}}_{{\mathbf {H}}_{kl}} {\mathbf {X}} - {\mathbf {X}}_{kl}^{LR}\Vert ^2 + \Vert {\mathbf {X}}\Vert _{TV}, \end{aligned} \end{aligned}$$
(1)

where the first term relates to data fidelity with k being the k-th LR series \({\mathbf {X}}^{LR}\) and l the l-th slice, \(\Vert {\mathbf {X}}\Vert _{TV}\) is a TV prior introduced to regularize the solution, and \(\lambda\) balances the trade-off between data fidelity and regularization terms (default setting, \(\lambda\) = 0.75). \({\mathbf {D}}\) and \({\mathbf {B}}\) are linear downsampling and Gaussian blurring operators given by the acquisition characteristics. \({\mathbf {M}}\) encodes the rigid motion of slices.

Regularization setting

LR HASTE images of the fetal brain are simulated to mimic clinical HASTE acquisitions of three subjects of 26, 30 and 33 weeks of GA respectively, which are characterized by a high SNR, the presence of a realistic low to moderate bias field, and variable amplitudes of fetal movements, from little to strong. Particular attention is paid to ensuring that the motion level is respected. For each subject, a SR volume of the fetal brain is reconstructed from the various orthogonal acquisitions, either real or simulated (six, respectively seven and eight series were acquired and are further simulated in the subject of 26, respectively 30 and 33 weeks of GA), with different values of \(\lambda\) (0.1, 0.3, 0.5, 0.75, 1.5, 3) to study the potential of FaBiAN in optimizing the quality of the SR reconstruction in a clinical setup. Indeed, the weight \(\lambda\) is a sensitive hyper-parameter in SR methods that are based on solving an inverse problem. A quantitative analysis is conducted on the resulting SR reconstructions to determine the value of \(\lambda\) that provides the sharpest reconstruction of the fetal brain with high SNR, namely the smallest normalized root mean squared error (NRMSE) with respect to the corresponding synthetic 3D HR ground truth.

Number of LR series: an SNR and motion case study

Static LR HASTE images of the fetal brain (GA of 30 weeks) are recovered after the addition of various levels of complex Gaussian noise (mean, 0; standard deviation, 0.07, 0.15 or 0.3) to k-space. A standard deviation (SD) of 0.15 results in synthetic images that closely resemble clinical acquisitions. Six independent realizations of the 2D series are generated for each SNR. A SR volume of the fetal brain is reconstructed from three, six and nine orthogonal LR HASTE series using the different realizations. In total, fifty-four SR volumes are reconstructed.

Additional LR HASTE acquisitions of the fetal brain (GA: 30 weeks, SD of noise: 0.15) are simulated with inter-slice motion. The impact of little and moderate movements of the fetus on SR reconstructions from various numbers of LR series is studied using a reference series without motion and with little amplitude of fetal motion respectively.

The different configurations are compared to each other by computing the NRMSE, the local SSIM and its mean (MSSIM) over the image with respect to the 3D isotropic ground truth. The latter are computed using Matlab ssim function with the SD of the isotropic Gaussian function that determines the weights of the pixels in a neighborhood to estimate local statistics set to its default value (1.5), and the dynamic range of the normalized input images to 255.

Application 2: Data augmentation for automated fetal brain tissue segmentation

Supplementary Figure S2 displays the histograms of the distribution of (a) gestational age (from 21 to 35 weeks, \(27.8 \pm\, 4.40\) weeks) across the original clinical cases and simulated subjects involved in this experiment, as well as (b) in-plane isotropic resolution in the corresponding LR images according to the main magnetic field strength.

LR SS-FSE images of the fetal brain are simulated based on the MR sequence parameters and acquisition settings extracted from the clinical cases scanned at Kispi to mimic typical MR acquisitions, thus minimizing confounding factors. The synthetic images are interpolated to \(0.8594 \,\text {mm} \times 0.8594\, \text {mm}\) in the in-plane direction to match the resolution of the clinical SR reconstructions. Label maps are automatically generated from the simulation framework using a nearest-neighbour interpolation. Two different labels are assigned to the ventricular system (lateral, third and fourth ventricles) and to the extra-axial cerebrospinal fluid spaces in the SR reconstructions from Kispi subjects25, whereas the segmented HR anatomical images from which our simulations are derived distinguish the lateral ventricles from the cerebrospinal fluid10. For consistency between brain annotations, we merge the labels that correspond to structures from the ventricular system and the cerebrospinal fluid spaces in both models.

Network architecture

We designed a convolutional neural network based on the well established U-Net architecture76 for biomedical semantic image segmentation, as it recently proved its ability to perform well for 2D fetal brain MRI tissue segmentation22. The baseline 2D U-Net is trained using a hybrid loss function defined as the sum of a categorical cross-entropy and a Dice loss. The latter is intended to mitigate any imbalance in the samples of the different classes22,77.

The implementation is performed in the framework of TensorFlow 2.578 and an Nvidia GeForce RTX 2080 GPU is deployed for training.

Experimental design

As indicated in Table 4, we run two independent experiments:

  • Experiment 1: It aims at investigating whether the simulated images are realistic enough to substitute for clinical data. We compare three configurations that combine different proportions of subjects with SR volumes from clinical MR acquisitions and subjects with simulated LR images: a baseline (A) that consists of real data only (fifteen subjects), a configuration (B) that gathers ten original subjects and five simulated ones, a configuration (C) with eight original subjects and seven simulated ones. Real data remain predominant in all configurations studied.

  • Experiment 2: If the MR images generated using FaBiAN are close enough to real data, we hypothesize that they could complement a small clinical dataset. In this setting, synthetic images are used as additional data to augment the training dataset. The configuration (D) complements the fifteen original subjects from the baseline (A) with fifteen simulated ones. It is compared to a conventional data augmentation strategy (E) that includes only the fifteen original subjects from the baseline (A).

Since we could not automatically propagate the annotations on the simulated images to the corresponding SR reconstruction, and conversely the labels in the SR reconstruction from clinical acquisitions to the original LR 2D series, three LR series are needed to replace one SR volume. The segmentation is computed on skull-stripped images to process only the voxels within the intracranial volume.

Table 4 Two experiments are presented along with the different configurations studied to compare the performance of an algorithm for automated fetal brain tissue segmentation.

Training strategy

Networks are fed with 64-by-64 overlapping patches in the axial orientation. To avoid any bias, we ensure an equivalent number of 2D patches between an SR-reconstructed clinical case and three LR axial series simulated for a given subject. Intensities of all image patches are standardized.

  • Experiment 1: Each patch is repeated once using minimal data augmentation, i.e., random flip and rotation of the patches (n times, by \(n\times 90\)\(^\circ\), \(n\in \llbracket 0,3\rrbracket\)).

  • Experiment 2: To maintain a similar number of input samples, each patch from configuration (D) is repeated once, while patches from configuration (E) are repeated three times. (D) is only augmented through random flip and rotation, while more extensive conventional augmentation strategies used in fetal brain MRI segmentation25, which also include random Gaussian noise, random gamma, and random bias field, are applied in configuration (E)79.

Five-fold cross-validation approaches (training sets: 12 subjects, validation sets: 3 subjects) are used to determine the epochs for the learning rate decay in each configuration.

Analysis

The performance of the fetal brain tissue segmentation networks is evaluated based on the Dice similarity coefficient (DSC)80 that quantifies the overlap between the predicted segmentation and the manually-annotated ground truth. The performance metrics are assessed on the validation sets. We report the average over all folds. Statistical significance is given between both data augmentation strategies (configurations (D) and (E)) in Experiment 2. P-values of Wilcoxon rank sum test for individual fetal brain tissue segmentation are adjusted for multiple comparisons using Bonferroni correction. P \(< 0.05\) is considered statistically significant.

Results

Computational performance

As highlighted in Fig. 1 for a fetus of 30 weeks of GA whose brain is covered by twenty-five slices, the computation time to convert segmented HR anatomical images of the fetal brain to MR contrast is in the order of one second. EPG simulations are run in every voxel of the 3D HR anatomical images in less than four minutes. For one axial series with either little, moderate or strong motion, k-space sampling takes less than seven minutes for HASTE images simulated with an acceleration factor of two, respectively less than eight minutes for SS-FSE images simulated without using any acceleration technique.

Qualitative assessment

Figure 2 illustrates the close resemblance between simulated HASTE images of the fetal brain and clinical MR acquisitions in terms of MR contrast between tissues, SNR, brain anatomy and relative proportions across development for representative subjects in the GA range of 23 to 32 weeks, as well as typical out-of-plane motion patterns related to the interleaved slice acquisition scheme. A neuroradiologist and a pediatric (neuro)radiologist report a good contrast between gray and white matter, which is important to investigate the cortex continuity and identify the deep gray nuclei as well as any neuronal migration defect. The radiologists also notice good SNR in the different series and report proper visualization of the main anatomical structures: the four ventricles, the corpus callosum, the vermis, the cerebellum, even sometimes the fornix. Besides, the experts are able to monitor the evolution of normal gyration throughout gestation. However, the radiologists point out that small structures such as the pituitary gland, the chiasma, the recesses of the third ventricle, and the vermis folds that look part of the cerebellum, are more difficult to observe. The cortical ribbon is clearly visible but quite pixelated, which is likely to complicate the diagnosis of polymicrogyria. White matter appears too homogeneous, which makes its multilayer aspect barely distinguishable, with an MR signal that is constant across GA, thus preventing physicians from exploring the myelination process throughout brain maturation.

Further investigation is illustrated in Table 5 based on a quality index, where Rater 1 refers to the neuroradiologist and Rater 2 to the pediatric (neuro)radiologist. Most images are evaluated as highly realistic, or at least quite realistic: 97%, respectively 86% of the HASTE images are evaluated with an index greater than or equal to 1 by Rater 1, respectively Rater 2, and 92%, respectively 78% of the SS-FSE images are evaluated with an index greater than or equal to 1 by Rater 1, respectively Rater 2. The better evaluation of the simulated HASTE images may be related to the fact that the raters are used to analyzing HASTE acquisitions in their daily practice rather than SS-FSE scans. Rater 1 emphasizes that synthetic images look even more realistic in older fetuses. Rater 1 also highlights that FaBiAN reproduces incredibly well fetal movements as they occur in clinical practice. It is worth noting that this index is intended to better reflect the realism of the simulated images rather than the actual image quality. However, especially for those corrupted by high levels of fetal motion, the low quality of the resulting images may impact the overall appreciation of the raters. Indeed, the HASTE images that have been evaluated with an index of 0 are, in most cases (\(> 83\%\)), corrupted by a moderate to strong level of motion. More than half of the SS-FSE images that have been evaluated as not realistic are simulated in the sagittal orientation. Rater 2 points out that sagittal views of fetuses of GA between 32 and 35 weeks show cerebrospinal fluid above the posterior part of the corpus callosum, which is not as pronounced in clinical images. In addition, Rater 2 explains that images with significant duplication of a structure are also rated at 0 because such artefacts do not appear as prominent in clinical practice, but either the image is not in a strict orientation or part of it is darker due to signal drops.

Figure 2
figure 2

Visual inspection and comparison between clinical MR acquisitions and representative simulated HASTE images of the fetal brain in the three orthogonal orientations at four different GA (23, 26, 30 and 32 weeks). The amplitude of movement of the fetus is indicated from the motion index computation. Red arrows point out typical out-of-plane motion patterns.

Table 5 Independent evaluation of the realism of the images generated using FaBiAN based on a quality index.

Application 1: Super-resolution reconstruction

Regularization setting

Thanks to its controlled environment, FaBiAN makes it possible to adjust the parameter \(\lambda\) for optimal SR reconstruction with respect to a simulated 3D isotropic HR ground truth of the fetal brain. Figure 3 explores the quality of SR fetal brain MRI from LR HASTE images corrupted by motion depending on the weight of TV regularization in two subjects of 26 and 30 weeks of GA respectively. Based on the simulations, a high level of regularization (\(\lambda = 0.1\)) provides a blurry SR reconstruction with poor contrast between the various structures of the fetal brain, especially in the deep gray nuclei and the cortical plate. In addition, the cerebrospinal fluid appears brighter than in the reference image. A low level of regularization (\(\lambda = 3\)) leads to a better tissue contrast but increases the overall amount of noise in the resulting SR reconstruction. A fine-tuned regularization (\(\lambda = 0.75\)) provides a sharp reconstruction of the fetal brain with a high SNR and a tissue contrast close to the one displayed in the reference image. In the SR images reconstructed from clinical LR HASTE series altered by a little-to-moderate level of motion, as in the simulations, the structure of the corpus callosum and the delineation of the cortex are especially well defined for appropriate TV regularization (\(\lambda = 0.75\)), leading to high-SNR HR images of the fetal brain. Although the NRMSE between SR reconstructions from simulated HASTE images and the corresponding ground truth are close to each other for a given GA across the various weights studied, Fig. 4 shows that the error is systematically minimal for \(\lambda = 0.75\), which further supports this parameter setting for optimal SR reconstruction of the fetal brain from this type of MR images.

Figure 3
figure 3

Appreciation of the quality of SR reconstruction depending on the weight \(\lambda\) that controls the strength of the TV regularization. The potential of our framework FaBiAN for optimizing the reconstruction quality through parameter fine-tuning in the presence of motion is illustrated at two GA: 26 and 30 weeks. Two representative clinical cases are provided for comparison. The results for three values of \(\lambda\) are presented. For \(\lambda = 0.1\), the SR reconstruction looks blurry with poor tissue contrast. Using \(\lambda = 3\) improves the contrast but the images look noisy. For \(\lambda = 0.75\), the SR reconstruction is sharp with a contrast between different brain tissues similar to that observed in the 3D isotropic ground truth. Clinical cases from which the simulated HASTE images are derived highlight the accuracy of a SR reconstruction for this intermediate value of \(\lambda\), especially with regards to the definition of the corpus callosum and the delineation of the cortex.

Figure 4
figure 4

Normalized root mean squared error (NRMSE) between SR reconstructions from simulated data at a GA of 26, 30 and 33 weeks respectively and the corresponding 3D HR ground truth depending on the weight \(\lambda\) of the TV regularization. Six values of \(\lambda\) are tested: 0.1, 0.3, 0.5, 0.75, 1.5 and 3. The NRMSE is minimal for \(\lambda\) = 0.75.

Number of LR series: an SNR and motion case study

Figure 5 shows the NRMSE and the MSSIM between SR reconstructions from different numbers of orthogonal LR HASTE series simulated with various SNR or variable amplitude of movements and a 3D HR reference81.

Figure 5
figure 5

(a) Normalized root mean squared error (NRMSE) and (b) mean structural similarity index (MSSIM) between SR reconstructions from different numbers of orthogonal LR HASTE series simulated at a GA of 30 weeks and the corresponding static 3D HR ground truth. The left panel shows results for motion-free data with various noise levels, a SD of 0.15 leading to a similar appearance as in clinical acquisitions. The right panel illustrates how the algorithm performs depending on the amplitude of fetal movements in the input series.

In the case of static data (Fig. 5-left panel), the NRMSE decreases when increasing the number of series used in the SR reconstruction. According to both the NRMSE and the MSSIM, the quality of the SR reconstructions resulting from simulated images with an SNR close to that observed in clinical acquisitions and from synthetic images with a distribution of complex Gaussian noise of twice less SD is similar. Noisier images lead to a slight decrease in the NRMSE, but also in the MSSIM which in turn increases with the number of series.

The stronger the movements in the LR series, the higher the NRMSE of the resulting SR reconstruction. The addition of motion-corrupted LR series to reconstruct a SR volume of the fetal brain does not increase the MSSIM. Data with slight motion are well handled by the SR algorithm as the MSSIM is equivalent for SR reconstructions from static images and series with little motion. In the case of moderate motion, the MSSIM is lower than in the case of little motion.

Figure 6 highlights the benefit of increasing the number of orthogonal LR series on the rendering of the SR reconstruction. The higher the number of LR series combined in the SR reconstruction, even altered by motion, the smoother the frontal cortex and the sharper the putamen area in the resulting SR volume81.

Figure 6
figure 6

Appreciation of sharpness and tissue contrast enhancement in SR reconstructions from higher numbers of simulated orthogonal LR HASTE images corrupted by little motion at a GA of 30 weeks in comparison with the corresponding static 3D HR ground truth. The frontal cortex looks smoother and the putamen area sharper in the SR reconstruction from nine series compared to the SR reconstruction from three series. The mapping of local SSIM values and the computation of the MSSIM over the corresponding region-of-interest further support these observations.

Application 2: Data augmentation for automated fetal brain tissue segmentation

We aim at proving the realistic appearance of the simulated images (representative SS-FSE images of two subjects scanned/simulated at 1.5 T and 3 T respectively are provided as Supplementary Fig. S3) and demonstrating the practical value of the developed environment to complement clinical datasets for data augmentation strategies in deep learning, here with the example of fetal brain tissue segmentation.

Table 6 shows the mean DSC ± SD computed for every segmented brain tissue in each configuration. Overall, the performance of the segmentation algorithm is maintained when replacing original subjects by synthetic images obtained from an SS-FSE sequence simulated with the same acquisition parameters as in the clinical protocol (Table 6, Experiment 1). This trend is also observed for each individual structure studied. In the configuration (C), the overall DSC for brain tissue segmentation is slightly increased compared to the baseline.

Table 6 DSC (mean ± SD) in the different configurations studied for all segmented brain tissues: cerebrospinal fluid (CSF) and ventricles, cortical gray matter (GM), white matter (WM), cerebellum, deep gray matter and brain stem, and on average.

Since the simulated images look realistic enough to substitute for original clinical acquisitions, we further investigate if they can be used for data augmentation (Table 6, Experiment 2). Increasing the training data (15 real cases) by 15 supplementary simulated subjects (configuration D) results in a significantly improved mean DSC of 0.90 ± 0.05 over the six segmented brain tissues compared to 0.86 ± 0.06 in configuration (E), where the 15 original cases are extensively augmented by conventional intensity-based operations such as additional Gaussian noise, bias field and gamma intensity changes79. In more detail, the DSC is higher for all segmented brain structures when complementing clinical acquisitions with close simulated data, with statistical significance (\(p < 0.05\)) for the cerebrospinal fluid and ventricles, the cortical gray matter, the cerebellum, the deep gray matter and the brain stem.

Figure 7 illustrates on an axial view the accuracy of fetal brain tissue segmentation in a subject of 30.6 weeks of GA. The results obtained in the configuration (D) where real clinical data are complemented with simulated subjects look close to the manually-annotated ground truth. In particular, the segmentation seems to be more accurate in the cortex with an enhanced sensitivity to folding compared to the segmentation obtained in the baseline (A) where the network is solely trained on clinical data.

Figure 7
figure 7

Illustration of the accuracy of fetal brain tissue segmentation in a subject of 30.6 weeks of GA on (a) an axial slice from the SR reconstruction. Comparison of (b) the reference manual annotations, (c) the segmentation results obtained when performing extensive standard data augmentation on the clinical SR reconstructions (configuration (E), C15/S0), (d) the segmentation results obtained by the configuration (D) that complements this original dataset with fifteen additional simulated subjects (C15/S15), overlaid on the SR image. The segmentation of the cortex especially looks more accurate in (d), with an increased sensitivity to folding as highlighted by the white arrows.

Discussion and conclusion

In this work, we present FaBiAN, a novel Fetal Brain magnetic resonance Acquisition Numerical phantom, and illustrate some of its potential uses. Our tool relies on EPG simulations to account for stimulated echoes in the computation of the T2 decay in every voxel of the HR anatomical images from which the simulated images are derived. The developed framework remains general and highly flexible in the choice of the sequence parameters and anatomical settings available to the user. It simulates as closely as possible the physical principles involved in FSE sequences on several MR systems, namely from various MR vendors and at different magnetic field strengths, resulting in highly realistic T2w images of the developing brain throughout gestation.

The limitations in the resemblance of the simulated FSE images compared to typical clinical MR acquisitions may be explained by the origin of the simulated images and the lack of T1 and T2 ground truth measurements, both in the multiple fetal brain tissues and throughout maturation. As a model for our simulations, we use a normative spatiotemporal MRI atlas of the fetal brain10 where representative images at each GA correspond to an average of fetal brain scans across several subjects, thus resulting in smoothing of subtle inter-individual heterogeneities, especially in the multilayer aspect of the white matter. T1 and T2 mapping of the fetal brain is currently limited by long scanning times, unpredictable fetal motion, and acquisition of thick slices to ensure a good SNR over the whole fetal brain volume. For ethical reasons, there is also a lack of normative T1 and T2 values across GA. Therefore, and as a first approximation, we consider average T1 and T2 values of the various fetal brain structures labeled as gray matter, white matter or cerebrospinal fluid throughout development. As a result, our simulated images may fail to capture the fine details of the fetal brain anatomy as well as maturation processes that imply changes in T1 and T2 relaxation times during gestation. In this sense, experts report that they would feel confident in performing standard biometric measurements on the simulated images and in evaluating the volume of white matter, but not its fine structure. It is worth noticing that in-plane motion artefacts like signal drops are not accounted for at this stage, as slices severely corrupted by such artefacts will be removed from the analysis. Furthermore, we restricted our simulations to images of the fetal brain without the remaining fetal anatomy or the surrounding maternal uterine cavity, as most image analysis and post-processing techniques intended for fetal brain MRI, including SR reconstruction, first perform an automated fetal brain extraction/segmentation. Unlike other models that represent fetal subjects by scaled pediatric data and oversimplify the complexity of the fetal brain anatomy by reducing it to white matter alone or to a single brain tissue label82,84,84, we wanted our numerical simulations to be based on a comprehensive model of the fetal brain consistent with the underlying clinical application. In this sense, the normative spatiotemporal MRI atlas of the fetal brain10 from which the simulated FSE images are built captures the details of the fetal brain anatomy throughout maturation. Future work will improve the accuracy of the fetal brain model, both in terms of structure differentiation and T1 and T2 changes during maturation. This implies modeling of T1 and T2 variations, both locally and across time, especially within white matter. We will generate locally varying reference relaxometric properties assigned to every brain tissue to capture changes within finer structures, and modulate these values according to the gestational age of the fetus to enhance the MR contrast between the various brain tissues throughout gestation and simulate more realistic images of the developing fetal brain. Thanks to the flexibility of FaBiAN, the surrounding fetal and maternal anatomy, as well as any other structures, could easily be included as long as we have access to segmented HR images, for instance by resorting to more complex anatomical models such as the XCAT phantoms85,86,87. Several models can also be combined.

We have designed this open-source simulator to aid in the development and validation of advanced image processing techniques dedicated to improving the analysis of fetal brain MR images and support accurate diagnosis. Despite SR reconstruction has already demonstrated its potential for accurate biometric measurements in the fetal brain18,19,20, some parameters still need to be adjusted to the nature of the input LR images to provide optimal evaluation and support computer-assisted diagnosis. In fetal MRI, the level of regularization is commonly set empirically based on visual perception12,13,14,17. Intuitively, the level of regularization depends on the amount of data available to solve the ill-posed inverse problem. Thanks to its controlled environment, the presented framework makes it possible to explore the optimal settings for SR fetal brain MRI according to the quality of the input motion-corrupted LR series with respect to a synthetic 3D HR ground truth. It is worth noticing that in-plane motion artefacts like signal drops are not accounted for in the simulation workflow at this stage, as heavily corrupted slices are commonly removed from the reconstruction. Besides, FaBiAN also enables quantitative assessment of the robustness of any SR reconstruction algorithm depending on various parameters that can be intrinsic to the system like noise, or related to the clinical application such as the amplitude of fetal motion in the womb and the number of series used for SR reconstruction81. For instance, a decrease in the MSSIM when adding motion-corrupted series to reconstruct a SR volume of the fetal brain compared to the SR reconstruction from static series can be caused by an inappropriate slice-to-volume registration (SVR). Therefore, our numerical phantom provides a valuable framework for reproducibility studies and validation of image processing methods. Additional examples among its wide variety of applications include the simulation of a static reference volume at a given GA on which to align the clinical orthogonal LR series acquired in a subject at the same GA in order to perform SVR and subsequent SR reconstruction of the fetal brain, especially in the presence of heavily motion-corrupted acquisitions. Besides, the performance of the SVR can be quantitatively assessed by comparing the motion transform estimated by the algorithm to the controlled 3D rigid movements actually simulated in the images. Synthetic HR images can also be used as a reference for more general motion compensation techniques.

As raised by Wissmann et al., the lack of comparability between simulation setups hinders the evaluation of image reconstruction methods in relation to each other27. This first numerical phantom for MR imaging of the in utero developing brain has already proven its high flexibility in generating multiple images with various acquisition parameters and settings. To make it even more general, we will extend FaBiAN to other MR vendors and sequences. From image acquisition to post-processing developments, the fetal brain MRI community should be able to take advantage of such a unified environment that is intended to simulate MR images of the developing fetal brain as they are acquired in clinical routine all around the world.

Furthermore, the developed framework generates T2w images of the fetal brain realistic enough to complement real clinical acquisitions for data augmentation strategies, as shown here with a proof of concept for fetal brain tissue segmentation. It especially makes it possible to exploit the whole range of GA in the simulated data when clinical cases are scarce and not necessarily uniformly distributed across development. Thus, we can take advantage of larger and more diverse datasets at no cost. In Experiment 1, we limit augmentation of the clinical and simulated data to random flip and rotation only in order to minimize its effect in evaluating the combination of both types of images. Thus, it becomes possible to explore how replacing a real image with a synthetic one impacts the variability of the model, and subsequently the segmentation accuracy. In the scenario of Experiment 2, we rather compare standard data augmentation practices in modern convolutional neural network segmentation pipelines (additional Gaussian noise, bias field, and gamma intensity corrections)25,79 to the data augmentation possibilities offered by FaBiAN. Results show that the greater anatomical variability brought by the addition of MR images simulated from another source (configuration (D)) certainly benefits the segmentation accuracy more than intensity-based operations (configuration (E)). Overall, the SR images from the FeTA dataset used in this study present a homogeneous appearance, with high quality, denoised and bias-free reconstructions. Thus, augmentation of the original clinical dataset through noise addition and intensity changes does not help the network to better generalize in this cohort. Conversely, the segmentation network is more likely to take advantage of the more heterogeneous and diverse data generated by FaBiAN. In this proof-of-concept study, it is worth highlighting that the bias field and SNR are the same throughout all simulations. However, as demonstrated in this manuscript, FaBiAN is an excellent playground to perform additional data augmentation itself by generating multiple MR images for the same subject with varying parameters, whether related to intensity changes, spatial resolution, or amplitude of fetal movements. This asset would further complement conventional data augmentation strategies to enhance data heterogeneity and mitigate the scarcity of fetal brain MR images.

Although the resort to an atlas restricts the inter-subject variability at a given GA, the great flexibility of FaBiAN also lies in the possibility of simulating images from various sources, either atlases or clinical segmented HR anatomical images of the fetal brain like SR reconstructions. Thus, it becomes possible to simulate several subjects at a given GA from various fetal brain models to increase the inter-subject anatomical heterogeneity in the synthetic images. Beyond normal brain development, we aim at exploring common developmental pathologies, like ventriculomegaly and spina bifida, by simulating new datasets of synthetic MR images from annotated SR reconstructions of the fetal brain in pathological subjects.

In line with the demonstration of the added value of FaBiAN for diverse applications that revolve around improving diagnosis and prognosis from MR images of the developing fetal brain, future work involves investigating the ability of such a numerical framework to generalize post-processing tools like fetal brain tissue segmentation to datasets acquired on other MR systems and with other sequence parameters using a collection of various synthetic images for domain adaptation techniques.