Keywords

1 Introduction

Preterm birth is associated with a distinct brain magnetic resonance image (MRI) phenotype that includes generalized and specific dysconnectivity of neural systems and increased risk of neurocognitive and psychiatric impairment [1, 2]. Different structural properties derived from MRI data, such as fractional anisotropy, mean, axial and radial diffusivity and neurite density index, have shown age related differences in the developing brain [3, 4] and have been linked to altered neurodevelopment in children born preterm [5].

In recent years, several studies have used connectomics to study the developing brain and to investigate atypical development and dysmaturation [3, 6, 7]. The majority of connectomics work in neonates is based on single modes of data such as diffusion MRI (dMRI) tractography or resting-state functional connectivity (rsfMRI). An alternative method to model connectivity is the structural covariance network (SCN) approach [8] that works by calculating the covariance between regional measurements across subjects, resulting in a single network for the entire population. Other approaches have constructed subject-specific SCNs [9, 10], but these techniques have been restricted to the use of morphometric variables available through standard structural T1-weighted MRI sequences and by using a single metric to assess the “connectivity” between nodes. To the best of our knowledge, only Shi et al. used SCNs in addition to white matter (WM) networks to provide evidence of the existence of brain alteration in neonates at genetic risk for schizophrenia [11]. The main limitation of the mentioned methods is that they do not take advantage of additional information that can be derived combining multiple modalities. In recent work Ball et al. applied canonical correlation analysis to multi-modal imaging and clinical data from preterm infants [12]. The study identified specific patterns of neonatal neuroanatomic variation that related to perinatal environmental exposures and correlated with functional outcome. They show that data-driven approaches which incorporate broad image phenotypes improve characterisation of atypical brain development after preterm birth.

In this work, we investigated whether connectomes derived from multimodal data within a predicting framework capture novel information about brain dysmaturation in preterm infants. Morphometric similarity networks (MSN) model the inter-regional correlations of multiple macro- and micro-structural multimodal MRI variables in a single individual. MSNs were originally devised to understand better how human cortical networks underpin individual differences in psychological functions [13]. The method works by computing a number of metrics for each region of interest (ROI) derived from different MRI sequences which are then arranged in a vector. The aim is to obtain a multidimensional description of the structural properties of the ROIs. The MSN is then built considering the ROIs as nodes and modelling connection strength as the correlation between pairs of ROI vectors. The pattern of correlations can be conceptualized as a “fingerprint” of an individual’s brain. Here, the edges of individual MSNs were used to train a regression model to predict postmenstrual age (PMA) at the time of MRI acquisition. Then, to quantify brain maturation, the Relative Brain Network Maturation Index (RBNMI) [7] was computed as the difference between predicted and actual age. We aimed to test the hypothesis that RBNMI is reduced in preterm infants at term equivalent age compared with infants born at term.

2 Methods

2.1 Participants and Data Acquisition

We combined neonatal brain MRI data collected in our institution with data from the first release of the Developing Human Connectome ProjectFootnote 1 (dHCP) in order to achieve balance across the age range of interest (37–45 weeks PMA). The total study sample consisted of data of 95 subjects.

The Edinburgh Birth Cohort (TEBC). Participants were recruited as part of a longitudinal study designed to investigate the effects of preterm birth on brain structure and outcome.Footnote 2 The study was conducted according to the principles of the Declaration of Helsinki, and ethical approval was obtained from the UK National Research Ethics Service. Parents provided written informed consent. A total of 55 neonates (30 preterm, 25 term, PMA range 38–45 weeks) underwent MRI at term equivalent age at the Edinburgh Imaging Facility: Royal Infirmary of Edinburgh, University of Edinburgh, Scotland, UK. A Siemens MAGNETOM Prisma 3 T MRI clinical scanner (Siemens Healthcare Erlangen, Germany) and 16-channel phased-array paediatric head coil were used to acquire: 3D T1-weighted MPRAGE (T1w) (acquired voxel size = 1 mm isotropic); 3D T2-weighted SPACE (T2w) (voxel size = 1 mm isotropic); and two dMRI acquisitions, the first one using a protocol consisting of 8 baseline volumes (b = 0 s/mm2) and 64 volumes with b = 750 s/mm2, the second one consisting of 8 baseline volumes, 3 volumes with b = 200 s/mm2, 6 volumes with b = 500 s/mm2 and 64 volumes with b = 2500 s/mm2. Diffusion-weighted single-shot spin-echo echo planar imaging (EPI) volumes with 2-fold simultaneous multislice and 2-fold in-plane parallel imaging acceleration were acquired with 2 mm isotropic voxels; both acquisitions had the same echo and repetition time. Infants were scanned without sedation. All images were reviewed by an expert in paediatric neuroadiology (AJQ), and none contained major focal parenchymal lesions.

Developing Human Connectome Project. The goal of the dHCP is to create a dynamic map of human brain connectivity from 20 to 45 weeks PMA. The infants were scanned using optimized protocols for structural T1w and T2w, rsfMRI, and dMRI. The first release consists of images of 40 term infants (PMA range 37–44 weeks).

Fig. 1.
figure 1

Distribution of PMA at scanning in the pooled sample (37–45 weeks PMA). In the left panel, age distributions in term and preterm infants are shown. In the right panel, the age distributions in the TEBC and dHCP populations are shown.

The distribution of PMA at scan for the pooled data, each cohort, and the term and preterm groups is shown in Fig. 1. The neonates from both cohorts were separated in two groups: 65 term (PMA at scan range 37–44 weeks) and 30 preterm infants at term equivalent age (PMA at scan range 38–45 weeks). All the term infants were born after 36 week of gestation, while the preterm children were born before 32 weeks.

2.2 Data Preprocessing

Structural data were preprocessed using the dHCP minimal processing pipeline described by Makropoulos et al. [14]. The result of this processing is the parcellation of each structural image into 87 ROIs [15].

Diffusion MRI processing of TEBC dataset was performed as follows: for each subject the two dMRI acquisitions were first concatenated and then denoised using a PCA-based algorithm [16]; the eddy current, head movement and EPI geometric distortions were corrected using outlier replacement and slice-to-volume registration [17,18,19,20]; bias field inhomogeneity correction was performed by calculating the bias field of the mean b0 volume and applying the correction to all the volumes [21]. The dMRI data from the dHCP was already preprocessed [22].

For both datasets, the mean b0 volume of each subject was co-registered to their T2w volume using boundary-based registration [23], then the inverse transformation was used to propagate the ROI labels to the dMRI. For each ROI, two metrics were computed in the structural space: ROI volume and the mean T1w/T2w signal ratio. In dMRI space, seven additional metrics were also computed: the mean of each tensor-derived metric (FA: fractional anisotropy, MD: mean diffusivity, AD: axial diffusivity and RD: radial diffusivity) and the mean of the three Neurite Orientation Dispersion and Density Imaging (NODDI) parameters (ICVF: intracellular volume fraction, ODI: orientation dispersion index and VISO: isotropic volume fraction) [24]. For the tensor derived metrics, \(b = 750\) s/mm2 was used for the TEBC and \(b = 1000\) s/mm2 for the dHCP data. NODDI maps were calculated with the default parameters.

2.3 Data Harmonization

We used ComBat to harmonise data because of its efficacy for removing scanner variability while preserving biological variation [25]. Each image metric was harmonized separately using gender, gestational age at birth, PMA at scan and prematurity (coded as a binary variable) as biological covariates of interest.

2.4 Network Construction

The MSN for each subject was constructed by selecting 82 of the total 87 ROIs (excluding CSF and background parcels); each of the ROI metrics was normalized (z-scored) and then Pearson correlations were computed between the vectors of metrics from each pair of ROIs. In this way, the nodes of each network are the ROIs and the edges represent the morphometric similarity between the two related ROIs (see Fig. 2 for a graphic overview of the approach). For comparison, we also computed single-metric networks: in this case connections were assigned a weight computed as the absolute difference between ROI values [10]. Note that we did not control for the effect of brain size; this is an aspect worth investigating in future work.

Fig. 2.
figure 2

Individual MSN construction. Different metrics are extracted from diffusion and structural MRI data. The same parcellation is applied to both image types and the average metric values are computed for each ROI. A connectivity matrix, representing MSN connections, is built by computing the Pearson correlation between the vectors of metrics of each pair of ROIs.

2.5 Regression Model

We trained a linear regression model with elastic net regularisation to predict PMA at scan in both preterm and term infants starting from individual MSNs. For each subject, the edges of the MSN (inter-regional correlations) were concatenated to form a feature vector to be given as input to the regression model. Following the approach of [7], prediction performances were evaluated with a Monte Carlo cross-validation scheme. This worked as follows: in each of 100 repetitions, 50 term infants were selected at random to form the training set: the remaining 15 full term infants and 15 premature infants selected at random constituted the test set. Then, the mean absolute error (MAE) was computed across subjects and repetitions. The parameters of the regression model were selected with a nested 3-fold cross-validation loop.

2.6 Feature Selection

After the preprocessing phase, nine different metrics are available for each ROI. To study which combination of features produced better performance in the age prediction task, we implemented a sequential backward feature selection scheme. Starting from the full set of features, at each iteration we removed the feature whose subtraction caused the least increase in prediction error. The prediction error was computed with a leave-one-out cross-validation scheme on term infant data only. The best performing model, which was adopted for all subsequent analyses, was based on eight features (regional volume, T1w/T2w ratio, FA, AD, RD, ICVF, VISO, ODI) with a MAE of 0.64 (±0.44) weeks. None of the single-metric networks (Sect. 2.4) outperformed the selected MSN, however it is interesting to note that the metrics that yielded the least and second least error were regional volume (MAE: \(0.76\,\pm \, 0.61\)) and FA (MAE: \(0.79\,\pm \,0.91\)), respectively. These two metrics were previously associated with age-related changes in the developing brain [3, 4, 15]. See supplementary material for a comprehensive report of feature selection results.

2.7 Measuring Brain Maturation

The information contained in a MSN can be used to derive a data-driven metric of brain maturation [7]. RBNMI is the difference between the predicted and actual age: a negative score implies that the apparent age (i.e., the age predicted by the model) is lower than the actual age; hence RBNMI can be interpreted as an index of dysmaturation.

3 Results and Discussion

3.1 Data Harmonisation

We tested the efficacy of ComBat harmonization by comparing MSNs built before and after harmonizing the data. We used principal component analysis to perform an unsupervised dimension reduction and projected individual MSN onto the first two principal components (PCs), explaining 28% of the variance. Figure 3 shows that without harmonization points are distinctly clustered by dataset, while after harmonization there is no clear separation.

Fig. 3.
figure 3

Individual MSNs projected onto the first two principal components (PC1 and PC2) that explain most of the variation in the data, before harmonization (left panel) and after (right panel).

Fig. 4.
figure 4

Distribution of per group mean RBNMI values, averaged across test subjects within each of the 100 rounds of cross-validation, shown as histograms (left) and as box plots (right). Averaged RBNMI values in the preterm group were significantly more negative than in the term group.

3.2 Morphometric Similarity Networks

The selected model predicted PMA at scan with a MAE of 0.88 (±0.63) weeks when tested on all data. We then computed the mean RBNMI values for each group of subjects, averaged across the 100 Monte Carlo repetitions. The group RBNMI was more negative for the preterm group (Fig. 4). The difference between the two groups was confirmed with a two-tailed t-test (\(p<0.001\)) after testing for departure from normality of the RBNMI distributions with a D’Agostino and Pearson’s test (\(p=0.42\) for the pretermn group; \(p=0.98\) for the term group). These results suggest that the information contained in a MSN is sufficient to detect a dysmaturation in preterm infants at term equivalent age. One interesting aspect to consider is that the MSN was built by combining features from different imaging sequences that describe complementary aspects of brain structure and have been previously studied in isolation [3, 15]. Hence, to fully describe the developing brain it is crucial to follow a holistic approach, integrating information from multiple sources.

To study which connections contributed the most to age prediction, we selected only edges which were assigned a non-zero coefficient in at least 99% of the Monte Carlo repetitions. These edges are shown in the chord diagram in Fig. 5. The selected connections are predominantly located in subcortical regions (thalamus, caudate and subthalamic nuclei); white matter ROIs in the temporal lobe and posterior cingulate cortex; frontal regions, brain stem and cerebellum. These areas have been previously associated with age-related changes and preterm birth [3, 26, 27].

Fig. 5.
figure 5

Chord diagram showing MSN edges used for prediction in at least 99% of regression models in the Monte Carlo repetitions. The thickness of connections reflects the strength of the correlation between edge weights and age across subjects (positive correlations in shades of gray; negative correlations in shades of red). (Color figure online)

4 Conclusions

We used inter-regional morphometric similarities to train a predictive model for PMA at scan and to derive a data-driven metric of brain maturation, the RBNMI. We found that preterm infants at term equivalent age were consistently assigned a lower RBNMI than infants born at term, indicating delayed brain maturation. The predictive model for age performed best when structural, diffusion tensor-derived and NODDI metrics were combined, which demonstrates the importance of integrating different biomarkers to generate a global picture of the developing human brain. The motivation for using a network-based approach is indeed obtaining a whole-brain description able to capture a developmental pattern. Our results suggest that the information contained in MSNs is sufficient to train a machine learning model in the age prediction task. A second reason for working with similarities instead of single regional metrics is methodological: computing edge weights as inter-regional similarities enables an integrated representation of all available metrics in a single network; to work with the original features directly would mean either working with several networks (thus requiring a further step to integrate them) or concatenating all the features in a single predictive model, aggravating the problems related with the “curse of dimensionality”. In future work, we plan to extend the presented approach to other clinical variables beyond PMA, such as clinical risk indices and neurocognitive outcome data.