1 Introduction

This paper describes our method submitted to the ABCD Neurocognitive Prediction Challenge 2019. The task of the challenge is to predict fluid intelligence solely from structural T1-weighted magnetic resonance images (MRI). The challenge uses data from the Adolescent Brain Cognitive Development (ABCD) Study.

In this approach, we first extract features from the MRI scans and then predict fluid intelligence with an automated machine learning (AutoML) approach. For the feature extraction, we use the volume measurements provided by the challenge’s organizers. We rely on AutoML because determining a good machine learning pipeline is a tedious and error-prone task for humans. A typical ML pipeline includes various types of preprocessing that can be applied to the input features. Afterwards, an appropriate model needs to be selected and its hyper-parameters tuned to achieve high predictive performance. The goal of AutoML is to automate the whole machine learning pipeline. A recent overview of AutoML approaches, together with an analysis of the results of the ChaLearn AutoML Challenges over the last four years, is given in [5]. AutoML has not yet been widely explored in the medical field, with PubMed listing only four articles [1, 7, 10, 14], none of which study MRI or neuroscience.

2 Data

Data was provided by the Adolescent Brain Cognitive Development (ABCD) Study [13], which recruited children aged 9–10. Challenge participants were given access to T1-weighted MRI scans from 3,736 children for training, 415 children for validation, and 4,402 children for testing. Fluid intelligence scores were residualized to account for confounding due to sex at birth, ethnicity, highest parental education, parental income, parental marital status, and image acquisition site. Residualized fluid intelligence scores were provided for the training and validation data, but not for the test data. All data was obtained from the National Institute of Mental Health Data Archive.

3 Methods

Our proposed pipeline for the prediction of fluid intelligence from T1-weighted MRI scans builds on the Automated Machine Learning (AutoML) framework summarized in Fig. 1. Scans were acquired according to the acquisition protocol of the Adolescent Brain Cognitive Development (ABCD) study. For the parcellation of the brain and the estimation of the volume of each region of interest, we relied on the work of the challenge’s organizers.

Fig. 1. Overview of our proposed AutoML pipeline for the prediction of fluid intelligence from T1-weighted MRI scans.

3.1 Feature Preprocessing

We used volume measurements of 122 regions of interest, extracted by the challenge’s organizers from each T1-weighted MRI scan based on the SRI24 atlas [15]. We normalized all volume measurements while accounting for outliers by subtracting the median and dividing by the range between the 5th and 95th percentiles. Thus, we reduce the impact of outliers and still obtain approximately centered features with equal scale. Finally, the provided residualized fluid intelligence scores in the training data were standardized to zero mean and unit variance; the same transformations derived from the training data were applied to the features and scores of the validation and test data. Additional preprocessing steps were selected without human interaction, as described in the next section.
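The normalization above can be sketched as follows; this is a minimal illustration with NumPy, and the function names are ours, not taken from the challenge code:

```python
import numpy as np

def robust_scale_fit(X):
    """Estimate per-feature center and scale on the training data:
    the median, and the range between the 5th and 95th percentiles."""
    center = np.median(X, axis=0)
    p5, p95 = np.percentile(X, [5, 95], axis=0)
    return center, p95 - p5

def robust_scale_apply(X, center, scale):
    """Subtract the training median and divide by the percentile range."""
    return (X - center) / scale

# Fit on the training volumes only; re-use the same statistics
# for the validation and test data.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=100.0, scale=10.0, size=(200, 5))
center, scale = robust_scale_fit(X_train)
X_scaled = robust_scale_apply(X_train, center, scale)
```

Fitting the statistics on the training set alone avoids leaking information from the validation and test sets into the preprocessing.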

3.2 Automated Machine Learning

For the prediction of the residualized fluid intelligence score, we used automated machine learning that leverages recent advances in Bayesian optimization, meta-learning, and ensemble construction. For every machine learning task, the fundamental problem is to decide which machine learning algorithm to use and whether and how to pre-process features. This task is extremely challenging, because there is no single algorithm that performs best on all datasets, and the performance of machine learning methods depends to a large extent on their hyper-parameter settings, which can vary from one task to the next. Here, we use AutoML to produce test set predictions of the residualized fluid intelligence score without human input within a given computational budget. Specifically, we employ Combined Algorithm Selection and Hyperparameter optimization (CASH) [3].

Let \(\mathcal {A} = \{ A^{(1)}, \ldots , A^{(R)} \}\) be a set of machine learning algorithms, and \(\varLambda ^{(j)}\) be the domain of the hyper-parameters of each algorithm. Further, we define \(\mathcal {D}_\text {train} = \{ (\mathbf {x}_1, y_1), \ldots , (\mathbf {x}_n, y_n) \}\) to be the training set, which we split into K cross-validation folds to obtain \(\{ \mathcal {D}_\text {train}^{(1)}, \ldots , \mathcal {D}_\text {train}^{(K)} \}\) and \(\{ \mathcal {D}_\text {valid}^{(1)}, \ldots , \mathcal {D}_\text {valid}^{(K)} \}\) with \(\mathcal {D}_\text {train}^{(k)} = \mathcal {D}_\text {train} \backslash \mathcal {D}_\text {valid}^{(k)}\). We then solve the CASH optimization problem

$$\begin{aligned} \mathop {{{\,\mathrm{\arg \!\min }\,}}}\limits _{A^{(j)} \in \mathcal {A}, \varvec{\varTheta } \in \varLambda ^{(j)}}\quad \frac{1}{K} \sum _{k=1}^K \frac{1}{|\mathcal {D}_\text {valid}^{(k)}|} \sum _{i=1}^{|\mathcal {D}_\text {valid}^{(k)}|} \left( y_i - \hat{f}_{A_{\varvec{\varTheta }}^{(j)}}(\mathbf {x}_i\,|\,\mathcal {D}_\text {train}^{(k)}) \right) ^2 \end{aligned}$$
(1)

where \(\hat{f}_{A_{\varvec{\varTheta }}^{(j)}}(\mathbf {x}_i\,|\,\mathcal {D}_\text {train}^{(k)})\) denotes the prediction on the validation set of model \(A^{(j)}\) with hyper-parameters \(\varvec{\varTheta }\) trained on \(\mathcal {D}_\text {train}^{(k)}\). This optimization problem can be solved via Sequential Model-based Algorithm Configuration (SMAC), a technique for Bayesian black-box optimization that uses a random-forest-based surrogate model [6]. The main idea of SMAC is to use the surrogate model to predict an algorithm’s performance for a given hyper-parameter configuration. The surrogate can interpolate from observed hyper-parameter configurations to previously unseen regions of the hyper-parameter space, which enables the search to focus on promising configurations.
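For intuition, the CASH problem in Eq. (1) can be illustrated with a toy stand-in: a random search over two candidate algorithms (closed-form ridge regression and a k-nearest-neighbour regressor), each with its own hyper-parameter domain, scored by K-fold cross-validated mean squared error. SMAC replaces the random sampling with a surrogate-guided search; everything below is an illustrative sketch, not the actual challenge pipeline.

```python
import numpy as np

def ridge_fit_predict(X_tr, y_tr, X_va, alpha):
    """Closed-form ridge regression with regularization strength alpha."""
    d = X_tr.shape[1]
    w = np.linalg.solve(X_tr.T @ X_tr + alpha * np.eye(d), X_tr.T @ y_tr)
    return X_va @ w

def knn_fit_predict(X_tr, y_tr, X_va, k):
    """k-nearest-neighbour regression: mean of the k closest training targets."""
    dists = np.linalg.norm(X_va[:, None, :] - X_tr[None, :, :], axis=2)
    idx = np.argsort(dists, axis=1)[:, :k]
    return y_tr[idx].mean(axis=1)

def cv_mse(fit_predict, X, y, param, K=5):
    """The objective of Eq. (1): average validation MSE over K folds."""
    folds = np.array_split(np.arange(len(y)), K)
    errs = []
    for k in range(K):
        va = folds[k]
        tr = np.concatenate([folds[j] for j in range(K) if j != k])
        pred = fit_predict(X[tr], y[tr], X[va], param)
        errs.append(np.mean((y[va] - pred) ** 2))
    return np.mean(errs)

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=120)

# Random search over (algorithm, hyper-parameter) pairs.
candidates = [(ridge_fit_predict, 10.0 ** rng.uniform(-3, 2)) for _ in range(10)]
candidates += [(knn_fit_predict, int(rng.integers(1, 20))) for _ in range(10)]
best = min(candidates, key=lambda c: cv_mse(c[0], X, y, c[1]))
```

The winner of this search jointly fixes the algorithm and its hyper-parameters, which is exactly the pair the arg min in Eq. (1) ranges over.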

We employed the auto-sklearn toolkit (version 0.5.0), which, for a user-provided computational budget in terms of run time and memory, searches for the best machine learning pipeline to predict the residualized fluid intelligence score by combining components of the scikit-learn machine learning framework (version 0.18.2) [12]. Figure 1 depicts an overview of the AutoML framework. For data preprocessing, AutoML can choose from 11 algorithms for data transformations, such as principal component analysis. For feature preprocessing, 6 feature-wise transformations are available, such as transforming each feature to have zero mean and unit variance. Finally, AutoML can choose from 13 regression models. After evaluating various machine learning pipelines, each comprising a data transformation, a feature transformation, and a regression model, the best M pipelines are combined via ensemble selection [2] to form the final prediction model. We used a budget that consisted of a total run time of 40 h, where each pipeline was limited to 6 min and 4 GB of memory. The final ensemble size was \(M=50\).
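Our auto-sklearn setup with this budget can be sketched roughly as follows. This is a configuration sketch based on the auto-sklearn 0.5 API, not the exact challenge code; `X_train` and `y_train` stand for the normalized volume features and standardized scores from Sect. 3.1:

```python
import autosklearn.regression

automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=40 * 60 * 60,  # total budget: 40 h
    per_run_time_limit=6 * 60,             # 6 min per pipeline
    ml_memory_limit=4096,                  # 4 GB per pipeline (in MB)
    ensemble_size=50,                      # M = 50 pipelines in the final ensemble
)
automl.fit(X_train, y_train)
y_pred = automl.predict(X_test)
```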

3.3 Feature Importance

While complex prediction pipelines are potentially powerful, their black-box nature is often a barrier to employing such a model in clinical research. We use Shapley values to explain the predictions of our final ensemble of prediction pipelines. Shapley values are a classic solution in game theory for distributing credit among players participating in a cooperative game [16, 17]. They were first proposed as a feature importance measure for linear models in the presence of multicollinearity [8]. A Shapley value assigns an importance value \(\phi _j\) to each feature j that reflects its effect on the model’s prediction. To compute this effect, the model \(f(\cdot )\) needs to be retrained on all possible feature subsets \(\mathcal {S} \subseteq \mathcal {F} \backslash \{j\}\) of the set of all features \(\mathcal {F}\). Given a feature vector \(\mathbf {x} \in \mathbb {R}^{|\mathcal {F}|}\), the j-th Shapley value can then be computed as the weighted average of all prediction differences:

$$\begin{aligned} \phi _j(\mathbf {x}) = \sum _{\mathcal {S} \subseteq \mathcal {F} \backslash \{j\}} \frac{|\mathcal {S}|!(|\mathcal {F}|-|\mathcal {S}|-1)!}{|\mathcal {F}|!} \left( \hat{f}_{\mathcal {S} \cup \{j\}}( \mathbf {x}^{\mathcal {S} \cup \{j\}} ) - \hat{f}_\mathcal {S}( \mathbf {x}^{\mathcal {S}} ) \right) , \end{aligned}$$
(2)

where \(\hat{f}_\mathcal {S}( \mathbf {x}^{\mathcal {S}} )\) denotes the prediction of a model trained and evaluated on the feature subset \(\mathcal {S}\). The exact computation of Shapley values requires evaluating all \(2^{|\mathcal {F}|}\) possible feature subsets, which is only feasible when the data consist of no more than a few dozen features. To address this problem, we employ the recently proposed SHAP (SHapley Additive exPlanations) values, which belong to the class of additive feature importance measures [9]. Since the exact computation of SHAP values is likewise prohibitive, we approximate them using the model-agnostic KernelSHAP approach proposed in [9]. To obtain a global measure of feature importance, we compute the average magnitude of SHAP values across all N subjects in the data:

$$\begin{aligned} \bar{\phi }_j = \frac{1}{N} \sum _{i=1}^N | \phi _j(\mathbf {x}_i)|. \end{aligned}$$
(3)
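To make Eq. (2) concrete, the sketch below computes exact Shapley values for a toy additive model with three features, where the model "retrained" on a subset \(\mathcal {S}\) simply drops the missing terms. This is a simplifying assumption made for illustration; in general, each subset requires genuinely retraining the model.

```python
from itertools import combinations
from math import factorial

def exact_shapley(f, x, n_features):
    """Exact Shapley values via Eq. (2): f(S, x) is the prediction of the
    model 'retrained' on feature subset S, evaluated on x."""
    features = range(n_features)
    phi = [0.0] * n_features
    for j in features:
        rest = [k for k in features if k != j]
        for size in range(len(rest) + 1):
            for S in combinations(rest, size):
                weight = (factorial(len(S)) * factorial(n_features - len(S) - 1)
                          / factorial(n_features))
                phi[j] += weight * (f(set(S) | {j}, x) - f(set(S), x))
    return phi

# Toy additive model: the subset version just drops the missing terms.
weights = [0.5, -2.0, 1.0]
def f(S, x):
    return sum(weights[k] * x[k] for k in S)

x = [1.0, 2.0, -1.0]
phi = exact_shapley(f, x, 3)  # per-feature attributions for this subject
```

For an additive model, \(\phi _j\) reduces to the j-th term itself, and the attributions sum to the difference between the full model's prediction and the empty model's (the efficiency property of Shapley values). Averaging \(|\phi _j(\mathbf {x}_i)|\) over subjects then yields the global importance of Eq. (3).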

4 Results

The performance of the final ensemble is summarized in Table 1. It reveals that predicting residualized fluid intelligence from MRI-derived volume measurements is a challenging task. In particular, the proposed model struggles to reliably predict residualized fluid intelligence at the extremes of the distribution, i.e., very low or very high values. Consequently, we observe a relatively high mean squared error, which is an order of magnitude larger than the mean absolute error. Moreover, the large difference between the performance on the training data and the validation data indicates that overfitting seems to be an issue.

Table 1. Performance on training, validation and test set. MSE: mean squared error. MAE: mean absolute error.

In total, we evaluated 2,608 machine learning pipelines (see Table 2). The components of our final ensemble of 50 machine learning pipelines are summarized in Table 3. Principal component analysis [11] was selected most often (15 times) for data preprocessing. The final ensemble comprised linear and non-linear regression models, with ensembles of randomized regression trees [4] being selected most frequently (14 times). Looking at the top-performing pipelines in the ensemble, we noticed that combining principal component analysis with a tree-based ensemble was a frequent choice (5 of the top 10 performing pipelines).

Table 2. Summary of evaluated machine learning pipelines.
Table 3. Overview of selected components in the final ensemble of \(M=50\) pipelines selected by AutoML. Each pipeline consists of one data preprocessing step, one feature preprocessing step, and one regressor.
Fig. 2. (a) Top 20 features sorted by mean absolute SHAP value \(\bar{\phi }_j\). (b) SHAP values of the top 20 features for each subject in the training data. In each row, SHAP values \(\phi _j\) for each subject are plotted horizontally, stacking vertically to avoid overlap. Each dot is colored by the value of that feature, from low (blue) to high (red). If the impact of a feature on the model’s prediction varies smoothly as its value changes, this coloring will also appear smooth. (Color figure online)

Next, we inspected which MRI-derived features the model deems most important by computing SHAP values for each feature and subject in the training data. Figure 2 lists the top 20 features by mean absolute SHAP value \(\bar{\phi }_j\). The top-ranked feature is pons white matter volume (\(\bar{\phi } = 0.0183\)), followed by left parahippocampal gyrus volume (\(\bar{\phi } = 0.0155\)) and left lateral ventricle cerebrospinal fluid volume (\(\bar{\phi } = 0.0148\)). However, we note that individual SHAP values are rather small, which suggests that fluid intelligence is not strongly influenced by any single brain region, but by a complex inter-relationship between different regions. The individual, subject-specific SHAP values depicted in Fig. 2b indicate that larger left and right parahippocampal gyrus volumes are associated with a decrease in predicted fluid intelligence, while larger pons white matter volume is associated with an increase.

5 Conclusion

We proposed an AutoML model for the prediction of fluid intelligence from T1-weighted magnetic resonance images based on more than 2,600 evaluated machine learning pipelines. Our experiments demonstrate that it is challenging for our ensemble to reliably predict fluid intelligence from MRI scans. In particular, errors on the validation and test data were more than four times higher than on the training data, which is evidence for overfitting. We analyzed the final model’s predictions using SHAP values. Results revealed that top ranked features still explain only a small fraction of the fluid intelligence score. Therefore, we concluded that current features derived from MRI are insufficient to robustly measure fluid intelligence. While current features are generic descriptors of the brain anatomy, we believe future research should focus on deriving tailor-made features from MRI, specific to the prediction of fluid intelligence, which could then be used to improve our understanding of the neurobiology underlying fluid intelligence.