Abstract
We propose an AutoML approach for the prediction of fluid intelligence from T1-weighted magnetic resonance images. We extracted 122 features from MRI scans and employed Sequential Model-based Algorithm Configuration to search for the best prediction pipeline, including the best data pre-processing and regression model. In total, we evaluated over 2600 prediction pipelines. We studied our final model by employing results from game theory in the form of Shapley values. Results indicate that predicting fluid intelligence from volume measurements is a challenging task. We found that our final ensemble of 50 prediction pipelines associated larger parahippocampal gyrus volumes with lower fluid intelligence, and higher pons white matter volume with higher fluid intelligence.
1 Introduction
This paper describes our method submitted to the ABCD Neurocognitive Prediction Challenge 2019. The task of the challenge is to predict fluid intelligence solely from structural T1-weighted magnetic resonance images (MRI). The challenge uses data from the Adolescent Brain Cognitive Development (ABCD) Study.
In this approach, we first extract features from MRI scans and then use automated machine learning for the prediction. For the feature extraction, we use the volume measurements provided by the challenge’s organizers. For the prediction, we use an automated machine learning (AutoML) approach, as determining a good machine learning pipeline manually is a tedious and error-prone task. A typical ML pipeline includes various types of pre-processing that can be applied to the input features. Afterwards, an appropriate model needs to be selected and its hyper-parameters tuned to achieve high predictive performance. The goal of AutoML is to automate the construction of this whole pipeline. A recent overview of AutoML approaches, together with an analysis of the results of the ChaLearn AutoML Challenges over the last four years, is given in [5]. AutoML has not yet been widely explored in the medical field, with PubMed listing only four articles [1, 7, 10, 14], none of which study MRI or neuroscience.
2 Data
Data was provided by the Adolescent Brain Cognitive Development (ABCD) Study [13], which recruited children aged 9–10. Challenge participants were given access to T1-weighted MRI scans from 3,736 children for training, 415 children for validation, and 4,402 children for testing. Fluid intelligence scores were residualized to account for confounding due to sex at birth, ethnicity, highest parental education, parental income, parental marital status, and image acquisition site. Residualized fluid intelligence scores were provided for the training and validation data, but not for the test data. All data was obtained from the National Institute of Mental Health Data Archive.
3 Methods
Our proposed pipeline for the prediction of fluid intelligence from T1-weighted MRI scans builds on the Automated Machine Learning (AutoML) framework summarized in Fig. 1. Scans were acquired according to the acquisition protocol of the ABCD study. For the parcellation of the brain and the estimation of the volume of each region of interest, we relied on the work of the challenge’s organizers.
3.1 Feature-Preprocessing
We used volume measurements of 122 regions of interest extracted by the challenge’s organizers from each T1-weighted MRI scan based on the SRI24 atlas [15]. We normalized all volume measurements while accounting for outliers by subtracting the median and dividing by the range between the 5th and 95th percentiles. Thus, we reduce the impact of outliers and still obtain approximately centered features of equal scale. Finally, the residualized fluid intelligence scores provided with the training data were standardized to zero mean and unit variance; the transformations derived from the training data were applied to the features and scores of the validation and test data. Additional pre-processing steps were selected without human interaction, as described in the next section.
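The robust normalization above can be sketched in a few lines of plain Python; the function names and the linear-interpolation percentile are our own illustration, not the organizers' code:

```python
import statistics

def percentile(values, q):
    """Percentile of a list with linear interpolation, q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = int(pos), min(int(pos) + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def robust_normalize(volumes):
    """Subtract the median and divide by the 5th-95th percentile range."""
    med = statistics.median(volumes)
    spread = percentile(volumes, 95) - percentile(volumes, 5)
    return [(v - med) / spread for v in volumes]
```

Because the median and the 5th–95th percentile range are insensitive to extreme values, a single outlying volume measurement shifts neither the center nor the scale of the normalized features.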
3.2 Automated Machine Learning
For the prediction of residualized fluid intelligence score, we used automated machine learning that leverages recent advances in Bayesian optimization, meta-learning, and ensemble construction. For every machine learning task, the fundamental problem is to decide which machine learning algorithm to use and whether and how to pre-process features. This task is extremely challenging, because no single algorithm performs best on all datasets, and the performance of machine learning methods depends to a large extent on their hyper-parameter settings, which can vary from one task to the next. Here, we use AutoML for the prediction of the residualized fluid intelligence score by producing test set predictions without human input within a given computational budget. Specifically, we employ Combined Algorithm Selection and Hyperparameter optimization (CASH) [3].
Let \(\mathcal {A} = \{ A^{(1)}, \ldots , A^{(R)} \}\) be a set of machine learning algorithms, and \(\varLambda ^{(j)}\) be the domain of the hyper-parameters of each algorithm. Further, we define \(\mathcal {D}_\text {train} = \{ (\mathbf {x}_1, y_1), \ldots , (\mathbf {x}_n, y_n) \}\) to be the training set, which we split into K cross-validation folds to obtain \(\{ \mathcal {D}_\text {train}^{(1)}, \ldots , \mathcal {D}_\text {train}^{(K)} \}\) and \(\{ \mathcal {D}_\text {valid}^{(1)}, \ldots , \mathcal {D}_\text {valid}^{(K)} \}\) with \(\mathcal {D}_\text {train}^{(k)} = \mathcal {D}_\text {train} \backslash \mathcal {D}_\text {valid}^{(k)}\). For a particular hyper-parameter configuration \(\varvec{\varTheta }\), we solve the CASH optimization problem
\[
(A^\star , \varvec{\varTheta }^\star ) \in \mathop {\mathrm {arg\,min}}_{A^{(j)} \in \mathcal {A},\ \varvec{\varTheta } \in \varLambda ^{(j)}} \frac{1}{K} \sum _{k=1}^{K} \sum _{(\mathbf {x}_i, y_i) \in \mathcal {D}_\text {valid}^{(k)}} \left( y_i - \hat{f}_{A_{\varvec{\varTheta }}^{(j)}}(\mathbf {x}_i\,|\,\mathcal {D}_\text {train}^{(k)}) \right) ^2
\]
where \(\hat{f}_{A_{\varvec{\varTheta }}^{(j)}}(\mathbf {x}_i\,|\,\mathcal {D}_\text {train}^{(k)})\) denotes the prediction on the validation set of model \(A^{(j)}\) with hyper-parameters \(\varvec{\varTheta }\) and trained on \(\mathcal {D}_\text {train}^{(k)}\). This optimization problem can be solved via Sequential Model-based Algorithm Configuration (SMAC), a technique for Bayesian black-box optimization that uses a random-forest-based surrogate model [6]. The main idea of SMAC is to use the surrogate model to predict an algorithm’s performance for a given hyper-parameter configuration. It is able to interpolate the performance of algorithms between previously observed hyper-parameter configurations, and thus to estimate the performance of configurations that have not been evaluated yet. Thus, it enables us to focus on promising hyper-parameter configurations.
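SMAC's surrogate model is beyond the scope of a short example, but the CASH objective it optimizes can be illustrated with a toy exhaustive search over a joint space of algorithms and their hyper-parameter domains; all names and the synthetic dataset below are illustrative, not part of our pipeline:

```python
def knn_predict(train, x, k):
    """k-nearest-neighbour regression on scalar inputs."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in neighbours) / k

def mean_predict(train, x, _):
    """Baseline that always predicts the training mean."""
    return sum(y for _, y in train) / len(train)

def cv_mse(algorithm, config, data, K=5):
    """K-fold cross-validated MSE: the inner sums of the CASH objective."""
    folds = [data[i::K] for i in range(K)]
    total, n = 0.0, 0
    for k in range(K):
        train = [p for j in range(K) if j != k for p in folds[j]]
        for x, y in folds[k]:
            total += (y - algorithm(train, x, config)) ** 2
            n += 1
    return total / n

# Joint space of algorithms and their hyper-parameter domains.
space = [(knn_predict, [1, 3, 5, 9]), (mean_predict, [None])]
data = [(x / 10.0, 2.0 * (x / 10.0)) for x in range(40)]  # toy linear data

# Exhaustively minimize CV error jointly over (algorithm, hyper-parameter).
best = min(((alg, cfg, cv_mse(alg, cfg, data))
            for alg, domain in space for cfg in domain),
           key=lambda t: t[2])
```

SMAC replaces this exhaustive enumeration with a surrogate-guided search, which matters once the joint space is too large to enumerate.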
We employed the auto-sklearn toolkit (version 0.5.0), which, for a given user-provided computational budget in terms of run time and memory, searches for the best machine learning pipeline to predict the residualized fluid intelligence score by combining components of the scikit-learn machine learning framework (version 0.18.2) [12]. Figure 1 depicts an overview of the AutoML framework. For data pre-processing, AutoML can choose from 11 algorithms for data transformations, such as principal component analysis. For feature pre-processing, 6 feature-wise transformations are available, such as transforming each feature to have zero mean and unit variance. Finally, AutoML can choose from 13 regression models. After evaluating various machine learning pipelines, each comprising data transformations, feature transformations, and a regression model, the best M pipelines are combined via ensemble selection [2] to form the final prediction model. We used a budget with a total run time of 40 h, where each pipeline was limited to 6 min and 4 GB of memory. The final ensemble size was \(M=50\).
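The ensemble selection step of [2] can be sketched as a greedy forward search; this is our own simplification (uniform averaging, mean squared error on a held-out set), not the exact auto-sklearn implementation:

```python
def ensemble_selection(predictions, targets, M):
    """Greedy ensemble selection: at each of M steps, add (with replacement)
    the pipeline whose inclusion minimizes the validation MSE of the
    uniformly averaged ensemble prediction.

    predictions: list of per-pipeline prediction vectors on the validation set.
    targets: validation-set targets.
    """
    n = len(targets)
    sums = [0.0] * n  # running sum of the chosen pipelines' predictions
    chosen = []
    for step in range(1, M + 1):
        def mse_if_added(pred):
            return sum(((sums[j] + pred[j]) / step - targets[j]) ** 2
                       for j in range(n)) / n
        best = min(range(len(predictions)),
                   key=lambda i: mse_if_added(predictions[i]))
        chosen.append(best)
        sums = [sums[j] + predictions[best][j] for j in range(n)]
    return chosen, [s / M for s in sums]
```

Selecting with replacement lets a strong pipeline appear multiple times, which effectively weights it more heavily in the uniform average.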
3.3 Feature Importance
While complex prediction pipelines are potentially powerful, their black-box nature is often a barrier for employing such a model in clinical research. We use Shapley values to explain the predictions of our final ensemble of prediction pipelines. Shapley values are a classic solution in game theory to determine the distribution of credits to players participating in a cooperative game [16, 17]. They were first applied to regression models in the presence of multicollinearity [8]. A Shapley value assigns an importance value \(\phi _j\) to each feature j that reflects its effect on the model’s prediction. To compute this effect, retraining the model \(f(\cdot )\) on all possible feature subsets \(\mathcal {S} \subseteq \mathcal {F} \backslash \{j\}\) of all features \(\mathcal {F}\) is necessary. Given a feature vector \(\mathbf {x} \in \mathbb {R}^{|\mathcal {F}|}\), the j-th Shapley value can then be computed as the weighted average of all prediction differences:
\[
\phi _j = \sum _{\mathcal {S} \subseteq \mathcal {F} \backslash \{j\}} \frac{|\mathcal {S}|!\,\left( |\mathcal {F}| - |\mathcal {S}| - 1 \right) !}{|\mathcal {F}|!} \left( \hat{f}_{\mathcal {S} \cup \{j\}}( \mathbf {x}^{\mathcal {S} \cup \{j\}} ) - \hat{f}_{\mathcal {S}}( \mathbf {x}^{\mathcal {S}} ) \right)
\]
where \(\hat{f}_S( \mathbf {x}^{\mathcal {S}} )\) denotes the prediction of a model trained and evaluated on the feature subset \(\mathcal {S}\). The exact computation of Shapley values requires evaluating all \(2^{|\mathcal {F}|}\) possible feature subsets, which is only feasible when the data consists of no more than a few dozen features. To address this problem, we employ the recently proposed SHAP (SHapley Additive exPlanations) values, which belong to the class of additive feature importance measures [9]. Since the exact computation of SHAP values is prohibitive, we approximate them using the model-agnostic KernelSHAP approach proposed in [9]. To obtain a global measure of feature importance, we compute the average magnitude of SHAP values across all N subjects in the data:
\[
\bar{\phi }_j = \frac{1}{N} \sum _{i=1}^{N} \left| \phi _j^{(i)} \right|
\]
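For intuition, the exact Shapley computation and the global importance score can be written directly for a model that accepts feature subsets; the function names are ours, and unlike KernelSHAP this brute-force version is only feasible for a handful of features:

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x):
    """Exact Shapley values for one subject: phi_j is the weighted average,
    over all subsets S of the remaining features, of the change in the
    model's prediction when feature j is added to S (2^|F| evaluations).

    model(x, S) must return a prediction using only the features in S.
    """
    F = len(x)
    phi = []
    for j in range(F):
        others = [i for i in range(F) if i != j]
        value = 0.0
        for size in range(F):
            weight = factorial(size) * factorial(F - size - 1) / factorial(F)
            for S in combinations(others, size):
                value += weight * (model(x, set(S) | {j}) - model(x, set(S)))
        phi.append(value)
    return phi

def global_importance(model, X):
    """Mean absolute Shapley value per feature across all subjects."""
    per_subject = [shapley_values(model, x) for x in X]
    return [sum(abs(p[j]) for p in per_subject) / len(X)
            for j in range(len(X[0]))]
```

For an additive model, each feature's Shapley value recovers exactly its additive contribution, which is a useful sanity check for any approximation such as KernelSHAP.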
4 Results
The performance of the final ensemble is summarized in Table 1. It reveals that predicting residualized fluid intelligence from MRI-derived volume measurements is a challenging task. In particular, the proposed model struggles to reliably predict residualized fluid intelligence at the extremes of the distribution, i.e., very low or very high values. Consequently, we observe a relatively high mean squared error, which is an order of magnitude larger than the mean absolute error. Moreover, the large difference between the performance on the training data and the validation data indicates that overfitting is an issue.
In total, we evaluated 2,608 machine learning pipelines (see Table 2). The components of our final ensemble of 50 machine learning pipelines are summarized in Table 3. Principal component analysis [11] was selected most often (15 times) for data pre-processing. The final ensemble comprised linear and non-linear regression models, with ensembles of randomized regression trees [4] being selected most frequently (14 times). Looking at the top-performing pipelines in the ensemble, we noticed that combining principal component analysis with a tree-based ensemble was a frequently selected combination (5 out of the top 10 performing pipelines).
Next, we inspected which MRI-derived features the model deems most important by computing SHAP values for each feature and subject in the training data. Figure 2 lists the top 20 features by mean absolute SHAP value \(\phi \). The top ranked feature is pons white matter volume (\(\phi = 0.0183\)), followed by left parahippocampal gyrus volume (\(\phi = 0.0155\)), and left lateral ventricle cerebrospinal fluid volume (\(\phi = 0.0148\)). However, we note that individual SHAP values are rather small, which suggests that fluid intelligence is not strongly influenced by a single brain region, but by a complex inter-relationship between different regions. Individual, subject-specific SHAP values depicted in Fig. 2b indicate that larger left and right parahippocampal gyrus volumes are associated with a decrease in fluid intelligence, while larger pons white matter volume is associated with an increase.
5 Conclusion
We proposed an AutoML model for the prediction of fluid intelligence from T1-weighted magnetic resonance images based on more than 2,600 evaluated machine learning pipelines. Our experiments demonstrate that it is challenging for our ensemble to reliably predict fluid intelligence from MRI scans. In particular, errors on the validation and test data were more than four times higher than on the training data, which is evidence for overfitting. We analyzed the final model’s predictions using SHAP values. Results revealed that top ranked features still explain only a small fraction of the fluid intelligence score. Therefore, we concluded that current features derived from MRI are insufficient to robustly measure fluid intelligence. While current features are generic descriptors of the brain anatomy, we believe future research should focus on deriving tailor-made features from MRI, specific to the prediction of fluid intelligence, which could then be used to improve our understanding of the neurobiology underlying fluid intelligence.
References
Barreiro, E., Munteanu, C.R., Cruz-Monteagudo, M., Pazos, A., González-Díaz, H.: Net-net auto machine learning (AutoML) prediction of complex ecosystems. Sci. Rep. 8(1), 12340 (2018)
Caruana, R., Niculescu-Mizil, A.: Ensemble selection from libraries of models. In: Proceedings of the 21st International Conference on Machine Learning, p. 18 (2004)
Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M., Hutter, F.: Efficient and robust automated machine learning. In: Advances in Neural Information Processing Systems 28, pp. 2962–2970 (2015)
Geurts, P., Ernst, D., Wehenkel, L.: Extremely randomized trees. Mach. Learn. 63(1), 3–42 (2006)
Guyon, I., et al.: Analysis of the AutoML challenge series 2015–2018. In: Hutter, F., Kotthoff, L., Vanschoren, J. (eds.) Automated Machine Learning. TSSCML, pp. 177–219. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-05318-5_10
Hutter, F., Hoos, H.H., Leyton-Brown, K.: Sequential model-based optimization for general algorithm configuration. In: Coello, C.A.C. (ed.) LION 2011. LNCS, vol. 6683, pp. 507–523. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25566-3_40
Le, T.T., Fu, W., Moore, J.H.: Scaling tree-based automated machine learning to biomedical big data with a feature set selector. Bioinformatics, 1–7 (2019)
Lipovetsky, S., Conklin, M.: Analysis of regression in game theory approach. Appl. Stoch. Models Bus. Ind. 17(4), 319–330 (2001)
Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems 30, pp. 4765–4774 (2017)
Orlenko, A., et al.: Considerations for automated machine learning in clinical metabolic profiling: altered homocysteine plasma concentration associated with metformin exposure. In: Pacific Symposium on Biocomputing, vol. 23. World Scientific (2017)
Pearson, K.: On lines and planes of closest fit to systems of points in space. Lond. Edinburgh Dublin Philos. Mag. J. Sci. 2(11), 559–572 (1901)
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Pfefferbaum, A., et al.: Altered brain developmental trajectories in adolescents after initiating drinking. Am. J. Psychiatry 175(4), 370–380 (2018)
Puri, M.: Automated machine learning diagnostic support system as a computational biomarker for detecting drug-induced liver injury patterns in whole slide liver pathology images. Assay Drug Dev. Technol. (2019)
Rohlfing, T., Zahr, N.M., Sullivan, E.V., Pfefferbaum, A.: The SRI24 multichannel atlas of normal adult human brain structure. Hum. Brain Mapp. 31(5), 798–819 (2010)
Shapley, L.S.: A value for n-person games. Contrib. Theory Games 2(28), 307–317 (1953)
Štrumbelj, E., Kononenko, I.: Explaining prediction models and individual predictions with feature contributions. Knowl. Inf. Syst. 41(3), 647–665 (2014)
Acknowledgements
This research was partially supported by the Bavarian State Ministry of Education, Science and the Arts in the framework of the Centre Digitisation.Bavaria (ZD.B).
© 2019 Springer Nature Switzerland AG
Pölsterl, S., Gutiérrez-Becker, B., Sarasua, I., Guha Roy, A., Wachinger, C. (2019). An AutoML Approach for the Prediction of Fluid Intelligence from MRI-Derived Features. In: Pohl, K., Thompson, W., Adeli, E., Linguraru, M. (eds) Adolescent Brain Cognitive Development Neurocognitive Prediction. ABCD-NP 2019. Lecture Notes in Computer Science(), vol 11791. Springer, Cham. https://doi.org/10.1007/978-3-030-31901-4_12
Print ISBN: 978-3-030-31900-7
Online ISBN: 978-3-030-31901-4