Abstract
Incomplete or inconsistent temporal neuroimaging records of patients over time pose a major challenge to accurately predicting clinical scores for diagnosing Alzheimer’s Disease (AD). In this paper, we present an unsupervised method to learn enriched imaging biomarker representations that can simultaneously capture the information conveyed by all the baseline neuroimaging measures and the progressive variations of the available follow-up measurements of every participant. Our experiments on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset show improved performance in predicting cognitive outcomes, thereby demonstrating the effectiveness of our proposed method.
ADNI—Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: https://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.
1 Introduction
Alzheimer’s Disease (AD) is a chronic neurodegenerative disease that severely impacts patients’ thinking, memory, and behavior. AD is listed as the sixth leading cause of death in the United States of America, threatening 5.7 million Americans and 44 million individuals worldwide [1]. Over the past decade, neuroimaging measures have been widely studied to predict disease status and/or cognitive performance [14]. However, many predictive models suffer from critical limitations: they routinely perform learning at each time point separately, ignoring the longitudinal variations among temporal brain phenotypes. First, since AD is a progressive neurodegenerative disorder, multiple consecutive neuroimaging records are usually required to monitor the disease progression, and it would be beneficial to explore the temporal relations among the longitudinal records of the biomarkers. Second, the records of neuroimaging biomarkers are often missing at some time points for some participants over the course of AD progression, because it is difficult to conduct medical scans consistently across a large group of participants. To be more specific, higher mortality risk and cognitive impairment hinder older adults from staying in studies that require multiple visits, which results in incomplete data.
To overcome the first limitation, many studies [3, 9, 17, 18] explore the temporal data structures of brain phenotypes over time. However, these models often formulate the temporal neuroimaging records as a tensor, which inevitably complicates the prediction problems. To handle the second limitation of data inconsistency, most longitudinal studies of AD only utilize the samples with complete temporal records for model analysis and ignore the samples with fewer time points. However, discarding observed samples may neglect substantial valuable information in the data. To solve this problem, data imputation methods [4, 10] have been used to generate the missing records over AD progression, after which temporal regression studies can be conducted. However, missing data imputation can introduce undesirable artifacts, which in turn can worsen the predictive power of the longitudinal models.
To deal with the longitudinal prediction problem with incomplete temporal inputs, we propose a novel method to learn an enriched biomarker representation that integrates the baseline records of the neuroimaging biomarkers of the entire cohort and the dynamic measures across all follow-up time points of each individual. First, it learns a global projection from the baseline biomarker records of all the participants to preserve the global structure of a given neuroimaging dataset. Then, for each participant we learn a local projection from the available follow-up neuroimaging records to maintain the local data structure of that participant. Finally, a soft constraint is used to enforce consistency between the global and local projections. A schematic illustration of this formulation is shown in Fig. 1. Armed with the learned projections, we can transform the records with inconsistent sizes in a neuroimaging dataset into fixed-length vectors. With these fixed-length biomarker representations, we can make use of conventional regression methods to predict clinical scores.
2 Problem Formalization and the Solution Algorithm
In the task of representation learning for predicting cognitive outcomes, our goal is to learn a fixed-length biomarker representation vector for a participant from her or his baseline neuroimaging measures and all available follow-up measures. For the observed neuroimaging measures of a participant, we denote this information as: \(\mathcal { X } _ i = \{ \mathbf { x } _ i , \mathbf { X } _ i\}\), where \(i = 1,2, \cdots , m\) is the index of a participant in a neuroimaging dataset. In every \(\mathcal { X } _ i\), \(\mathbf { x } _ i\in \mathfrak {R}^ { d }\) is the baseline neuroimaging measure of the i-th participant where d denotes the dimensions of the neuroimaging feature set, and \(\mathbf { X } _ i = \left[ \mathbf { x } _ { 1 }^i, \ldots , \mathbf { x } _ {n_i}^i \right] \in \mathfrak {R}^ { d \times n_i }\) collects all available follow-up biomarker records of the i-th participant after the baseline where \(n_i\) denotes the number of available follow-up neuroimaging records of the same participant. Here we note that \(n_i\) varies over the dataset, because many participants of a cohort may have some missing records at certain time points.
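As a concrete illustration only, such variable-length per-participant inputs could be held in a simple container like the following sketch; the class and field names are ours and are not part of the original formulation.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class ParticipantRecord:
    """Observed neuroimaging measures of one participant: X_i = {x_i, X_i}."""
    baseline: np.ndarray    # x_i, shape (d,): baseline neuroimaging measure
    followups: np.ndarray   # X_i, shape (d, n_i): available follow-up records

    @property
    def n_followups(self) -> int:
        """n_i, which varies across participants because of missing visits."""
        return self.followups.shape[1]

# e.g., a participant with d = 90 VBM measures and n_i = 3 available follow-ups
record = ParticipantRecord(baseline=np.zeros(90), followups=np.zeros((90, 3)))
```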
Because the neuroimaging measures usually reside in a high-dimensional space, they can be redundant and noisy. Thus, we learn compact representations of these measures via a global projection that keeps only the most useful information of an input dataset. To achieve this goal, Principal Component Analysis (PCA) is the right tool, because it learns from the input data a linear projection that preserves as much information as possible in a low-dimensional projected subspace. Mathematically, PCA seeks the projection \(\mathbf { W } \in \mathfrak {R}^ { d \times r }\) (usually \(r \ll d\)) that minimizes the reconstruction error of the input data.
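In its standard form, this reconstruction error is

\( \sum _ { i = 1 } ^ { m } \left\| \mathbf { x } _ { i } - \mathbf { W } \mathbf { W } ^ { T } \mathbf { x } _ { i } \right\| _ { 2 } ^ { 2 } , \quad \text {s.t. } \mathbf { W } ^ { T } \mathbf { W } = \mathbf { I } , \)    (1)

and the same reconstruction residuals \(\mathbf { x } _ { i } - \mathbf { W } \mathbf { W } ^ { T } \mathbf { x } _ { i }\) reappear in the reweighting coefficients defined for Eq. (5).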
Usually the neuroimaging measures of each participant do not experience drastic changes within a short time; thus, we also want to preserve the local consistency in the projected space for each participant. To uncover the local consistency among the neuroimaging records of each participant, we keep the local pairwise patterns of the measures in the projected subspace. To achieve this, locality preserving projections (LPP) is the right tool to leverage. Given a pairwise similarity matrix \(\mathbf {S}_i \in \mathfrak {R}^{n_i \times n_i}\) for the i-th participant of the neuroimaging dataset, LPP preserves the local relationships and maximizes the smoothness of the data manifold in the embedding space.
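It does so by minimizing the following objective, written here in its standard form and instantiated for the i-th participant:

\( \sum _ { j , k = 1 } ^ { n _ i } \left\| \mathbf { W } _ i ^ { T } \mathbf { x } _ { j } ^ { i } - \mathbf { W } _ i ^ { T } \mathbf { x } _ { k } ^ { i } \right\| _ { 2 } ^ { 2 } \, s _ { j k } ^ { i } , \)    (2)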
where \(s_{jk}^i\) is the pairwise relationship coefficient between the available records of the i-th participant at j-th and k-th time points.
To integrate the global and local consistencies of the neuroimaging measures at the same time, following the same idea as in our previous works [6, 7, 15, 16], we develop a combined objective from the two objectives above, together with a soft constraint that aligns the local projections with the global one.
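With trade-off parameters \(\alpha \) and \(\beta \) (the parameters tuned in Sect. 3), one natural form of this combined objective is to minimize

\( \sum _ { i = 1 } ^ { m } \left\| \mathbf { x } _ { i } - \mathbf { W } \mathbf { W } ^ { T } \mathbf { x } _ { i } \right\| _ { 2 } ^ { 2 } + \alpha \sum _ { i = 1 } ^ { m } \sum _ { j , k = 1 } ^ { n _ i } \left\| \mathbf { W } _ i ^ { T } \mathbf { x } _ { j } ^ { i } - \mathbf { W } _ i ^ { T } \mathbf { x } _ { k } ^ { i } \right\| _ { 2 } ^ { 2 } \, s _ { j k } ^ { i } + \beta \sum _ { i = 1 } ^ { m } \left\| \mathbf { W } - \mathbf { W } _ i \right\| _ { F } ^ { 2 } , \)    (3)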
where we denote \(\mathcal {W} = \{ \mathbf {W} , \mathbf {W}_1 , \cdots , \mathbf {W}_m\}\). Through the third term of Eq. (3), the projections \(\mathbf {W}_i\) learned from each participant are aligned with the projection \(\mathbf {W}\) learned globally from the baseline measures of the entire dataset, such that the information conveyed by the global projection \(\mathbf {W}\) learned from all the participants as a whole can be transferred to each individual participant.
The squared \(\ell _2\)-norm is notoriously sensitive to outliers in a dataset, which often causes traditional regression models to fail. To improve the robustness of the enriched representations learned by our new method, we therefore substitute the squared \(\ell _2\)-norm terms in Eq. (3) with their not-squared counterparts.
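Under the assumption that the un-squaring applies to the two \(\ell _2\)-norm terms whose reweighting coefficients appear in Eq. (5), while the alignment term keeps its Frobenius form, Eq. (4) can be written as

\( \min _ {\mathcal {W}} \; \sum _ { i = 1 } ^ { m } \left\| \mathbf { x } _ { i } - \mathbf { W } \mathbf { W } ^ { T } \mathbf { x } _ { i } \right\| _ { 2 } + \alpha \sum _ { i = 1 } ^ { m } \sum _ { j , k = 1 } ^ { n _ i } \left\| \mathbf { W } _ i ^ { T } \mathbf { x } _ { j } ^ { i } - \mathbf { W } _ i ^ { T } \mathbf { x } _ { k } ^ { i } \right\| _ { 2 } \, s _ { j k } ^ { i } + \beta \sum _ { i = 1 } ^ { m } \left\| \mathbf { W } - \mathbf { W } _ i \right\| _ { F } ^ { 2 } . \)    (4)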
Upon solving the optimization problem in Eq. (4), we learn a fixed-length representation for each participant by computing \(\{\mathbf {y}_i =\mathbf { W }_i^T \mathbf {x}_i \}_{i=1}^m\), which can be readily fed into traditional machine learning models.
Although the motivation of the formulation of our new method in Eq. (4) is clear and justifiable, it is a non-smooth objective, which is difficult to solve efficiently in general. Using the optimization framework presented in our earlier work [8] that solves non-smooth objectives with not-squared \(\ell _2\)-norm terms, we can solve Eq. (4) by an iterative procedure [8, Algorithm 1] in which the key step is to minimize a reweighted surrogate objective.
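Written with the constant factors absorbed into \(\alpha \), one surrogate objective consistent with the quantities defined below is

\( \min _ {\mathcal {W}} \; \mathrm{tr} \left( ( \mathbf { X } - \mathbf { W } \mathbf { W } ^ { T } \mathbf { X } ) \, \mathbf { \Gamma } \, ( \mathbf { X } - \mathbf { W } \mathbf { W } ^ { T } \mathbf { X } ) ^ { T } \right) + \alpha \sum _ { i = 1 } ^ { m } \mathrm{tr} \left( \mathbf { W } _ i ^ { T } \mathbf { X } _ i \mathbf { L } ^ i \mathbf { X } _ i ^ { T } \mathbf { W } _ i \right) + \beta \sum _ { i = 1 } ^ { m } \left\| \mathbf { W } - \mathbf { W } _ i \right\| _ { F } ^ { 2 } , \)    (5)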
where \(\mathbf {X} = [\mathbf {x}_1, \mathbf {x}_2, \cdots , \mathbf {x}_m] \in \mathfrak {R}^{d\times m}\) summarizes all the baseline measurements, and \(\mathbf {\Gamma } \in \mathfrak {R}^{m \times m }\) is a diagonal matrix whose i-th diagonal element is \(\gamma ^i =\frac{1}{2\sqrt{ \left\| \mathbf { x } _ { i } - \mathbf { W } \mathbf { W }^ { T } \mathbf { x } _ { i } \right\| _ { 2 } ^ { 2 } + \delta }}\). Defining \(\theta _{jk}^i = \frac{1}{2 \sqrt{ \left\| \mathbf {W}_i^T \mathbf {x}_{j}^i - \mathbf {W}_i^T \mathbf {x}_{k}^i \right\| _ { 2 } ^ { 2 } + \delta }}\) and \(\tilde{\mathbf {S}}^i \in \mathfrak {R}^{n_i \times n_i}\) whose elements are \(\tilde{s}_{jk}^i =\theta _{jk}^i s_{jk}^i \), in Eq. (5) we compute \(\mathbf {L}^i = \mathbf {D}^i -\tilde{\mathbf {S}}^i\), where \(\mathbf {D}^i\) is a diagonal matrix whose diagonal entries are the column (or row) sums of \(\tilde{\mathbf {S}}^i\), i.e., \(d_{jj}^i=\sum _{k} \tilde{s}_{jk}^i \).
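As a concrete sketch of this reweighting step, the quantities \(\mathbf {\Gamma }\) and \(\mathbf {L}^i\) can be computed as below, assuming NumPy arrays `W` of shape d×r and `X` of shape d×m, and per-participant lists `W_list`, `X_list`, `S_list` holding \(\mathbf {W}_i\), \(\mathbf {X}_i\), and \(\mathbf {S}_i\); the variable names and the smoothing constant `delta` are illustrative.

```python
import numpy as np

def reweighting_terms(W, X, W_list, X_list, S_list, delta=1e-8):
    """Compute Gamma and the reweighted Laplacians L^i used in Eq. (5)."""
    # gamma_i = 1 / (2 * sqrt(||x_i - W W^T x_i||_2^2 + delta))
    R = X - W @ (W.T @ X)                       # reconstruction residuals, d x m
    Gamma = np.diag(1.0 / (2.0 * np.sqrt(np.sum(R ** 2, axis=0) + delta)))

    L_list = []
    for W_i, X_i, S_i in zip(W_list, X_list, S_list):
        Y_i = W_i.T @ X_i                       # projected follow-up records, r x n_i
        # theta_jk = 1 / (2 * sqrt(||W_i^T x_j - W_i^T x_k||_2^2 + delta))
        sq_dists = np.sum((Y_i[:, :, None] - Y_i[:, None, :]) ** 2, axis=0)
        Theta = 1.0 / (2.0 * np.sqrt(sq_dists + delta))
        S_tilde = Theta * S_i                   # element-wise reweighted similarities
        D = np.diag(S_tilde.sum(axis=1))        # degree matrix from row sums
        L_list.append(D - S_tilde)              # reweighted graph Laplacian L^i
    return Gamma, L_list
```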
The objective in Eq. (5) can be solved by the Alternating Direction Method of Multipliers (ADMM) [2] that minimizes the following equivalent objective:
where \(\mathcal {P}=\{\mathbf {P},\mathbf {P}_1,\mathbf {P}_2,\cdots ,\mathbf {P}_m\} \), \(\mathbf {\Lambda } \in \mathfrak {R}^{d \times r}\) is the Lagrangian multiplier for the constraint \( \mathbf { W } = \mathbf {P} \), and \(\mathbf {\Lambda }_i \in \mathfrak {R}^{d \times r}\) is the Lagrangian multiplier for the constraint \( \mathbf { W }_i = \mathbf {P}_i\). Algorithm 1 summarizes the solution to Eq. (6).
3 Experiments
Data used in our experiments were obtained from the ADNI database (adni.loni.usc.edu). We downloaded 1.5T MRI scans and the demographic information of 821 ADNI-1 participants. We performed voxel-based morphometry (VBM) on the MRI data following [12] and extracted mean modulated gray matter (GM) measures for 90 target regions of interest (ROI). These measures were adjusted for the baseline intracranial volume (ICV) using regression weights derived from the Healthy Control (HC) participants at baseline. We also downloaded the longitudinal scores of the participants on the Mini-Mental State Examination (MMSE) cognitive assessment. The time points examined in this study for both imaging biomarkers and cognitive assessments include baseline (BL), Month 6 (M6), Month 12 (M12), Month 18 (M18), Month 24 (M24), and Month 36 (M36). All participants used in our enriched biomarker representation study are required to have a BL MRI measurement, a BL cognitive score, and at least two available measures from M6/M12/M18/M24/M36. A total of 544 subjects were selected for our study, including 92 AD, 205 Mild Cognitive Impairment (MCI), and 247 HC samples.
Experiment Settings. To validate the usefulness of our proposed method, we compare the performance of predicting cognitive outcomes using two types of neuroimaging inputs: the learned enriched representation and the BL biomarker measurement. In our experiments, several methods known to generalize well, including ridge regression (RR), Lasso, support vector regression (SVR), and convolutional neural networks (CNN), are leveraged. For the RR, Lasso, and SVR models, we conduct a standard 5-fold cross-validation to fine-tune the parameters and compute the root mean square error (RMSE) between the predicted and ground-truth values of the cognitive scores on the testing data. For the SVR model, the linear kernel is used, and the box constraint parameter is fine-tuned by a grid search within the same 5-fold cross-validation. For the CNN regression model, we randomly select 70% of the neuroimaging measurements as the training set, 20% as the validation set, and the remaining 10% as the testing set; the evaluation metrics are reported on the testing set. We construct a two-layer convolution architecture for cognitive outcome prediction, and the dropout technique [13] is leveraged to reduce overfitting in the CNN models and prevent complex co-adaptations on the training data. The model parameters \(\alpha ,\beta \) are fine-tuned by searching the grid \(\{10^{-5},\cdots ,10^{-1},1,10,\cdots ,10^{5}\}\).
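A minimal sketch of this evaluation protocol for the RR, Lasso, and SVR models is given below, assuming the learned enriched representations are stored in an array `Y` (participants × features) and the MMSE scores in a vector `mmse`; the placeholder data, variable names, and the hyper-parameter grids of the regressors are illustrative choices, not those of the original experiments.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, KFold, cross_val_predict
from sklearn.metrics import mean_squared_error

def cv_rmse(model, param_grid, Y, mmse, seed=0):
    """5-fold cross-validation: tune hyper-parameters, report out-of-fold RMSE."""
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    tuned = GridSearchCV(model, param_grid, cv=cv,
                         scoring="neg_root_mean_squared_error")
    preds = cross_val_predict(tuned, Y, mmse, cv=cv)   # out-of-fold predictions
    return np.sqrt(mean_squared_error(mmse, preds))

rng = np.random.default_rng(0)
Y = rng.standard_normal((544, 90))      # placeholder for the enriched representations
mmse = rng.standard_normal(544)         # placeholder for the MMSE scores

grid = [10.0 ** p for p in range(-5, 6)]               # 1e-5, ..., 1e5
rmse_rr = cv_rmse(Ridge(), {"alpha": grid}, Y, mmse)
rmse_lasso = cv_rmse(Lasso(max_iter=10000), {"alpha": grid}, Y, mmse)
rmse_svr = cv_rmse(SVR(kernel="linear"), {"C": grid}, Y, mmse)
```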
Experiment Results. From Table 1, we can see that the proposed enriched neuroimaging representation is consistently better than the baseline representation across the four different methods (RR, Lasso, SVR, and CNN). This can be attributed to the following reasons. First, the original baseline neuroimaging biomarker representation only contains the information of a participant at one single time point, and thus does not benefit from the correlations across different measures over time. Instead, our proposed enriched neuroimaging biomarker representation captures not only the baseline measurement but also the temporal information conveyed by the longitudinal biomarkers over AD progression; that is, it integrates the neuroimaging measurement at a fixed time point with the dynamic temporal changes. As AD is a progressively degenerative disease, incorporating such follow-up information about the subjects benefits the prediction model. Second, the original baseline neuroimaging measurements exhibit high dimensionality and can be redundant and noisy, so traditional methods easily suffer from “the curse of dimensionality”. Via the learned projection, we map the baseline measurement into a low-dimensional space, thereby mitigating the issue of high dimensionality.
Apart from the cognitive outcome prediction task, another primary goal of our regression analysis is to identify a subset of imaging biomarkers that are highly correlated with AD progression. Therefore, we examine the imaging biomarkers of each participant identified by the proposed method in predicting the cognitive scores. A summary over all the patients’ projections can then be used to determine the most important biomarkers across patients. We select the 10 top-ranked regions of interest from each patient’s projection coefficients and aggregate them into a global frequency map, shown in Fig. 2. The bilateral hippocampus, amygdala, fusiform, and putamen regions are among the top biomarkers selected by our model, which is in good accordance with known clinical knowledge [5, 11].
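The construction of this frequency map can be sketched as follows, under the assumption that a region's importance for a participant is scored by the \(\ell _2\)-norm of the corresponding row of \(\mathbf {W}_i\); the scoring rule and variable names are illustrative, not necessarily the exact criterion used to produce Fig. 2.

```python
import numpy as np

def roi_frequency_map(W_list, top_k=10):
    """Count how often each ROI ranks among the top_k regions across participants."""
    num_rois = W_list[0].shape[0]                   # d = 90 VBM regions of interest
    freq = np.zeros(num_rois, dtype=int)
    for W_i in W_list:
        importance = np.linalg.norm(W_i, axis=1)    # one score per ROI (row of W_i)
        top_rois = np.argsort(importance)[-top_k:]  # indices of the top_k regions
        freq[top_rois] += 1
    return freq                                     # selection frequency per ROI
```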
4 Conclusion
Missing data is a critical challenge in longitudinal AD studies. In this paper, we proposed a formulation to learn a consistent-length representation for all the participants in the ADNI dataset. The enriched fixed-length biomarker representation captures, at the same time, the global consistency of the baseline measurements and the local pairwise patterns of the available follow-up measurements of each participant. Our results show that the learned enriched representation outperforms the baseline measurement in predicting the clinical scores. Furthermore, the identified biomarkers are highly relevant and strongly agree with existing research findings, which supports the validity of our approach.
References
Association, A., et al.: 2018 Alzheimer’s disease facts and figures. Alzheimer’s Dement. 14(3), 367–429 (2018)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J., et al.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Brand, L., et al.: Joint high-order multi-task feature learning to predict the progression of Alzheimer’s disease. In: The Twenty-First International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2018), pp. 555–562 (2018)
Campos, S., Pizarro, L., Valle, C., Gray, K.R., Rueckert, D., Allende, H.: Evaluating imputation techniques for missing data in ADNI: a patient classification study. Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. LNCS, vol. 9423, pp. 3–10. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25751-8_1
De Jong, L.: Strongly reduced volumes of putamen and thalamus in Alzheimer’s disease: an MRI study. Brain 131(12), 3277–3285 (2008)
Liu, K., Wang, H., Han, F., Zhang, H.: Visual place recognition via robust \(\ell _2\)-norm distance based holism and landmark integration. In: The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI 2019) (2019)
Liu, K., Wang, H., Nie, F., Zhang, H.: Learning multi-instance enriched image representations via non-greedy ratio maximization of the \(\ell _1\)-norm distances. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2018 (CVPR 2018), pp. 7727–7735 (2018)
Liu, Y., Guo, Y., Wang, H., Nie, F., Huang, H.: Semi-supervised classifications via elastic and robust embedding. In: The Thirty-First AAAI Conference on Artificial Intelligence (AAAI 2017) (2017)
Lu, L., Wang, H., Yao, X., Risacher, S., Saykin, A., Shen, L.: Predicting progressions of cognitive outcomes via high-order multi-modal multi-task feature learning. In: IEEE ISBI 2018, pp. 545–548. IEEE (2018)
Minhas, S., Khanum, A., Riaz, F., Alvi, A., Khan, S.A.: Early Alzheimer’s disease prediction in machine learning setup: empirical analysis with missing value computation. In: Jackowski, K., Burduk, R., Walkowiak, K., Woźniak, M., Yin, H. (eds.) IDEAL 2015. LNCS, vol. 9375, pp. 424–432. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24834-9_49
Poulin, S.P., Dautoff, R., Morris, J.C., Barrett, L.F., Dickerson, B.C., Initiative, A.D.N., et al.: Amygdala atrophy is prominent in early Alzheimer’s disease and relates to symptom severity. Psychiatry Res. Neuroimaging 194(1), 7–13 (2011)
Risacher, S.L., et al.: Longitudinal MRI atrophy biomarkers: relationship to conversion in the ADNI cohort. Neurobiol. Aging 31(8), 1401–1418 (2010)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
Stonnington, C.M., et al.: Predicting clinical scores from magnetic resonance scans in Alzheimer’s disease. Neuroimage 51(4), 1405–1413 (2010)
Wang, H., Huang, H., Ding, C.: Discriminant laplacian embedding. In: The Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2010) (2010)
Wang, H., Nie, F., Huang, H.: Globally and locally consistent unsupervised projection. In: The Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI 2014), pp. 1328–1333 (2014)
Wang, H., et al.: High-order multi-task feature learning to identify longitudinal phenotypic markers for Alzheimer’s disease progression prediction. In: NIPS, pp. 1277–1285 (2012)
Wang, X., et al.: Longitudinal genotype-phenotype association study via temporal structure auto-learning predictive model. In: Sahinalp, S.C. (ed.) RECOMB 2017. LNCS, vol. 10229, pp. 287–302. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-56970-3_18
Acknowledgements
L. Lu, S. Elbeleidy, L. Baker and H. Wang were partially supported by the National Science Foundation (NSF) under the grants of IIS 1652943 and IIS 1849359; H. Huang was partially supported by the National Institutes of Health (NIH) under the grant of NIH R01 AG049371 and by the NSF under the grants of NSF IIS 1836938, DBI 1836866, IIS 1845666, IIS 1852606, IIS 1838627, IIS 1837956; L. Shen was partially supported by the NIH under the grant of NIH R01 EB022574 and by the NSF under the grant of NSF IIS 1837964.
Cite this paper
Lu, L. et al. (2019). Improved Prediction of Cognitive Outcomes via Globally Aligned Imaging Biomarker Enrichments over Progressions. In: Shen, D., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2019. MICCAI 2019. Lecture Notes in Computer Science(), vol 11767. Springer, Cham. https://doi.org/10.1007/978-3-030-32251-9_16