Multicenter longitudinal neuroimaging has great potential to provide efficient and consistent biomarkers for research on neurodegenerative diseases and aging. In rare disease studies, where data must be pooled from many collection sites to increase study power, it is of primary importance to have a tool that performs reliably and consistently across sites. Multi-atlas labeling is a powerful brain image segmentation approach that is becoming increasingly popular in image processing. The present study examined the performance of multi-atlas labeling tools for subcortical structure identification using two in-vivo image databases: Traveling Human Phantom (THP) and PREDICT-HD. We compared the accuracy (Dice Similarity Coefficient, DSC; intraclass correlation, ICC), multicenter reliability (Coefficient of Variation, CV), and longitudinal reliability (volume trajectory smoothness and Akaike Information Criterion, AIC) of three automated segmentation approaches: two multi-atlas labeling tools, MABMIS and MALF, and a machine-learning-based tool, BRAINSCut. In general, MALF showed the best performance (higher DSC and ICC, lower CV and AIC, and smoother trajectories), with two exceptions. First, for the accumbens, BRAINSCut showed higher reliability, but it is premature to discuss reliability for this structure because the validity of its segmentation is still in doubt (DSC < 0.7, ICC < 0.7). Second, for the caudate, BRAINSCut presented slightly better accuracy, while MALF showed a significantly smoother longitudinal trajectory. We discuss the advantages and limitations of these performance variations and conclude that multi-atlas labeling methods are likely to improve overall segmentation quality. Caution is nevertheless needed when choosing an approach, as our results suggest that segmentation outcomes can vary depending on research interest.
Keywords: brain MRI; longitudinal data analysis; machine learning; multi-atlas label fusion; multicenter study; validation.
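For readers unfamiliar with two of the metrics named above, the following is a minimal sketch of how DSC and CV are conventionally computed. It uses the standard textbook definitions, not necessarily the exact procedure of this study. The function names, the use of the sample standard deviation (`ddof=1`), and the handling of two empty masks are assumptions made for illustration.

```python
import numpy as np


def dice_similarity(seg_a, seg_b):
    """Dice Similarity Coefficient between two binary masks.

    DSC = 2 * |A intersect B| / (|A| + |B|), ranging from 0 (no overlap)
    to 1 (identical masks). Hypothetical helper, not from the study.
    """
    a = np.asarray(seg_a, dtype=bool)
    b = np.asarray(seg_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Assumption: two empty masks are treated as perfect agreement.
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total


def coefficient_of_variation(volumes):
    """Coefficient of Variation of repeated volume measurements.

    CV = standard deviation / mean. Assumption: sample standard
    deviation (ddof=1), e.g. across scans of one subject at several sites.
    """
    v = np.asarray(volumes, dtype=float)
    return v.std(ddof=1) / v.mean()


# Toy example: a 4-voxel "manual" mask against an "automated" mask.
manual = [1, 1, 0, 0]
automated = [1, 0, 0, 0]
dsc = dice_similarity(manual, automated)

# Toy example: one structure's volume (arbitrary units) at two sites.
cv = coefficient_of_variation([9.0, 11.0])
```

A higher DSC indicates better spatial overlap with the reference segmentation, while a lower CV indicates more consistent volumes across sites.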