Abstract
A robust medical image computing infrastructure must host massive multimodal archives, perform extensive analysis pipelines, and execute scalable job management. An emerging data format standard, the Brain Imaging Data Structure (BIDS), introduces complexities for interfacing with XNAT archives. Moreover, workflow integration is combinatorially problematic when matching large numbers of processing pipelines to large datasets. Historically, workflow engines have focused on refining the workflows themselves rather than on actual job generation. However, such an approach is incompatible with a data-centric architecture that hosts heterogeneous medical image computing. The Distributed Automation for XNAT toolkit (DAX) provides large-scale image storage and analysis pipelines with an optimized job management tool. Herein, we describe developments for DAX that allow for integration of the XNAT and BIDS standards. We also improve DAX’s efficiency in running diverse containerized workflows in a high-performance computing (HPC) environment. Briefly, we integrate YAML configuration processor scripts to abstract workflow data inputs, data outputs, commands, and job attributes. Finally, we propose an online database-driven mechanism for DAX to efficiently identify the most recently updated sessions, thereby improving job building efficiency on large projects. We refer to the overall DAX development proposed in this work as DAX-1 (DAX version 1). To validate the effectiveness of the new features, we verified (1) the efficiency of converting XNAT data to BIDS format and the correctness of the conversion using a collection of BIDS standard containerized neuroimaging workflows, (2) how the YAML-based processor simplified configuration setup via a sequence of application pipelines, and (3) the productivity of DAX-1 in generating actual HPC processing jobs compared with the earlier DAX baseline method. The empirical results show that (1) DAX-1 converts XNAT data to BIDS at a speed similar to that of accessing XNAT data alone; (2) YAML can be integrated into DAX-1 with a shallow learning curve for users; and (3) DAX-1 reduced job/assessor generation latency by finding recently modified sessions. Herein, we present approaches for efficiently integrating XNAT and modern image formats with a scalable workflow engine for large-scale dataset access and processing.
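The idea of building jobs only for recently updated sessions can be illustrated with a minimal sketch. It is not the DAX-1 implementation: the `Session` record, the `sessions_to_rebuild` helper, and the sample timestamps are all hypothetical, and a real deployment would obtain modification times from the archive database rather than from an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Session:
    """Hypothetical record for one imaging session (not a DAX or XNAT type)."""
    label: str
    last_modified: datetime


def sessions_to_rebuild(sessions, last_build):
    """Return the sessions modified after the previous build, newest first."""
    recent = [s for s in sessions if s.last_modified > last_build]
    return sorted(recent, key=lambda s: s.last_modified, reverse=True)


if __name__ == "__main__":
    utc = timezone.utc
    # Illustrative data only; labels and dates are made up.
    archive = [
        Session("sess-a", datetime(2021, 1, 5, tzinfo=utc)),
        Session("sess-b", datetime(2021, 3, 9, tzinfo=utc)),
        Session("sess-c", datetime(2021, 2, 1, tzinfo=utc)),
    ]
    previous_build = datetime(2021, 1, 31, tzinfo=utc)
    for session in sessions_to_rebuild(archive, previous_build):
        print(session.label)
```

Under these assumptions, only sessions changed since the previous build are revisited, so the cost of a build cycle scales with the number of recent changes rather than with the size of the whole project.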
Availability of Data and Materials
All data associated with the manuscript are archived at XNAT.Vanderbilt.edu, which is operated by the VUIIS Center for Computational Imaging (VUIIS CCI) as part of the Human Imaging Core.
Code Availability
DAX-1 is available at https://github.com/VUIIS/dax, which also stores the refined Xnatdownload tool. The tools can be installed with pip by running pip install dax. Existing processor examples are available at https://github.com/VUIIS/dax_yaml_processor_examples. A full description of BIDSMapping usage is available at https://dax.readthedocs.io/en/latest/BIDS_walkthrough.html. A full description of the DAX YAML processor is available at https://dax.readthedocs.io/en/latest/processors.html.
Funding
This research was supported by NSF CAREER 1452485 and NIH R01EB017230 (Landman); in part by the National Institutes of Health through the National Institute of Biomedical Imaging and Bioengineering training Grant T32-EB021937; by the National Center for Research Resources, Grant UL1 RR024975-01, which is now at the National Center for Advancing Translational Sciences, Grant 2 UL1 TR000445-06; by NIH S10 Shared Instrumentation Grant 1S10OD020154-01 (Smith); and by ACCRE’s Big Data TIPs Grant from Vanderbilt University.
Author information
Contributions
Dr. Shunxing Bao, Brian D. Boyd and Praitayini Kanakaraj contributed equally to this work. Dr. Shunxing Bao: conceptualization, methodology, software, validation, formal analysis, writing—original draft. Mr. Brian D. Boyd: software, validation. Ms. Praitayini Kanakaraj: software, validation. Mr. Karthik Ramadass: software. Mr. Francisco A. C. Meyer: software. Ms. Yuqian Liu: software. Mr. William E. Duett: software. Dr. Yuankai Huo: software, writing — review and editing. Dr. Ilwoo Lyu: software, writing — review and editing. Dr. David H. Zald: resources, data curation, writing — review and editing. Dr. Seth A. Smith: resources, writing — review and editing. Dr. Baxter P. Rogers: software, writing — review and editing, project administration. Dr. Bennett A. Landman: conceptualization, validation, investigation, resources, writing — review and editing, supervision, project administration.
Ethics declarations
Ethics Approval
Not applicable.
Consent to Participate
Not applicable.
Consent for Publication
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bao, S., Boyd, B.D., Kanakaraj, P. et al. Integrating the BIDS Neuroimaging Data Format and Workflow Optimization for Large-Scale Medical Image Analysis. J Digit Imaging 35, 1576–1589 (2022). https://doi.org/10.1007/s10278-022-00679-8
DOI: https://doi.org/10.1007/s10278-022-00679-8