Abstract
Recent advances in the systematic analysis of high-resolution whole slide images have increased the efficiency of diagnosis, prognosis and prediction of cancer and other important diseases. Because whole slide images are enormous in size and dimension, their analysis requires extensive computing resources that are not commonly available. Images must be tiled for processing because of computer memory limitations, which leads to inaccurate results when objects that cross tile boundaries are ignored. We therefore propose MaReIA, a generic and highly scalable cloud-based image analysis framework for whole slide images. The framework enables parallelized integration of image analysis steps, such as segmentation and aggregation of micro-structures, in a single pipeline, and generates final objects that can be managed by databases. The core concept is to abstract objects in whole slide images as different classes of spatial geometries, which can in turn be handled as text-based records in MapReduce. The framework applies an overlapping partitioning scheme to images and parallelizes tiling and image segmentation on the MapReduce architecture. It further provides robust object normalization and graceful handling of boundary objects, using an efficient matching method based on spatial indexing to generate accurate results. Our benchmark experiments on Amazon EMR show that MaReIA is highly scalable, generic and extremely cost effective.
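The overlapping partitioning and boundary-object handling described in the abstract can be illustrated with a minimal sketch. This is not the MaReIA implementation: every name below (`Tile`, `overlapping_tiles`, `to_global_wkt`, `deduplicate`) is hypothetical, objects are assumed to be simple polygons stored as WKT text records, and a brute-force bounding-box comparison stands in for the spatial-index-based matching the framework uses.

```python
from typing import Iterator, List, NamedTuple, Sequence, Tuple


class Tile(NamedTuple):
    """Pixel bounds of one tile in whole-image coordinates (hypothetical type)."""
    x0: int
    y0: int
    x1: int
    y1: int


def overlapping_tiles(width: int, height: int, tile: int, overlap: int) -> Iterator[Tile]:
    """Yield tiles of nominal size `tile`, each extended by `overlap` pixels
    on every side and clipped to the image bounds."""
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield Tile(max(x - overlap, 0), max(y - overlap, 0),
                       min(x + tile + overlap, width), min(y + tile + overlap, height))


def to_global_wkt(tile: Tile, local_points: Sequence[Tuple[int, int]]) -> str:
    """Normalize a tile-local polygon to whole-image coordinates and
    serialize it as a WKT text record."""
    pts = [(tile.x0 + px, tile.y0 + py) for px, py in local_points]
    if pts[0] != pts[-1]:
        pts.append(pts[0])  # WKT polygon rings must be closed
    return "POLYGON((" + ", ".join(f"{x} {y}" for x, y in pts) + "))"


def bbox(wkt: str) -> Tuple[float, float, float, float]:
    """Bounding box of a single-ring WKT polygon produced by to_global_wkt."""
    body = wkt[len("POLYGON(("):-2]
    coords = [tuple(map(float, p.split())) for p in body.split(", ")]
    xs, ys = zip(*coords)
    return min(xs), min(ys), max(xs), max(ys)


def iou(a, b) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def deduplicate(records: Sequence[str], threshold: float = 0.9) -> List[str]:
    """Keep one copy of each object segmented in more than one overlapping tile.
    Brute-force O(n^2) matching; a spatial index would replace this at scale."""
    kept: List[str] = []
    for rec in records:
        box = bbox(rec)
        if all(iou(box, bbox(k)) < threshold for k in kept):
            kept.append(rec)
    return kept


if __name__ == "__main__":
    tiles = list(overlapping_tiles(100, 100, tile=50, overlap=10))
    # The same object seen from two neighbouring tiles, in tile-local coordinates.
    seen_in_first = to_global_wkt(tiles[0], [(45, 10), (55, 10), (55, 20), (45, 20)])
    seen_in_second = to_global_wkt(tiles[1], [(5, 10), (15, 10), (15, 20), (5, 20)])
    print(deduplicate([seen_in_first, seen_in_second]))
```

Because each tile is extended into its neighbours, an object that straddles a nominal tile edge is segmented whole in at least one tile, provided it is no larger than the overlap. Normalizing to whole-image coordinates then makes the duplicate records comparable, so only one copy is kept.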
Acknowledgements
This research is supported in part by grants from the National Science Foundation (ACI 1443054 and IIS 1350885), the National Institutes of Health (K25CA181503), and CNPq.
Cite this article
Vo, H., Kong, J., Teng, D. et al. MaReIA: a cloud MapReduce based high performance whole slide image analysis framework. Distrib Parallel Databases 37, 251–272 (2019). https://doi.org/10.1007/s10619-018-7237-1
DOI: https://doi.org/10.1007/s10619-018-7237-1