Abstract
In this work, we assess how pre-training strategy affects deep learning performance when distinguishing false recalls from malignant and normal (benign) findings in digital mammography images. A cohort of 1303 breast cancer screening patients (4935 digital mammogram images in total) was retrospectively analyzed as the target dataset for this study. We assessed six convolutional neural network (CNN) model structures, pre-trained on four imaging datasets, across six classification tasks to determine how CNN performance varies with training strategy. The pre-training datasets contain more than 1.4 million images in total, including ImageNet, and the medical datasets differ in scale, modality, organ, and source. The pre-training strategies evaluated included transfer learning from medical and non-medical datasets, layer freezing, varied network structure, and multi-view input, for both binary and three-class classification of mammogram images. The area under the receiver operating characteristic curve (AUC) was used as the performance metric. The best-performing model across all experimental settings was an AlexNet model incrementally pre-trained first on ImageNet and then on a large breast density dataset. Using this model, the AUC for the six classification tasks ranged from 0.68 to 0.77. For distinguishing recalled-benign mammograms from the others, four of the five pre-training strategies tested produced performance that differed significantly from the baseline model. These results suggest that pre-training strategy can lead to significant performance differences, especially when distinguishing recalled-benign from malignant and benign screening patients.
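The abstract reports performance as AUC for both binary and three-class tasks. The sketch below shows one way to compute such values in plain Python. For the binary case, it uses the rank-based (Mann-Whitney) definition: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. For the three-class case, it averages one-vs-one AUCs over every pair of classes, in the style of Hand and Till. This pairwise averaging is an assumption for illustration, and the paper's own multiclass procedure may differ. The function names, class labels, and toy scores are hypothetical.

```python
from itertools import combinations


def binary_auc(scores_pos, scores_neg):
    """Rank-based AUC: P(random positive outranks random negative); ties count 1/2."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def pairwise_multiclass_auc(labels, probs, classes):
    """Average of one-vs-one AUCs over all class pairs (Hand-and-Till-style sketch).

    labels:  true class label per image
    probs:   per-image list of predicted probabilities, ordered like `classes`
    classes: class names, e.g. hypothetical negative / recalled_benign / malignant
    """
    pairs = list(combinations(range(len(classes)), 2))
    total = 0.0
    for i, j in pairs:
        # A(i|j): score P(class i) separates true class i from true class j
        a_ij = binary_auc(
            [p[i] for y, p in zip(labels, probs) if y == classes[i]],
            [p[i] for y, p in zip(labels, probs) if y == classes[j]],
        )
        # A(j|i): score P(class j) separates true class j from true class i
        a_ji = binary_auc(
            [p[j] for y, p in zip(labels, probs) if y == classes[j]],
            [p[j] for y, p in zip(labels, probs) if y == classes[i]],
        )
        total += (a_ij + a_ji) / 2.0
    return total / len(pairs)


# Hypothetical toy example (not data from the study)
classes = ["negative", "recalled_benign", "malignant"]
labels = ["negative", "recalled_benign", "malignant",
          "negative", "recalled_benign", "malignant"]
probs = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7],
         [0.6, 0.3, 0.1], [0.3, 0.5, 0.2], [0.2, 0.1, 0.7]]

print(binary_auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]))     # 8 of 9 pairs ranked correctly
print(pairwise_multiclass_auc(labels, probs, classes))  # perfectly separated toy data
```

The nested loop keeps the definition explicit but is quadratic in the number of cases. For thousands of images, a sort-based rank computation or an established ROC package would be the more practical choice.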
Funding
This work was supported by the National Institutes of Health (NIH)/National Cancer Institute (NCI) grants (#1R01CA193603, #3R01CA193603-03S1, and #1R01CA218405), a Radiological Society of North America (RSNA) Research Scholar Grant (#RSCH1530), an Amazon AWS Machine Learning Research Award, and a University of Pittsburgh Physicians (UPP) Academic Foundation Award. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by the National Science Foundation grant number ACI-1548562. Specifically, it used the Bridges system, which is supported by NSF award number ACI-1445606, at the Pittsburgh Supercomputing Center (PSC).
Ethics declarations
Conflict of Interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
About this article
Cite this article
Clancy, K., Aboutalib, S., Mohamed, A. et al. Deep Learning Pre-training Strategy for Mammogram Image Classification: an Evaluation Study. J Digit Imaging 33, 1257–1265 (2020). https://doi.org/10.1007/s10278-020-00369-3
DOI: https://doi.org/10.1007/s10278-020-00369-3