Artificial intelligence for non-mass breast lesions detection and classification on ultrasound images: a comparative study
BMC Medical Informatics and Decision Making volume 23, Article number: 174 (2023)
Abstract
Background
This retrospective study aimed to validate the effectiveness of artificial intelligence (AI) in detecting and classifying non-mass breast lesions (NMLs) on ultrasound (US) images.
Methods
A total of 228 patients with NMLs and 596 volunteers without breast lesions on US images were enrolled in the study from January 2020 to December 2022. The pathological results served as the gold standard for NMLs. Two AI models, DenseNet121_448 and MobileNet_448, were developed to detect and classify NMLs on US images. The area under the curve (AUC), accuracy, specificity and sensitivity were used to evaluate and compare the diagnostic performance of the AI models.
Results
A total of 228 patients with NMLs confirmed by postoperative pathology (870 US images) and 596 volunteers (1003 US images) were enrolled. In the detection experiment, MobileNet_448 achieved good performance in the testing set, with an AUC, accuracy, sensitivity, and specificity of 0.999 (95% CI: 0.997-1.000), 96.5%, 96.9% and 96.1%, respectively. The difference from DenseNet121_448 was not statistically significant. In the classification experiment, MobileNet_448 achieved the highest diagnostic performance in the testing set, with an AUC, accuracy, sensitivity, and specificity of 0.837 (95% CI: 0.810-0.863), 70.5%, 80.3% and 74.6%, respectively.
Conclusions
This study suggests that the AI models, particularly MobileNet_448, can effectively detect and classify NMLs in US images. This technique has the potential to improve early diagnostic accuracy for NMLs.
Background
Breast tumors can present either as mass breast lesions (MLs) or as non-mass breast lesions (NMLs) [1]. NMLs manifest as confined asymmetry on two orthogonal planes without conspicuous margins or shapes, and thus fail to meet the strict criteria for a “mass” defined by BI-RADS [2]. NMLs are a rare form of breast lesion, accounting for only 9.2% of all cases, and are less common than MLs [3]. Moreover, on conventional US images NMLs lack clear boundaries on both planes, making them more difficult to identify than MLs [4]. Thus, accurate diagnosis of NMLs is both clinically important and challenging.
Currently, mammography, ultrasound (US) and magnetic resonance imaging (MRI) are three valuable tools for the early screening of breast tumors [5,6,7]. When selecting the optimal screening tool, many factors must be considered, including patient age, breast density, and the presence of any symptoms [8]. Compared with other screening tools, US offers several advantages, such as lower cost, no ionizing radiation, and real-time image assessment, and it is especially suitable for Asian women with dense breasts [9, 10]. Although US is a valuable tool for diagnosing NMLs, it is not without limitations. Accurately diagnosing NMLs using US can be challenging due to several factors, including the quality of the US equipment, the experience of the radiologist, and the lesion’s size and location [11]. In addition, given the overlap between the benign and malignant features of NMLs, it is difficult to make an accurate diagnosis [12,13,14,15]. Therefore, a novel method is needed to avoid missed diagnoses and improve diagnostic precision.
Artificial intelligence (AI), a subfield of computer science, has achieved major breakthroughs in image recognition tasks, and medical images from routine clinical practice are an important research field for AI [16]. Recently, the application of AI in imaging diagnostics has led to remarkable achievements across a range of subfields, such as the diagnosis of lung cancer [17], skin cancer [18], and breast cancer [19]. For breast cancer diagnosis in particular, numerous studies have reported AI systems that diagnose breast tumors at a level superior to that of breast radiologists [20]. However, these AI models mainly focused on MLs, while studies on NMLs are rare. Furthermore, whether AI models can help radiologists detect and classify NMLs remains to be explored.
Therefore, this study proposed AI models based on MobileNet and DenseNet121 to detect and diagnose NMLs on US images. The study is divided into two steps: first, to develop AI models for detecting NMLs among normal breast US images; and second, following our prior work on AI for breast mass tumors [21], to investigate the efficacy of AI models trained on MLs US images in classifying NMLs as benign or malignant.
Methods
Patients
This retrospective study was approved by the Institutional Review Board of Shenzhen People’s Hospital, specifically the Medical Ethics Committee of Shenzhen People’s Hospital. Informed consent was waived by the same ethics committee. Consecutive patients at Shenzhen People’s Hospital from January 2020 to December 2022 were enrolled (Fig. 1). The inclusion criteria were: (a) patients who received breast US examination and were found to have breast lesions; (b) breast lesions on the US images consistent with the features of NMLs described by Choe et al. [22]; (c) NMLs diagnosed as benign or malignant according to pathology analysis or follow-up exceeding 2 years. The exclusion criteria were: (a) breast MLs; (b) lack of pathology; (c) loss to follow-up; (d) breast lesions not detected by US; (e) breast lesions of BI-RADS category 0 on US. In addition, female volunteers underwent breast US examination to obtain normal breast US images.
Data acquisition and processing
The US images were acquired by two radiologists with 15 and 10 years of experience in breast US examinations, using different equipment: Mindray Resona 7 (Mindray, China, equipped with an L11-3U linear array transducer), Philips EPIQ5 (Philips, The Netherlands, equipped with an L12-5 linear array transducer) and GE LOGIQ E9 (GE, USA, equipped with an ML6-15-D linear array transducer). The US images were exported from the equipment as JPEG images.
In our previous work, we collected MLs US images to develop an AI model for diagnosing breast mass tumors [21]. Based on the inclusion and exclusion criteria for MLs, 4988 patients with 13,247 MLs US images were finally included. In this study, we used these MLs US images as the data set to train the AI models.
The field of view (FOV) was obtained from the original US image by removing device- and patient-related information. Subsequently, the FOV was transformed into a square and scaled to 448 × 448 pixels to serve as the model training and testing data.
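The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the text does not say how the FOV was located or how it was made square, so the bounding box argument, the zero-padding strategy and the nearest-neighbour resizing are all assumptions, and the function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image stored as rows of pixel intensities


def crop_fov(img: Image, box: Tuple[int, int, int, int]) -> Image:
    """Keep only the field of view. box = (top, left, bottom, right), end-exclusive."""
    top, left, bottom, right = box
    return [row[left:right] for row in img[top:bottom]]


def pad_to_square(img: Image, fill: int = 0) -> Image:
    """Centre the image on a square canvas (assumed: padding rather than stretching)."""
    height, width = len(img), len(img[0])
    side = max(height, width)
    pad_top = (side - height) // 2
    pad_left = (side - width) // 2
    canvas = [[fill] * side for _ in range(side)]
    for r in range(height):
        canvas[pad_top + r][pad_left:pad_left + width] = img[r]
    return canvas


def resize_nearest(img: Image, size: int = 448) -> Image:
    """Nearest-neighbour resize of a square image to size x size."""
    side = len(img)
    idx = [i * side // size for i in range(size)]
    return [[img[r][c] for c in idx] for r in idx]


def preprocess(img: Image, box: Tuple[int, int, int, int], size: int = 448) -> Image:
    """Crop the FOV, make it square, and scale it to the model input size."""
    return resize_nearest(pad_to_square(crop_fov(img, box)), size)
```

Padding keeps the aspect ratio of the lesion intact, whereas stretching would distort it; which of the two was actually used is not stated in the text.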
AI model
We utilized current mainstream AI models, namely MobileNet (a lightweight convolutional neural network) and DenseNet121 (a well-known deep learning model with fewer parameters), to develop the AI models in this study (Fig. 2). To differentiate them from other models, we named them MobileNet_448 and DenseNet121_448, respectively, after the input image size.
Detection experiment
To select the optimal AI model for detecting NMLs among normal breast US images, we trained both MobileNet_448 and DenseNet121_448 on dataset A (comprising normal breast US images and NMLs US images). First, dataset A was split into training, validation and testing sets at a ratio of 7:1:2. Second, we trained each AI model for a default of 300 epochs with early stopping: training ended early if the validation loss did not decrease for 15 consecutive epochs. The batch size was set to 16, with Focal Loss as the loss function and Adam as the optimizer with a learning rate of 0.001. Finally, we evaluated the performance of both models on the testing set to determine which was better suited to NMLs detection.
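To make the training settings above concrete, the following sketch shows a per-sample binary focal loss and the early-stopping rule in plain Python. It is illustrative only: the text does not report the focal loss hyperparameters, so gamma = 2.0 and alpha = 0.25 (the defaults commonly used for focal loss) are assumptions, and the class and function names are hypothetical.

```python
import math


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss for one sample; p is the predicted probability of class 1.

    gamma and alpha are assumed values, not taken from the study.
    """
    eps = 1e-7
    p = min(max(p, eps), 1.0 - eps)           # avoid log(0)
    p_t = p if y == 1 else 1.0 - p            # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Easy samples (p_t near 1) are down-weighted by the (1 - p_t)^gamma factor.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


class EarlyStopping:
    """Signal a stop once the validation loss has not decreased for `patience` epochs."""

    def __init__(self, patience: int = 15) -> None:
        self.patience = patience
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop capped at 300 epochs, `step` would be called once per epoch with the validation loss, and the loop would break as soon as it returns True.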
Classification experiment
Similarly, for the classification experiment to distinguish benign from malignant NMLs, we trained MobileNet_448 and DenseNet121_448 on dataset B (comprising MLs US images). First, the MLs US images in dataset B were split into training, validation and internal testing sets at a ratio of 8:1:1. In addition, dataset C (comprising NMLs US images) served as the external testing set. Second, we trained each AI model for a default of 300 epochs with early stopping: training ended early if the validation loss did not decrease for 15 consecutive epochs. The batch size was set to 16, with Focal Loss as the loss function and Adam as the optimizer with a learning rate of 0.001. To enhance model performance, we included a learning rate decay strategy: if the validation loss did not decrease for 5 consecutive epochs, the learning rate was reduced to 1/10 of its original value.
The performance of the AI models trained on MLs US images was evaluated on the internal testing set. Finally, we validated the diagnostic effectiveness of the developed AI models for NMLs on the external testing set. The Grad-CAM technique was used to explain how the optimal AI model discriminates between benign and malignant NMLs.
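The learning rate decay rule described above can be sketched as a small plateau scheduler. This is a hedged illustration rather than the authors' code: it assumes that "1/10 of its original value" means one tenth of the current learning rate at the time of each reduction, and that the plateau counter resets after a reduction; the class name is hypothetical.

```python
import math


class ReduceOnPlateau:
    """Multiply the learning rate by `factor` after `patience` epochs without improvement."""

    def __init__(self, lr: float = 0.001, patience: int = 5, factor: float = 0.1) -> None:
        self.lr = lr
        self.patience = patience
        self.factor = factor
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss: float) -> float:
        """Record one epoch's validation loss and return the learning rate to use next."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0  # assumed: restart the count after each reduction
        return self.lr
```

Under these assumptions, a run that stops improving would see the learning rate reduced after 5 and again after 10 stagnant epochs before the 15-epoch early-stopping rule ends training.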
Statistical analysis
Continuous variables are expressed as mean ± standard deviation, and categorical variables as percentages. Within-group differences were compared using the paired-sample t-test. Statistical analysis was performed using R version 4.2.2 (R Core Team, 2021). The receiver operating characteristic (ROC) curve was drawn, and the area under the curve (AUC) with its 95% confidence interval (95% CI) was computed. The cut-off value, specificity, sensitivity, and accuracy were then calculated. A p-value of less than 0.05 was considered statistically significant.
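As an illustration of the metrics named above, the sketch below computes the AUC from the rank-based (Mann-Whitney) definition and derives sensitivity, specificity and accuracy at a cut-off. The study itself used R; this Python version is only a sketch. The text does not say how the cut-off was chosen, so selecting it by the Youden index is an assumption, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a positive case scores higher than a negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(
    scores: Sequence[float], labels: Sequence[int], cutoff: float
) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy), calling score >= cutoff positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(labels)


def youden_cutoff(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pick the observed score that maximises sensitivity + specificity - 1 (assumed rule)."""
    def youden(c: float) -> float:
        sens, spec, _ = confusion_metrics(scores, labels, c)
        return sens + spec - 1.0

    return max(sorted(set(scores)), key=youden)
```

The pairwise AUC above is quadratic in the number of cases, which is acceptable for test sets of a few hundred to a few thousand images; confidence intervals would need an additional method such as bootstrapping or DeLong's, which is not shown here.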
Results
Clinical characteristics
In this study, 228 NMLs patients were included. The analyses of clinical characteristics of the NMLs patients are presented in Table 1. The clinical characteristics between malignant and benign NMLs patients did not show any statistically significant difference (p > 0.05).
The data comprised 870 NMLs US images from NMLs patients, 13,247 MLs US images from 3,447 MLs patients and 1003 normal breast US images from 596 volunteers. The distribution of the data in the detection and classification experiments is summarized in Table 2. In dataset A, the training, validation, and internal testing sets contained 1310, 188, and 375 US images, respectively. In dataset B, the training, validation, and internal testing sets contained 10,619, 1289, and 1339 US images, respectively. Dataset C, with 870 US images, was used as the external testing set.
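The reported set sizes can be checked against the stated 7:1:2 and 8:1:1 ratios with a few lines of arithmetic. The helper below is a hypothetical convenience function; the counts are those given in the text.

```python
from typing import List, Sequence, Tuple


def split_shares(counts: Sequence[int]) -> Tuple[int, List[float]]:
    """Return the total and each subset's share of it in percent (one decimal place)."""
    total = sum(counts)
    return total, [round(100 * c / total, 1) for c in counts]


# Dataset A (detection): training, validation, testing
total_a, shares_a = split_shares([1310, 188, 375])
# Dataset B (classification): training, validation, internal testing
total_b, shares_b = split_shares([10619, 1289, 1339])
```

Dataset A totals 1873 images, matching 870 NMLs images plus 1003 normal images, and dataset B totals 13,247 images; the shares come out close to 70/10/20 and 80/10/10, respectively.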
Detection performance of NMLs
The testing set results for the AI models are presented in Table 3. MobileNet_448 achieved an AUC of 0.999 (95% CI: 0.997-1.000), with accuracy, sensitivity and specificity of 96.5%, 96.9% and 96.1%, respectively. DenseNet121_448 achieved an AUC of 0.999 (95% CI: 0.998-1.000), with accuracy, sensitivity and specificity of 96.5%, 96.9% and 96.1%, respectively. There was no statistically significant difference in AUC between MobileNet_448 and DenseNet121_448 (p > 0.05). Figure 3 (A and B) shows the ROC curves of MobileNet_448 and DenseNet121_448 in the testing set.
Diagnostic performance of NMLs
The testing set results for the AI models are presented in Table 3. MobileNet_448 exhibited the best diagnostic performance, achieving an AUC of 0.837 (95% CI: 0.810–0.863) with accuracy, sensitivity, and specificity of 68.8%, 68.9%, and 68.9%, respectively. Figure 3 (C and D) shows the ROC curves of MobileNet_448 and DenseNet121_448 in the external testing set.
Discussion
Currently, US is commonly used as a primary screening approach for NMLs. However, the absence of an international standard for NMLs comparable to BI-RADS [2] has potentially led to missed identification and reduced diagnostic precision. In this study, we developed AI models based on a lightweight and efficient neural network (MobileNet_448) and a densely connected neural network (DenseNet121_448). The models were trained with US images to detect NMLs and to classify them as benign or malignant. In the detection experiment, MobileNet_448 showed promising performance in discriminating NMLs in the testing set, with no difference in AUC (0.999) compared with DenseNet121_448. In the classification experiment, MobileNet_448 achieved an AUC of 0.837 in distinguishing benign from malignant NMLs in the testing set, which exceeded that of DenseNet121_448 (0.738).
Early detection is the first step in diagnosing NMLs. Several studies have indicated that the performance of deep learning in detecting NMLs can match that of expert radiologists [23, 24]. Hadad et al. used cross-modal deep learning for a breast MRI mass/non-mass lesion classification task, achieving an accuracy of 0.94 and an AUC of 0.98 [23]. Soares et al. proposed using a support vector machine to classify regions of mammograms as mass or non-mass; the proposed methodology yielded an average accuracy of 98.88% [24]. These studies are consistent with our findings, although with some differences. We showed that an AI model can distinguish NMLs from normal breast US images with an AUC of 0.999, which is comparable with results previously reported for MRI and mammography. Thus, by using MobileNet_448 or DenseNet121_448 to identify potential areas of concern in US images, radiologists can focus their attention on those areas and make more informed diagnoses, which may ultimately lead to better patient outcomes.
In clinical practice, considerable overlap exists between the conventional B-mode US features of malignant NMLs and those of benign NMLs, such as fibrocystic change, sclerosing adenosis, atypical ductal hyperplasia, and intraductal papilloma [12, 25, 26]. Correct identification is a challenging task, and errors frequently result in missed diagnoses or misdiagnoses. In this study, MobileNet_448 trained on MLs US images yielded an AUC of 0.837 and an accuracy of 74.6% in classifying benign and malignant NMLs in the testing set. These results indicate that the MobileNet_448 model can effectively differentiate between benign and malignant NMLs on US images. The regions that the models focus on for NMLs detection and classification are the interiors of the lesions (Fig. 4). In addition, the MobileNet_448 model obtained an AUC equal to or higher than those of previous studies. Lin et al. investigated the positive predictive value of classifying NMLs on US images following BI-RADS [27]; with BI-RADS 4B as the threshold, the AUC was 0.62. Choi et al. proposed evaluating NMLs with a combination of shear-wave elastography and color Doppler US, which achieved an AUC of 0.801, indicating that additional information on the elasticity and vascularity of breast NMLs can improve diagnostic performance [28]. Therefore, the MobileNet_448 model has the potential to be a reliable tool for diagnosing NMLs.
Moreover, this study also demonstrated that the MobileNet model is particularly effective in diagnosing breast tumors, which is consistent with previous reports [3, 29,30,31]. In distinguishing benign from malignant NMLs, MobileNet_448 achieved an AUC of 0.837 and an accuracy of 0.746, outperforming DenseNet121_448. We suggest that some US image features used to identify malignant breast lesions are common to both MLs and NMLs, and that MobileNet_448 can learn these features to distinguish benign from malignant NMLs. Therefore, the MobileNet_448 model may provide a more accurate and reliable method for diagnosing NMLs, potentially leading to better patient outcomes.
However, this study has several limitations. First, the sample size was limited, comprising 228 patients with 228 NMLs. To address this limitation, a larger population should be included in a prospective study. In addition, only 2D grayscale images were included, which may not fully represent the US characteristics of NMLs. We will include multimodal imaging (color Doppler flow imaging, pulsed-wave Doppler US, contrast-enhanced US) in future work.
Conclusion
In this study, the MobileNet_448 and DenseNet121_448 models we developed can effectively detect and classify NMLs on US images. In comparison, MobileNet_448 showed the best performance in the early screening stage of NMLs, making it a valuable tool to assist US radiologists in diagnosing NMLs in the future.
Data availability
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.
Abbreviations
- NMLs: Non-mass breast lesions
- MLs: Mass breast lesions
- AI: Artificial intelligence
- US: Ultrasound
- MRI: Magnetic Resonance Imaging
- CI: Confidence interval
- AUC: Area under the receiver operating characteristic curve
References
Park KW, Park S, Shon I, Kim M-J, Han B-k, Ko EY, Ko ES, Shin JH, Kwon M-r, Choi JS. Non-mass lesions detected by breast US: stratification of cancer risk for clinical management. Eur Radiol. 2021;31(3):1693–706.
Radiology A, D’Orsi CJA, American College of Radiology: ACR BI-RADS atlas: breast imaging reporting and data system; mammography, ultrasound, magnetic resonance imaging, follow-up and outcome monitoring, data dictionary. 2013.
Zhao Z, Hou S, Li S, Sheng D, Liu Q, Chang C, Chen J, Li J. Application of deep learning to reduce the rate of malignancy among BI-RADS 4A breast lesions based on ultrasonography. Ultrasound Med Biol. 2022;48(11):2267–75.
Zhang F, Jin L, Li G, Jia C, Shi Q, Du L, Wu R. The role of contrast-enhanced ultrasound in the diagnosis of malignant non-mass breast lesions and exploration of diagnostic criteria. Br J Radiol. 2021;94(1120):20200880.
Arleo EK, Hendrick RE, Helvie MA, Sickles EA. Comparison of recommendations for screening mammography using CISNET models. Cancer. 2017;123(19):3673–80.
Coleman C. Early detection and screening for breast Cancer. Semin Oncol Nurs. 2017;33(2):141–55.
Kuhl CK. Abbreviated magnetic resonance imaging (MRI) for breast Cancer screening: Rationale, Concept, and transfer to clinical practice. Annu Rev Med. 2019;70:501–19.
Lebron-Zapata L, Jochelson MS. Overview of breast cancer screening and diagnosis. PET Clin. 2018;13(3):301–23.
Mann RM, Athanasiou A, Baltzer PAT, Camps-Herrero J, Clauser P, Fallenberg EM, Forrai G, Fuchsjäger MH, Helbich TH, Killburn-Toppin F, et al. Breast cancer screening in women with extremely dense breasts recommendations of the european society of breast imaging (EUSOBI). Eur Radiol. 2022;32(6):4036–45.
Guo R, Lu G, Qin B, Fei B. Ultrasound Imaging Technologies for breast Cancer detection and management: a review. Ultrasound Med Biol. 2018;44(1):37–70.
Zhang J, Cai L, Pan X, Chen L, Chen M, Yan D, Liu J, Luo L. Comparison and risk factors analysis of multiple breast cancer screening methods in the evaluation of breast non-mass-like lesions. BMC Med Imaging 2022, 22(1).
Ko K-H, Hsu H-H, Yu J-C, Peng Y-J, Tung H-J, Chu C-M, Chang T-H, Chang W-C, Wu Y-C, Lin Y-P. Non-mass-like breast lesions at ultrasonography: feature analysis and BI-RADS assessment. Eur J Radiol. 2015;84(1):77–85.
Uematsu T. Non-mass-like lesions on breast ultrasonography: a systematic review. Breast Cancer. 2012;19:295–301.
Zhang W, Xiao X, Xu X, Liang M, Wu H, Ruan J, Luo B. Non-mass breast lesions on Ultrasound: Feature Exploration and Multimode Ultrasonic diagnosis. Ultrasound Med Biol. 2018;44(8):1703–11.
Hong S, Li W, Gao W, Liu M, Song D, Dong Y, Xu J, Dong F. Diagnostic performance of elastography for breast non-mass lesions: a systematic review and meta-analysis. Eur J Radiol. 2021;144:109991.
Aerts HJWL. The potential of Radiomic-Based phenotyping in Precision Medicine: a review. JAMA Oncol. 2016;2(12):1636–42.
Coudray N, Ocampo PS, Sakellaropoulos T, Narula N, Snuderl M, Fenyö D, Moreira AL, Razavian N, Tsirigos A. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat Med. 2018;24(10):1559–67.
Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542(7639):115–8.
Shen Y, Shamout FE, Oliver JR, Witowski J, Kannan K, Park J, Wu N, Huddleston C, Wolfson S, Millet A, et al. Artificial intelligence system reduces false-positive findings in the interpretation of breast ultrasound exams. Nat Commun. 2021;12(1):5645.
Qian X, Pei J, Zheng H, Xie X, Yan L, Zhang H, Han C, Gao X, Zhang H, Zheng W, et al. Prospective assessment of breast cancer risk from multimodal multiview ultrasound images via clinically applicable deep learning. Nat Biomed Eng. 2021;5(6):522–32.
Chen J, Jiang Y, Yang K, Ye X, Cui C, Shi S, Wu H, Tian H, Song D, Yao J. Feasibility of using AI to auto-catch responsible frames in ultrasound screening for breast cancer diagnosis. iScience. 2023;26(1):105692.
Choe J, Chikarmane SA, Giess CS. Nonmass findings at breast US: definition, classifications, and Differential diagnosis. Radiographics: A Review Publication of the Radiological Society of North America Inc. 2020;40(2):326–35.
Hadad O, Bakalo R, Ben-Ari R, Hashoul S, Amit G. Classification of breast lesions using cross-modal deep learning. In: 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017): 18–21 April 2017 2017; 2017: 109–112.
de Soares Sérvulo F, Oseas de Carvalho Filho A, Corrêa Silva A, Cardoso de Paiva A, Gattass M. Classification of breast regions as mass and non-mass based on digital mammograms using taxonomic indexes and SVM. Comput Biol Med. 2015;57:42–53.
Wang ZL, Li N, Li M, Wan WB. Non-mass-like lesions on breast ultrasound: classification and correlation with histology. Radiol Med. 2015;120:905–10.
Izumori A, Takebe K, Sato A. Ultrasound findings and histological features of ductal carcinoma in situ detected by ultrasound examination alone. Breast Cancer. 2010;17:136–41.
Lin M, Wu S. Ultrasound classification of non-mass breast lesions following BI-RADS presents high positive predictive value. PLoS One. 2022;17(11):e0278299.
Choi JS, Han B-K, Ko EY, Ko ES, Shin JH, Kim GR. Additional diagnostic value of shear-wave elastography and color Doppler US for evaluation of breast non-mass lesions detected at B-mode US. Eur Radiol. 2016;26:3542–9.
Bahareh B, Hamze R, Ali KZT, Hassan R. Deep classification of breast cancer in ultrasound images: more classes, better results with multi-task learning. In: ProcSPIE: 2021; 2021: 116020S.
Saxena A. Comparison of two deep learning methods for classification of dataset of breast ultrasound images. IOP Conf Series: Mater Sci Eng. 2021;1116(1):012190.
Dafni Rose J, VijayaKumar K, Singh L, Sharma SK. Computer-aided diagnosis for breast cancer detection and classification using optimal region growing segmentation with MobileNet model. 2022, 30(2):181–9.
Acknowledgements
Not applicable.
Funding
This work is supported by the following grant: Commission of Science and Technology of Shenzhen (GJHZ20200731095401004) and Research Fund Project of Guangdong (A2023436). The funding unit assumes the function of the research sponsor in this research.
Author information
Contributions
Study concept and design: GL, HT. Data extraction: HW, KY. Data quality: JL, YL. Data analysis and interpretation: CC, SS. Manuscript preparation: GL, ZH, HT. Manuscript review: JX, FD. Manuscript editing: GL. All authors approved the final manuscript for submission.
Ethics declarations
Competing interests
The authors declare no competing interests.
Informed consent
This study was conducted in accordance with the Declaration of Helsinki and was approved by the Institutional Review Board of the Shenzhen People’s Hospital. The need for informed consent was waived by Institutional Review Board of the Shenzhen People’s Hospital.
Ethics approval and consent to participate
This retrospective study was approved by the Institutional Review Board of the Shenzhen People’s Hospital. The need for informed consent was waived by Institutional Review Board of the Shenzhen People’s Hospital. All methods were carried out in accordance with relevant guidelines and regulations.
Consent for publication
Not applicable.
Funding
The authors state that this work has not received any funding.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Li, G., Tian, H., Wu, H. et al. Artificial intelligence for non-mass breast lesions detection and classification on ultrasound images: a comparative study. BMC Med Inform Decis Mak 23, 174 (2023). https://doi.org/10.1186/s12911-023-02277-2
DOI: https://doi.org/10.1186/s12911-023-02277-2