Abstract
We propose a deep learning-based technique for the detection and quantification of abdominal aortic aneurysms (AAAs). The condition, which causes more than 10,000 deaths per year in the United States, is asymptomatic, is often detected incidentally, and is often missed by radiologists. Our model architecture is a modified 3D U-Net combined with ellipse fitting that performs aorta segmentation and AAA detection. The study uses 321 abdominal-pelvic CT examinations performed at the Massachusetts General Hospital Department of Radiology for training and validation. The model is then tested for generalizability on a separate set of 57 examinations whose patient demographics and acquisition characteristics differ from those of the original dataset. DeepAAA achieves high performance on both datasets (sensitivity/specificity of 0.91/0.95 and 0.85/1.0, respectively), on both contrast and non-contrast CT scans, and it works with image volumes containing varying numbers of images. We find that DeepAAA exceeds the literature-reported performance of radiologists on incidental AAA detection. We expect that the model can serve as an effective background detector in routine CT examinations to prevent incidental AAAs from being missed.
1 Introduction
Abdominal aortic aneurysms (AAAs), an enlargement or widening of the abdominal aorta, commonly occur in males older than 65 years, with a prevalence of 4 to 8% [5]. Untreated aneurysms tend to grow and may eventually rupture, with mortality rates exceeding 90%. Because most AAAs are asymptomatic until critical bleeding occurs, incidental detection of AAAs is critical. However, on routine abdominal computed tomography (CT) exams, only 65% of AAAs are incidentally identified [2]. This low reporting rate makes it difficult to provide timely intervention for patients. Indeed, it is common for AAAs to be first diagnosed when a patient is already at risk of rupture [7]. Furthermore, in routine clinical practice, the size of an AAA is determined by manual measurement of the maximal aortic diameter, which is time-consuming and prone to high inter-reader variability.
Consequently, a variety of computer-aided diagnosis techniques have been proposed over the past decade for automated aorta segmentation. Many of these earlier aids used classical computer vision techniques that required prior knowledge, such as external seed points for initialization [3]. Driven by the ever-increasing capability of deep learning, neural networks have recently been used for aorta segmentation on CT angiography [6]. However, these deep learning algorithms focused only on contrast-enhanced CT exams, while incidental identification of AAAs on non-contrast scans is equally important but more challenging. Additionally, most previous work concentrated on automated aortic segmentation [6, 9, 11], and very few studies have investigated the more applied task of AAA detection, which has much greater clinical relevance than segmentation alone.
In this paper, we demonstrate a deep learning solution (DeepAAA) for automated aorta segmentation and AAA detection on both contrast and non-contrast CT series. Specifically, we develop a variant of a 3D U-Net [1] for aorta segmentation on abdominal CT scans. The proposed method handles series with varying numbers of images. We then apply ellipse fitting to the segmented aortic contours and estimate the largest aortic diameter. DeepAAA is a general solution: it achieves a high detection rate for AAAs on both contrast and non-contrast CT scans and works with variable image resolutions and slice thicknesses. Furthermore, our solution generalizes well, and its sensitivity for AAA detection compares favorably with literature-reported values for radiologists.
2 Cohort and Annotation
Image data consisted of contrast and non-contrast CT examinations of the abdomen and pelvis performed between January 2005 and April 2017 at the Massachusetts General Hospital Department of Radiology. The investigators obtained local Institutional Review Board approval for the project and selected two datasets from the database. The two datasets differ in their capture dates and in the imaging equipment used, as characterized in Table 1.
2.1 Primary Data Set
The primary dataset was used for training and initial validation of the model and contained 321 studies (223 unique patients). These were selected through a keyword search of study reports to ensure a mixture of positive and negative AAA cases. The query was biased toward studies captured between 2005 and 2007. Of the studies selected, 217 (67.6%) were of males and 104 (32.4%) of females, with a mean age of 70.3 years; 153 (47.7%) CT scans were acquired with contrast and 168 (52.3%) without; and 247 (76.9%) studies had AAA present while 74 (23.1%) did not. For each study, the axial series was used for aorta segmentation and AAA detection. Slice thickness ranged from 2 to 10 mm, and the number of images per series varied from 40 to 384.
To generate a ground-truth aortic segmentation, the abdominal aorta was manually contoured slice by slice on the axial scans down to the aortic bifurcation. Each study was annotated by 1 to 4 CT technologists under the supervision of 2 radiologists. Following the clinical definition [2], the presence of AAA was determined by applying a 3.0 cm threshold to the maximum aortic diameter derived from the manual segmentations.
Because many exams were annotated by multiple annotators, a partial assessment of inter-rater variability was possible. Of the 153 contrast studies, 124 were annotated by at least 2 independent technologists, yielding 517 pairwise comparisons. The non-contrast data, however, contained only 10 studies with more than one segmentation, resulting in only 16 pairwise comparisons. The average inter-rater Dice was 0.95 ± 0.03 on contrast series and 0.90 ± 0.08 on non-contrast series. Given the small number of samples, the inter-rater variability on non-contrast data should not be considered definitive, but it suggests roughly similar levels of agreement. For the subsequent analysis, one reference segmentation per study was selected at random as ground truth.
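The pairwise inter-rater comparison can be sketched as follows, assuming each annotation is available as a binary NumPy mask on a common voxel grid. The function names and the `masks_by_study` layout are hypothetical. A study with k annotators contributes k(k − 1)/2 pairs (up to 6 for 4 annotators); the text does not say whether the summary was taken per pair or per study, so this sketch averages per pair.

```python
from itertools import combinations

import numpy as np


def dice(a, b) -> float:
    """Dice overlap between two binary masks of the same shape."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # two empty masks agree perfectly
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def pairwise_dice(masks_by_study):
    """Mean/std Dice over every annotator pair within each study.

    masks_by_study: dict mapping a study id to a list of binary masks,
    one per annotator (hypothetical data layout). Studies with a single
    annotation contribute no pairs.
    """
    scores = []
    for masks in masks_by_study.values():
        for m1, m2 in combinations(masks, 2):
            scores.append(dice(m1, m2))
    return float(np.mean(scores)), float(np.std(scores)), len(scores)
```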
2.2 Additional Validation Set
An additional validation set was used to test the robustness of the model to changes in imaging equipment, imaging department capture protocols, and patient demographics. Because all of these factors may vary significantly over time at a single site, we selected 57 studies (57 unique patients) captured predominantly between 2012 and 2016 for this dataset. The studies were selected through a keyword search of study reports to include a mixture of positive and negative AAA cases. All negative studies were manually verified not to contain a AAA. To assess the model against radiologist-reported ground truth and to validate the post-processing stages that generate the AAA measurement, the maximum aortic diameter and the presence of AAA were taken from the radiology reports rather than derived from manual segmentations (as was done for the primary dataset).
3 Methods
We achieve AAA detection via two sequential steps: (1) aorta segmentation and (2) aortic contour fitting to estimate the largest cross-sectional diameter. For abdominal aortic segmentation, we developed a variant of a 3D U-Net [1] that accepts series with varying numbers of images. As discussed in Sect. 2, our dataset contained a wide distribution of image counts and slice thicknesses, because abdominal studies may also cover other regions of the body, including the pelvis or thorax. It is thus essential to develop an algorithm that adapts to variability along the axial dimension. The 3D U-Net architecture we used contained 4 down/upsampling modules (plus the bottleneck layer), 2 convolutional layers per module, and 32 initial features. The convolutional kernel size was 3 × 3 × 3 in both the downsampling and upsampling paths, while the 3D pooling kernels were 2 × 2 × 1 to preserve the image count. Batch normalization was applied before each ReLU activation, and dropout with a rate of 0.2 was applied at the bottleneck layer. A 1 × 1 × 1 convolutional layer with softmax activation over two classes (background and aorta) formed the output layer, and its aorta probability was thresholded at 0.5 to generate the binary aorta mask.
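To illustrate why 2 × 2 × 1 pooling lets the network accept any number of images, the following sketch traces feature-map shapes (height, width, slices, channels) through the encoder. It assumes "same" padding for the 3 × 3 × 3 convolutions and a doubling of the feature count at each level, a common U-Net convention that the text does not state; `unet_shapes` is a hypothetical helper, not part of any library.

```python
def unet_shapes(height, width, depth, levels=4, base_features=32):
    """Trace (H, W, D, C) through a 3D U-Net encoder with 2x2x1 pooling.

    Assumes 'same'-padded 3x3x3 convolutions (spatial size unchanged) and
    feature doubling per level; both are illustrative assumptions.
    """
    factor = 2 ** levels
    if height % factor or width % factor:
        raise ValueError("in-plane size must be divisible by 2**levels")
    shapes = []
    h, w, c = height, width, base_features
    for _ in range(levels):
        shapes.append((h, w, depth, c))  # output of the two convolutions
        h, w = h // 2, w // 2            # 2x2x1 pooling: slice axis untouched
        c *= 2
    shapes.append((h, w, depth, c))      # bottleneck
    return shapes


for shape in unet_shapes(512, 512, 40):
    print(shape)
```

Because pooling never touches the slice axis, a 40-image and a 384-image series pass through the same weights; only the in-plane size must be divisible by 2^4 = 16 (for example, 512 × 512). Matching 2 × 2 × 1 upsampling in the decoder (also an assumption here) restores the same shapes, so the skip connections line up.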
The model was trained with the RMSprop optimizer using a learning rate of 0.0001. The weights selected for evaluation were those that minimized the loss on the validation set; these were generally not the weights from the last epoch. The loss function was a smoothed negative Dice coefficient:
\[ \mathcal{L} = -\frac{2\sum_{i=1}^{N} p_i g_i + 1}{\sum_{i=1}^{N} p_i + \sum_{i=1}^{N} g_i + 1}, \]
similar, but not identical, to that used in [8]. The summation is over all N voxels in a scan, \(p_i\) is the predicted aorta probability and \(g_i\) is the ground truth classification for voxel i. The additional ones in the numerator and denominator avoid division by zero and yield a perfect score for a correct, empty segmentation.
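A NumPy sketch of the smoothed negative Dice loss follows, written as −(2 Σ p_i g_i + 1)/(Σ p_i + Σ g_i + 1) from the description of the added ones in the numerator and denominator. During training, the same arithmetic would be expressed with the framework's tensor operations; this version, with the hypothetical name `smoothed_negative_dice`, only shows the values the loss takes.

```python
import numpy as np


def smoothed_negative_dice(p, g) -> float:
    """-(2*sum(p*g) + 1) / (sum(p) + sum(g) + 1).

    p: predicted aorta probabilities in [0, 1]; g: binary ground truth.
    Both are flattened, so the sums run over all N voxels of the scan.
    """
    p = np.asarray(p, dtype=np.float64).ravel()
    g = np.asarray(g, dtype=np.float64).ravel()
    return float(-(2.0 * np.sum(p * g) + 1.0) / (np.sum(p) + np.sum(g) + 1.0))
```

For an empty prediction on an empty ground truth the loss is −1, which is also its minimum because 2 Σ p_i g_i ≤ Σ p_i + Σ g_i whenever p_i ∈ [0, 1] and g_i ∈ {0, 1}.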
To build a general AAA detector that works with both contrast and non-contrast CT scans, we mixed both types of CT images during model training. All experiments were implemented using the Keras deep learning library with the TensorFlow backend on an NVIDIA DGX-1 (Volta).
After aorta segmentation, we applied ellipse fitting [4] image by image to the contours of the aorta. The aortic diameter d on each image was then taken as the length of the major axis of the fitted ellipse. For regions where the aorta was not perpendicular to the axial plane, angle correction was applied to recover the true aortic diameter, i.e. \(d \cos \theta \), where \(\theta \) was the angle between the cross-sectional (secant) plane of the aorta and the axial plane. Following the definition of AAA, predicted positives were the studies whose largest aortic diameter exceeded 3 cm. We then compared the predicted results with the ground-truth annotations.
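A sketch of the diameter stage follows. It uses the direct least-squares ellipse fit of [4] in the numerically stable reformulation by Halir and Flusser, together with the standard semi-axis formula for a general conic. It assumes that aortic boundary points for each image have already been extracted from the mask and converted to millimetres using the pixel spacing, and that the tilt angle θ is supplied by other means (for example, from an aortic centerline). All function names are hypothetical, and this is not presented as the authors' implementation.

```python
import numpy as np


def fit_ellipse(x, y):
    """Direct least-squares ellipse fit (Fitzgibbon et al.), Halir-Flusser form.

    Returns conic coefficients (A, B, C, D, E, F) of
    A x^2 + B xy + C y^2 + D x + E y + F = 0 for the centred points.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x, y = x - x.mean(), y - y.mean()                  # centring improves conditioning
    D1 = np.column_stack([x * x, x * y, y * y])        # quadratic terms
    D2 = np.column_stack([x, y, np.ones_like(x)])      # linear terms
    S1, S2, S3 = D1.T @ D1, D1.T @ D2, D2.T @ D2
    T = -np.linalg.solve(S3, S2.T)                     # linear coeffs given quadratic ones
    M = S1 + S2 @ T                                    # reduced scatter matrix
    M = np.vstack([M[2] / 2.0, -M[1], M[0] / 2.0])     # premultiply by inv(C1)
    _, vecs = np.linalg.eig(M)
    vecs = np.real(vecs)
    cond = 4.0 * vecs[0] * vecs[2] - vecs[1] ** 2      # ellipse constraint 4AC - B^2 > 0
    a1 = vecs[:, np.argmax(cond)]
    return np.concatenate([a1, T @ a1])


def ellipse_axes(coeffs):
    """Full (major, minor) axis lengths of the ellipse described by a conic."""
    A, B, C, D, E, F = coeffs
    disc = B * B - 4.0 * A * C                         # negative for an ellipse
    common = 2.0 * (A * E * E + C * D * D - B * D * E + disc * F)
    root = np.hypot(A - C, B)
    semi = [np.sqrt(abs(common * (A + C + s))) / abs(disc) for s in (root, -root)]
    return 2.0 * max(semi), 2.0 * min(semi)


def corrected_diameter(major_axis_mm, theta_rad):
    """Angle correction d * cos(theta) for an obliquely cut aorta."""
    return major_axis_mm * np.cos(theta_rad)


def is_aaa(slice_diameters_mm, threshold_mm=30.0):
    """Study-level call: positive if the largest diameter exceeds 3 cm."""
    return max(slice_diameters_mm) > threshold_mm
```

Taking the larger and smaller of the two root choices, rather than relying on the sign in the ± term, keeps the major/minor labelling correct regardless of the arbitrary sign of the fitted eigenvector.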
4 Results
4.1 Training and Cross-validation on Primary Data Set
To assess model validity and repeatability, the primary dataset was divided into 5 folds such that no patient appeared in more than one fold. Cross-validation was performed by selecting folds \(\{n,n+1,n+2\} \bmod 5\) for training, fold \((n+3) \bmod 5\) for validation, and the remaining fold, \((n+4) \bmod 5\), for testing, for \(n\in \{0,\dots ,4\}\). For each combination, the model was trained for 100 epochs and the weights with the best validation score were selected.
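The patient-grouped fold assignment and the fold rotation can be sketched with the standard library. The round-robin dealing of shuffled patients, the seed, and the helper names are assumptions; the text states only that no patient was repeated across folds.

```python
import random
from collections import defaultdict


def patient_folds(study_to_patient, k=5, seed=0):
    """Assign studies to k folds so that all studies of a patient share a fold.

    study_to_patient: dict study_id -> patient_id (hypothetical layout).
    Shuffled patients are dealt round-robin, so fold sizes (in studies)
    are only approximately balanced.
    """
    patients = sorted(set(study_to_patient.values()))
    random.Random(seed).shuffle(patients)
    fold_of_patient = {p: i % k for i, p in enumerate(patients)}
    folds = defaultdict(list)
    for study, patient in study_to_patient.items():
        folds[fold_of_patient[patient]].append(study)
    return [sorted(folds[i]) for i in range(k)]


def rotation(n, k=5):
    """Fold indices for split n: train {n, n+1, n+2}, val n+3, test n+4 (mod k)."""
    train = [(n + j) % k for j in range(3)]
    return train, (n + 3) % k, (n + 4) % k


for n in range(5):
    print(n, rotation(n))
```

Over n = 0 to 4, every fold serves exactly once as the test fold and once as the validation fold; n = 5 would simply reproduce the n = 0 split.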
Inference on each test study was evaluated in terms of the Dice score relative to the reference segmentation, and in terms of the maximum aortic diameter computed on the inferred segmentation versus the same calculation on the reference segmentation. The detailed results of this cross-validation are presented in Table 2. Over the 5 folds, the per-fold average Dice score ranged from 0.883 to 0.894, with an overall average of 0.887 ± 0.111. In every fold, the mean error in the estimated diameter lies within one standard deviation of zero. There may be a slight bias toward underestimating the diameter, as 4 of the 5 folds had negative mean errors, but this bias is small, with an overall mean error of −1.3 ± 7.3 mm.
For a final set of weights, the complete primary dataset was randomly split into training (80%), validation (10%), and test sets (10%). Training was performed for 300 epochs and the weights with lowest validation loss were selected.
As shown in Fig. 1, DeepAAA successfully segments the aorta on both contrast and non-contrast CT images and works well in more challenging cases where blood clots are present or the aortic boundary is unclear. We achieve high performance on aortic segmentation, with an average Dice coefficient of 0.91, which yields high sensitivity (0.91) and high specificity (0.95) for AAA detection (Table 3). We further examine the error in the largest aortic diameter measurement, \(d_{pred} - d_{true}\). We find that the algorithm tends to underestimate the aortic size, but the 2.02 mm average discrepancy is well within the 10 mm gradations on which clinical decisions are generally based.
4.2 Testing Model Robustness on the Additional Validation Set
Using the final model trained in Sect. 4.1, we performed inference on studies from the additional validation set described in Sect. 2.2. Each study was labelled for the presence of a AAA via the radiology report, and for those studies with positive findings, the maximum aortic diameter was also extracted.
For each study, the model’s outputs were compared with the study labels, and the model’s overall performance was measured in terms of sensitivity/specificity for detecting AAA and the mean error in the maximum diameter. Table 3 (last row) summarizes these results, alongside the model’s performance on the held-out test set for the same metrics. During this process, we noted that some studies in the additional validation set extended into thoracic anatomy; model predictions in this region were removed manually during post-processing.
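The study-level metrics can be computed as below, assuming one boolean AAA call per study and, for the diameter error, paired predicted and reference maximum diameters in millimetres (on the additional validation set, only for studies whose report gives a diameter). The function names are hypothetical.

```python
from statistics import mean


def detection_metrics(predicted, truth):
    """Study-level sensitivity and specificity for binary AAA calls.

    predicted, truth: sequences of booleans (True = AAA present).
    Returns NaN for a metric whose denominator is zero.
    """
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)
    fn = sum(1 for p, t in pairs if not p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    fp = sum(1 for p, t in pairs if p and not t)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


def mean_diameter_error(d_pred_mm, d_true_mm):
    """Mean signed error d_pred - d_true; negative means underestimation."""
    return mean(p - t for p, t in zip(d_pred_mm, d_true_mm))
```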
5 Discussion
While AAAs are rarely missed when they are the primary indication for a study, the detection rate decreases significantly when the AAA is an incidental finding. DeepAAA aims to provide a “second set of eyes” and reduce the rate of missed incidental findings. Therefore, to properly contextualize model performance, it is important to quantify this rate of missed diagnosis. Claridge et al., in a retrospective analysis of 3246 abdominal CT scans and their reports, found that only 65% of AAAs were detected by radiologists [2]. DeepAAA exceeds the sensitivity they found (Table 4) while achieving high specificity (Table 3), and it localizes the suspected AAA for radiologist confirmation. Thus, a parallel read from our algorithm could substantially reduce the number of missed AAAs and offer significant clinical value, enabling early detection and treatment of AAA.
Many observers have noted that machine learning models applied to radiology may not generalize well [10]. Changes in the equipment used to capture input images and in the demographics of the underlying patient cohorts tend to reduce model performance. This lack of generalizability would significantly hamper a model’s clinical utility, because deployment at sites other than the one where the model was trained may result in unexpected under-performance. To test DeepAAA’s ability to generalize, we simulated a significant change in input data by creating a second validation cohort (Sect. 2.2) acquired from different patients, using different equipment, more than five years after the original training data were acquired. The model showed higher specificity (100%) and a lower mean error in diameter prediction, with only slightly lower sensitivity (85%), indicating that the model is robust and has not overfit to cohort- or equipment-related idiosyncrasies of the original training data.
Future work would involve extending the DeepAAA model beyond the abdominal region to include segmentation of the thoracic aorta. Thoracic aortic aneurysms (TAA), although not nearly as prevalent as AAA, are still a significant source of mortality and generally affect a younger population. In addition, models to predict AAA growth or rupture would be of significant clinical value in guiding more targeted surveillance programs and therapy.
References
1. Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O.: 3D U-Net: learning dense volumetric segmentation from sparse annotation. In: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (eds.) MICCAI 2016. LNCS, vol. 9901, pp. 424–432. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46723-8_49
2. Claridge, R., Arnold, S., Morrison, N., van Rij, A.M.: Measuring abdominal aortic diameters in routine abdominal computed tomography scans and implications for abdominal aortic aneurysm screening. J. Vasc. Surg. 65(6), 1637–1642 (2017). https://doi.org/10.1016/j.jvs.2016.11.044
3. de Bruijne, M., van Ginneken, B., Viergever, M., Niessen, W.: Interactive segmentation of abdominal aortic aneurysms in CTA images. Med. Image Anal. 8(2), 127–138 (2004). https://doi.org/10.1016/j.media.2004.01.001
4. Fitzgibbon, A.W., Pilu, M., Fisher, R.B.: Direct least squares fitting of ellipses. In: Proceedings of 13th International Conference on Pattern Recognition, vol. 1, pp. 253–257, August 1996. https://doi.org/10.1109/ICPR.1996.546029
5. Lindholt, J., Juul, S., Fasting, H., Henneberg, E.: Screening for abdominal aortic aneurysms: single centre randomised controlled trial. BMJ 330(7494), 750 (2005). https://doi.org/10.1136/bmj.38369.620162.82
6. López-Linares, K., et al.: Fully automatic detection and segmentation of abdominal aortic thrombus in post-operative CTA images using deep convolutional neural networks. Med. Image Anal. 46, 202–214 (2018). https://doi.org/10.1016/j.media.2018.03.010
7. Mell, M.W., Hlatky, M.A., Shreibati, J.B., Dalman, R.L., Baker, L.C.: Late diagnosis of abdominal aortic aneurysms substantiates underutilization of abdominal aortic aneurysm screening for Medicare beneficiaries. J. Vasc. Surg. 57(6), 1519–1523 (2013). https://doi.org/10.1016/j.jvs.2012.12.034
8. Milletari, F., Navab, N., Ahmadi, S.A.: V-Net: fully convolutional neural networks for volumetric medical image segmentation. In: 4th International Conference on 3D Vision (3DV), pp. 565–571 (2016)
9. Siriapisith, T., Kusakunniran, W., Haddawy, P.: Outer wall segmentation of abdominal aortic aneurysm by variable neighborhood search through intensity and gradient spaces. J. Digit. Imaging 31(4), 490–504 (2018). https://doi.org/10.1007/s10278-018-0049-z
10. Zech, J.R., Badgeley, M.A., Liu, M., Costa, A.B., Titano, J.J., Oermann, E.K.: Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLoS Med. 15(11), 1–17 (2018). https://doi.org/10.1371/journal.pmed.1002683
11. Zhuge, F., Rubin, G.D., Sun, S., Napel, S.: An abdominal aortic aneurysm segmentation method: level set with region and statistical information. Med. Phys. 33(5), 1440–1453 (2006). https://doi.org/10.1118/1.2193247
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Lu, JT. et al. (2019). DeepAAA: Clinically Applicable and Generalizable Detection of Abdominal Aortic Aneurysm Using Deep Learning. In: Shen, D., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2019. MICCAI 2019. Lecture Notes in Computer Science(), vol 11765. Springer, Cham. https://doi.org/10.1007/978-3-030-32245-8_80
DOI: https://doi.org/10.1007/978-3-030-32245-8_80
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32244-1
Online ISBN: 978-3-030-32245-8
eBook Packages: Computer Science, Computer Science (R0)