Abstract
Screening of pulmonary nodules in computed tomography (CT) is important for early detection and treatment of lung cancer. Many existing works utilize faster RCNN (regions with convolutional neural network or region proposal network) for this task. However, their performance is often limited, especially for detecting small pulmonary nodules (<4 mm). In this work, we propose a new cascade paradigm called “Relu cascade” to detect pulmonary nodules. The training of “Relu cascade” is similar to the conventional cascade learning approach. First, a detection network is trained using limited positive annotations (nodules) and randomly sampled negative samples (background). Then, a second detection network is trained with the same amount of positives and false positives produced by the first network. By repeating this process, multiple detection networks can be trained with subsequent detection networks tuned specifically to eliminate the false positives produced by previous detection networks. The novelty of “Relu cascade” lies in the way of chaining these networks into a cascade. Different from the conventional cascade learning, each level filters out false positive detections independently in the testing phase, which is prone to overfitting as later levels are very specific to a small amount of negative samples. In “Relu cascade”, nodule likelihoods at all previous levels are aggregated, based on which false positives are identified and filtered out. Experimental results on 606 CT scans of different patients show that “Relu cascade” greatly improves the detection performance of conventional cascade learning.
You have full access to this open access chapter, Download conference paper PDF
Similar content being viewed by others
Keywords
1 Introduction
Lung cancer is one of the common cancers and the leading causes of cancer-related death worldwide. Clinical trials [1] show that early screening of pulmonary nodules for high-risk subjects would allow detecting lung cancer at early stages, thus greatly reducing death rates caused by lung cancer. Low-dose CT scan is an effective and recommended way for pulmonary nodule screening. However, reading hundreds of slice images per CT scan is time-consuming, radiologists under labor-intensive work only spend very limited time on each scan, which will result in missed detections of pulmonary nodules, and hence decreasing the efficacy of CT pulmonary nodule screening. Therefore, it is highly desirable to develop a computer-aided approach to automatically discovering pulmonary nodules in clinical CT scans.
CT pulmonary nodule detection is technically challenging because (1) nodule sizes vary to a great extent, i.e., 3 mm to 3 cm; and (2) the appearance of nodules is inconsistent, as shown in Fig. 1. Most existing computer aided detection (CAD) systems for pulmonary nodules consist of two stages [2]: (1) candidate generation; and (2) false positive (FP) reduction. In the first stage, a detection network (e.g., faster RCNN [3]) is often applied to find all nodules in lung fields while filtering out as many background positions (negatives) as possible. In order to keep a high sensitivity, detection precision is often compromised, and many false positives are generated in the first stage. In the second stage, a binary classifier (e.g., CNN) is trained to separate these false positive detections produced in the first stage.
In this work, we formulate CT pulmonary nodule detection as a cascade learning problem. Instead of using a two-stage approach, we train multiple stages of detection networks in a cascade fashion and chain them into an ensemble model to gradually separate background from pulmonary nodules. Compared to existing pulmonary nodule detection systems, our work has two major contributions. First, we use detection networks not only in the first stage but also in the subsequent stages, as we empirically found that a binary CNN classifier is prone to overfitting in FP reduction. Second, a feature pyramid network [4] is used as the backbone detection network because it performs detection on multi-scale feature maps, which can better handle the large size variation of pulmonary nodules than faster RCNN [3] that performs detection on a single-scale feature map. In addition, we propose a new cascade paradigm called “Relu cascade”. At each cascade level, we use a Relu strategy to aggregate the judgements from all previous cascade levels before filtering false positives. Validated on a large internal CT dataset, the Relu strategy greatly improves the detection performance of pulmonary nodules compared to the conventional cascade that filters false positives independently at each level in the inference phase.
2 Method
2.1 3D Feature Pyramid Network for Pulmonary Nodule Detection
The sizes of pulmonary nodules vary greatly from 3 mm to 3 cm. Conventional deep learning approaches, e.g., faster RCNN [3], are not suitable for detecting objects with large scale variations, because they detect objects at a single scale. For example, the faster RCNN uses the interleaving of convolutions and down-sampling to extract context information, and detects objects on the final coarse feature maps. Using such methods for pulmonary nodule detection will inevitably miss small nodules. To detect nodules of all sizes, detection has to be performed on various scales. To this end, we extend feature pyramid network (FPN) [4] for 3D pulmonary nodule detection. We derive our FPN network architecture from the popular segmentation network, V-Net [5], as shown in Fig. 2. The difference is that the top level in the ascending path of V-Net is removed, as this level contains very fine-grained features, which is generally unnecessary for object detection. In FPN, detection is performed separately on feature maps of four different scales by attaching a detection head onto each feature map. The detection head consists of two convolution layers at each scale, and it outputs an anchor likelihood map \( P \) and an anchor regression map \( R \). The nodule detection process follows the conventional region proposal network [3] by first identifying positive anchor boxes on \( P \) and then adjusting these positive boxes using the predicted anchor regression maps \( R \). Finally, redundant bounding boxes are suppressed using non-maximum suppression.
Training Loss of FPN:
The training loss of FPN is the summation of losses of all four different scales. The loss at a single scale is given as:
where \( L_{\text{cls}} \) and \( L_{\text{reg}} \) are anchor classification and regression losses, respectively, and \( \lambda \) is a balancing weight between them. Conventionally, the classification loss is measured over all anchor boxes. However, considering that the positive and negative anchor boxes are highly imbalanced (1:\( 10^{6} \)) in this task, we use online hard negative example mining (OHEM) to pick the top K hard negative anchors in a min-batch, and then use focal loss [6] to measure classification losses for all positive anchors and the top K hard negative anchors. Mathematically, the classification loss is defined as,
where \( N_{\text{pos}} \) and \( N_{\text{neg}} \) are the numbers of positive and negative anchors, \( p_{\text{i}} \) is the classification likelihood of the \( i \)-th anchor, \( 1_{i \in Top K} \) is an indicator function to pick only the top K hard negative anchors into loss calculation. \( \upgamma \) and \( \upalpha \) are parameters of focal loss, respectively. We set \( K = 5000N_{pos} \), \( \upalpha = 0.9 \), \( \upgamma = 2 \) in our experiments.
The regression loss \( L_{\text{reg}} \) follows the definition of faster RCNN:
where \( 1_{{l_{i} = 1}} \) is an indicator function to pick only the positive anchors, \( L1_{smooth} \) is the smooth L1 loss, \( r_{i}^{*} \) is a 6-tuple (\( \varDelta x^{*} ,\varDelta y^{*} , \varDelta z^{*} , \varDelta w^{*} , \varDelta h^{*} , \varDelta d^{*} \)) defined for the \( i \)-th positive anchor box, and \( r_{i} \) is the same 6-tuple predicted by the network. The definitions of \( \varDelta^{*} \) values are given below:
where \( x_{a} \), \( y_{a} \), \( z_{a} \), \( w_{a} \), \( h_{a} \) and \( d_{a} \) are the 3D box center and size of the positive anchor box, \( x^{*} \), \( y^{*} \), \( z^{*} \), \( w^{*} \), \( h^{*} \), and \( d^{*} \) are the 3D box center and size of the ground-truth box that has the largest overlap ratio with the positive anchor box.
2.2 Relu Cascade of FPNs for Pulmonary Nodule Detection
Compared to the background (negatives), the number of pulmonary nodules (positives) per CT scan is very limited, e.g., 2–3 per scan. Given the imbalanced positive and negative training dataset, it is difficult for a single detection network (essentially a classifier) to clearly separate massive background positions from limited nodule positions. A classic solution to this imbalanced problem is cascade learning [7], which trains multiple classifiers one by one to gradually filter massive negatives from limited positives.
When combined with FPNs, the workflow is shown in Fig. 3. Specifically, a first FPN is trained with positive patches sampled around all nodule locations and random negative patches sampled from the background. Then, the trained FPN forms a cascade with one level. It is applied to all training images and a likelihood threshold is determined by choosing a maximum value that correctly classifies all positive anchor boxes (100% sensitivity). Doing so clearly produces many false positive detections. In the next round, negative patches are randomly sampled only from these false positive locations, and a second FPN is trained aiming to reduce these false positives. By chaining two FPNs together, the second FPN can filter out negatives that the first FPN fails to do (i.e., false positives). The same process continues until all false positives are eliminated or a specific number of FPNs is obtained.
The drawback of the conventional cascade in the inference can be illustrated in Fig. 3, where \( {\text{Heaviside}} = {\text{Heaviside}}\left( {P_{i} - T_{i} } \right) \), \( P_{i} \) and \( T_{i} \) are the anchor likelihood map and threshold at level \( i \), respectively, the circled red cross is an element-wise product operator. As we can see, each level in the conventional cascade filters out negatives based on its own judgement without referring to peer levels. As long as one level classifies a sample as negative, the sample location is filtered out. This strategy can be very risky as FPNs trained in the later cascade are more specific and easier to overfit the training data than FPNs in the beginning cascade levels. The reason is because the negative training samples that later FPNs trained with are extracted from false positives left by the previous cascade levels. As the level of cascade increases, the number of false positives decreases. The latter FPNs are exposed to limited negative samples, thus very sensitive and prone to overfitting.
To address this problem, we propose a new cascade called “Relu cascade” as shown in Fig. 4. The training process is the same with the conventional cascade. The difference lies in the inference stage, where Relu cascade introduces two operators to aggregate the judgements from all previous cascade levels before filtering out negatives.
-
(1)
Threshold Relu operator is defined as \( P_{i}^{relu} = max\left( {0,P_{i}^{avg} - T_{i} } \right) + 1_{P^{avg}_i \geq T_i} * T_{i} \), where \( P_{i}^{avg} \), \( P_{i}^{relu} \) and \( T_{i} \) are the input anchor likelihood map, output anchor likelihood map, and likelihood threshold at cascade level i, respectively.
-
(2)
Conditional mean operator is defined as:
Unlike the conventional cascade (Fig. 3) that uses Heaviside function for binary judgement, our Relu cascade (Fig. 4) uses an “threshold Relu” operator to filter out negative samples while still preserving anchor likelihoods estimated by previous cascade levels. The preserved anchor likelihoods are treated as prior information. The conditional mean operator then combines this prior with the current likelihood estimation to derive more reliable anchor likelihoods. Since each cascade level in Relu cascade utilizes not only its own judgement but also considers the judgements from previous levels, the overfitting problem of conventional cascade can be largely relieved, and the detection performance can be further boosted.
3 Results
Dataset and Parameter Setting:
Our experimental data consists of 2595 CT scans from 2595 patients. The in-plane resolution is 0.6–0.8 mm, and the slice thickness is 5 mm. Pulmonary nodules at each CT scan were first annotated as 3D bounding boxes by a senior radiologist using our internal tool. The annotations were then reviewed and confirmed by another senior radiologist with more than 20 years’ experience. The distribution of pulmonary nodule sizes is shown in Fig. 5 (right). We can see that the majority of pulmonary nodule sizes in our dataset is less than 4 mm (47.7%), which poses a great challenge for automatic nodule detection algorithms.
In the experiments, we randomly split the dataset into two parts: (1) training set of 1989 CT scans with 2932 pulmonary nodules; (2) testing set of 606 CT scans with 1076 pulmonary nodules. All CT scans are first resampled to the same spatial resolution (i.e., 0.7 mm × 0.7 mm × 1.0 mm) before any processing step. During training, due to the limitation of GPU memory, we sample random 3D patches from CT scans to train FPNs. The 3D patch size is 96 × 96 × 96. \( \uplambda \) in the FPN loss is 1000. We evaluate the detection performance of different algorithms using two metrics: (1) average precision (AP) of precision-recall (PR) curve; (2) free-response ROC curve (FROC).
Impact of Multi-scale Feature Maps:
We first evaluated the importance of multi-scale feature maps in pulmonary nodule detection by comparing faster RCNN and FPN. Faster RCNN utilizes only the coarsest feature maps for pulmonary nodule detection. It tends to miss small pulmonary nodules. By detecting pulmonary nodules at various scales of feature maps, FPN outperforms faster RCNN by a large margin (0.12 increase in AP) as shown in both Fig. 5 (left) and Table 1, which justifies the importance of multi-scale feature maps.
Evaluation of Different Loss Functions:
We evaluated the detection performance of different loss functions under the FPN framework. The loss functions under comparison are: (1) weighted cross entropy loss, (2) focal loss, (3) cross entropy loss with online hard negative example mining (cross entropy + OHEM), and (4) focal loss with OHEM (focal loss + OHEM). The best performance of each loss function is reported in Fig. 6. Focal loss with OHEM achieves the best detection performance. Without OHEM, the number of negatives is way too many than the number of positives in a min-batch, which limits training efficacy. With OHEM, the detection performance of focal loss is slightly better than that of cross entropy as shown in Fig. 6 (right). Besides, we found that the convergence is accelerated using focal loss compared with cross entropy loss.
Evaluation of Cascade Learning:
We evaluated the performance gain using cascade learning in Tables 1, 2 and Fig. 7. As we can see from Table 2, cascade learning improves the detection performance gradually by adding more FPNs. In the experiments, we trained up to 5 FPNs considering the performance gain in the expense of inference time increase. Comparing with the conventional cascade, Relu cascade is clearly better at all cascade levels, which justifies the effectiveness of aggregating judgements from previous cascade levels in Relu cascade.
Inference Time and GPU Memory Cost:
We use pytorch to implement the training part of our algorithm, and build the inference part from scratch using CUDNN. This allows us to fully optimize the inference speed and GPU memory cost of the entire algorithm. After optimization, the runtime of our algorithm is 6 to 12 s using a Geforce Titan Xp graphics card. The runtime GPU memory cost is about 3 to 5 gigabytes depending on the size of lung fields.
4 Conclusion and Discussion
In this paper, we propose a new paradigm of cascade using Relu cascade. We combine this strategy with feature pyramid networks (FPNs) for pulmonary nodule detection. We show that, compared with faster RCNN, FPN improves the detection performance greatly in our dataset that contains many small pulmonary nodules. Using our proposed Relu cascade, the detection performance of cascade can be further boosted. The methodology proposed in this paper is very general, and we will validate its performance in other computer-aided detection problems in the near future.
References
National Lung Screening Trial Research Team: The national lung screening trial: overview and study design. Radiology 258, 243–253 (2011)
Shen, W., et al.: Multi-crop convolutional neural networks for lung nodule malignancy suspiciousness classification. Pattern Recognit. 61, 663–673 (2017)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)
Milletari, F., Navab, N., Ahmadi, S.-A.: V-net: fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571. IEEE (2016)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)
Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. CVPR. 1, 511–518 (2001)
Acknowledgement
This work was partially supported by the Shanghai Municipal Commission of Economy and Informatization (2017RGZN01026) and the National Key Research and Development Program of China (2018YFC0116400).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Mu, G., Chen, Y., Wu, D., Zhan, Y., Zhou, X.S., Gao, Y. (2019). Relu Cascade of Feature Pyramid Networks for CT Pulmonary Nodule Detection. In: Suk, HI., Liu, M., Yan, P., Lian, C. (eds) Machine Learning in Medical Imaging. MLMI 2019. Lecture Notes in Computer Science(), vol 11861. Springer, Cham. https://doi.org/10.1007/978-3-030-32692-0_51
Download citation
DOI: https://doi.org/10.1007/978-3-030-32692-0_51
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32691-3
Online ISBN: 978-3-030-32692-0
eBook Packages: Computer ScienceComputer Science (R0)