Abstract
The diagnostic decision for a chest X-ray image generally takes into account probable changes in a lesion compared to the previous examination. We propose a novel algorithm to detect such changes in longitudinal chest X-ray images. We extract feature maps from a pair of input images through two streams of convolutional neural networks. Next, we generate a geometric correlation map by computing matching scores for every possible pair of local descriptors across the two feature maps. This correlation map is fed into a binary classifier that detects the specific patterns representing a change in the lesion. Since no public dataset offers proper information to train the proposed network, we also build our own dataset by analyzing the reports of examinations at a tertiary hospital. Experimental results show that our approach outperforms previous methods in a quantitative comparison. We also provide various case examples visualizing the effect of the proposed geometric correlation map.
1 Introduction
Chest X-ray (CXR) is the most commonly used radiological examination for detecting a wide range of pulmonary diseases, such as pneumonia, tuberculosis, pleural effusion, pneumothorax, cardiomegaly, and lung cancer. Thanks to its short scan time and low cost, most diagnostic routines include CXR as a basic screening tool, producing a massive amount of images to be read by radiologists.
The worldwide shortage of skilled radiologists has led to rising demand for computer-aided detection systems for CXR. Several algorithms using neural networks have recently been proposed and demonstrate diagnostic performance close to the radiologist level [5, 7, 8]. However, those methods analyze a limited set of disease classes from a single cross-sectional input image, whereas radiologists make diagnostic decisions for every possible disease, usually based on a longitudinal analysis that takes into account probable changes in the lesion compared to the previous examination.
The longitudinal change in a lesion often plays a decisive role in diagnosis. If a mid-size lung nodule remains unchanged for a while, a routine follow-up will be recommended. But if the nodule suddenly appears or rapidly grows, we may need additional computed tomography examinations for careful management. In this regard, many radiologic reports include comments clarifying the changes, e.g., “no change since last study”.
Despite this clinical significance, relatively little research has exploited longitudinal analysis. In [11], a modified Long Short-Term Memory (LSTM) network decodes the pattern in sequential examinations to classify the disease in the latest examination. The goal of this method, however, is to improve classification accuracy rather than to detect the change. A solution to find the change was proposed in [13] by categorizing each image in the longitudinal image sequence. The focus there is on the change in disease class, but what matters clinically is the change in the lesion itself across subsequent examinations, regardless of the disease class.
This work aims to detect the longitudinal change in a lesion given two consecutive images of a patient. We investigate a large collection of reports attached to CXR examinations at a tertiary referral hospital and build our own dataset for training and testing. We then propose a novel neural network architecture that generates a map describing the geometric correlation between the images and detects patterns in the map that indicate a change.
2 Dataset
To the best of our knowledge, none of the public CXR datasets provides sufficient information to train a neural network for detecting longitudinal change. Two well-known public datasets, ChestX-ray14 [16] and CheXpert [2], contain 112,120 and 224,316 CXR images tagged with 14 disease classes, respectively, but they do not indicate whether the lesion of a disease changes over time. The other datasets introduced in [3] provide neither this information nor a sufficient number of images for training.
To address this challenge, we built our own dataset from the CXR images stored at a tertiary hospital. The institutional review board waived informed consent due to the retrospective study design and the use of anonymized patient data. We found more than 1.8 million images taken from 2003 to 2017, together with the available reports confirmed by board-certified radiologists in routine practice.
We analyzed the reports to identify examinations that include a longitudinal diagnostic decision. Since some sentences appear repeatedly across reports, we first decomposed all report text into sentences and then counted the frequency of each sentence over all reports, ignoring minor variations such as spaces, punctuation, and line breaks. Of the 252,209 unique sentences, only 590 (0.2%) appeared in reports more than 50 times. Interestingly, the number of reports containing those sentences was 1.4 million, exceeding 77% of all reports. The most frequent sentence was "no active lesion in the lung", which appeared in more than 470,000 reports. Through a full survey of the 590 sentences, we accepted 193 sentences that explicitly describe changes in the lesion via time-related keywords (such as aggravation, increment, disappear, decrease, stable). We acknowledge that the remaining 252,016 sentences (including 397 high-frequency sentences) may also contain indications of change, but we leave this issue for future work, e.g., employing natural language processing.
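To make the report-mining step concrete, the following is a minimal sketch (not the authors' code) of the sentence decomposition, normalization, and frequency counting described above; the `reports` iterable, the splitting heuristic, and the exact normalization rule are illustrative assumptions.

```python
import re
from collections import Counter

def normalize(sentence: str) -> str:
    # Ignore minor variations: case, spaces, punctuation, and line breaks.
    return re.sub(r"[\s.,;:()\-]+", "", sentence.lower())

def sentence_frequencies(reports):
    """Count how often each normalized sentence appears across all reports."""
    counts = Counter()
    for report in reports:                       # reports: iterable of report strings
        for sent in re.split(r"[.\n]", report):  # crude sentence splitting
            key = normalize(sent)
            if key:
                counts[key] += 1
    return counts

# Keep only the sentences seen more than 50 times for manual review, e.g.:
# frequent = {s: c for s, c in sentence_frequencies(reports).items() if c > 50}
```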
We divided the 193 sentences into two classes: 155 sentences for the change class (shown in 18,911 reports) and 38 sentences for the no-change class (shown in 302,456 reports). The most frequent sentences in each class include "decreased amount of bilateral pleural effusion", "improving pulmonary edema", and "mild improvement of consolidation in both lungs" for the change class, and "no interval change since last study", "no change of stable tuberculosis", and "emphysema, no interval change" for the no-change class. Finally, we randomly selected examinations for each class and found corresponding previous examinations to create image pairs with an interval of at least 30 days: 1,751 pairs for the change class and 3,721 pairs for the no-change class, yielding a total of 5,472 pairs (10,944 images) in the final dataset. The diseases represented in our dataset include pleural effusion, pulmonary edema, pneumothorax, pleural thickening, haziness, and so on.
3 Method
We design a novel neural network model to tackle change detection. The overall architecture of our approach is outlined in Fig. 1. Given a pair of input images \((\mathcal {I}_0, \mathcal {I}_1)\) representing the CXR images of the previous and current examinations, we formulate longitudinal change detection as a binary classification problem: change vs. no-change. We first extract features from both input images through two streams of convolutional neural networks, producing a pair of feature maps \((\mathcal {F}_0, \mathcal {F}_1)\), where \(\mathcal {F}_{\{0,1\}} \in \mathbb {R}^{h \times w \times d}\) with \(h,w,d \in \mathbb {N}\). These feature maps can be interpreted as sets of d-dimensional local descriptors defined on each pixel of an \(h \times w\) resolution image. Next, we apply a correlation score calculator to every possible match of local descriptors between the two feature maps; that is, each descriptor in one feature map yields matching scores with every descriptor in the other feature map. This operation generates a stack of score maps, which we call the geometric correlation map. In the final step, the correlation map is provided as input to a binary classifier, which we train to determine whether the map indicates a change between the input images. The following subsections describe the details of each step.
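As a high-level illustration, the following PyTorch sketch (our own reconstruction, not the authors' released code) wires the three stages together; the module interfaces are assumptions, with the \(h \times w\) score maps stacked along the channel dimension.

```python
import torch
import torch.nn as nn

class ChangeDetector(nn.Module):
    """Two-stream change detector: shared backbone -> correlation map -> binary classifier."""

    def __init__(self, backbone: nn.Module, correlation, classifier: nn.Module):
        super().__init__()
        self.backbone = backbone        # shared weights: the same module processes both images
        self.correlation = correlation  # builds the geometric correlation map from two feature maps
        self.classifier = classifier    # binary classifier operating on the correlation map

    def forward(self, img_prev: torch.Tensor, img_curr: torch.Tensor) -> torch.Tensor:
        f0 = self.backbone(img_prev)    # (B, d, h, w) feature map of the previous image
        f1 = self.backbone(img_curr)    # (B, d, h, w) feature map of the current image
        g = self.correlation(f0, f1)    # (B, h*w, h, w) geometric correlation map
        return self.classifier(g)       # logit for change vs. no-change
```

Concrete choices for the three components are sketched in the subsections below.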
3.1 Two-Stream Feature Extraction
To extract feature maps from the input image pair, we adopt the squeeze-and-excitation network (SENet) [1], which has shown state-of-the-art performance in many computer vision problems [14, 15]. The network is built from attention blocks, each comprising two modules: a squeeze module that summarizes local information, and an excitation module that rescales channel importance based on that information.
Of note, the two network streams are identical (i.e., they share weights) so that visually similar image patches produce highly correlated descriptors. We implement five attention blocks with 128, 160, 192, 224, and 256 channels, and \(2 \times 2\) max pooling is applied at the end of each block.
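A minimal squeeze-and-excitation block in the spirit of [1] is sketched below; the internal convolution layout (3 × 3 kernels, batch normalization, reduction ratio 16) and the single-channel grayscale input are our assumptions, as the paper does not specify them.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, reduction: int = 16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        # Squeeze: global average pooling summarizes each channel;
        # excitation: two fully connected layers produce per-channel weights.
        self.fc = nn.Sequential(
            nn.Linear(out_ch, out_ch // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(out_ch // reduction, out_ch),
            nn.Sigmoid(),
        )
        self.pool = nn.MaxPool2d(2)  # 2x2 max pooling at the end of each block

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv(x)
        w = self.fc(x.mean(dim=(2, 3)))        # (B, C) channel importance
        x = x * w.unsqueeze(-1).unsqueeze(-1)  # rescale channels
        return self.pool(x)

def make_backbone() -> nn.Sequential:
    # Five attention blocks with 128, 160, 192, 224 and 256 channels.
    chs = [1, 128, 160, 192, 224, 256]
    return nn.Sequential(*[SEBlock(chs[i], chs[i + 1]) for i in range(5)])
```

With 256 × 256 inputs, the five 2 × 2 poolings yield 8 × 8 feature maps with 256 channels, matching the \(h = w = 8\), \(d = 256\) used in the experiments.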
3.2 Normalized Geometric Correlation Map
Next, we generate a matching pattern between the two extracted feature maps. We measure how strongly a local descriptor in one feature map correlates with a local descriptor in the other feature map [9, 10]. Since we cannot expect the two feature maps to be aligned a priori (due to patient posture, scan angle, etc.), we compute correlation scores for every possible pairing of descriptors. A descriptor in \(\mathcal {F}_0\) indexed by (i, j) yields a score map \(\mathcal {S}^{i,j} \in \mathbb {R}^{h \times w}\) as it is matched with every descriptor in \(\mathcal {F}_1\); that is,

\(\mathcal {S}^{i,j}_{k,l} = \tau \big (\mathcal {F}_0^{i,j},\, \mathcal {F}_1^{k,l}\big ), \quad 1 \le k \le h,\ 1 \le l \le w,\)
where \(\tau \) is the correlation score function accepting two descriptor vectors as its input arguments. Iterating over all possible (i, j), we finally obtain a stack of \(h \times w\) score maps that comprise the geometric correlation map \(\mathcal {G}\), such that

\(\mathcal {G} = \big \{\mathcal {S}^{i,j}\big \}_{1 \le i \le h,\ 1 \le j \le w} \in \mathbb {R}^{h \times w \times (h \cdot w)}.\)
For the correlation score function \(\tau \), we employ a simple yet powerful calculator based on the inner product. If both descriptor vectors point in a similar direction in d-dimensional space, they are highly correlated and the score converges to one; otherwise, it converges to zero. We additionally apply normalization and zero out negative values, and finally define

\(\tau (\mathbf {f}_0, \mathbf {f}_1) = \max \left( 0,\ \frac{\mathbf {f}_0^{\top } \mathbf {f}_1}{\Vert \mathbf {f}_0\Vert \, \Vert \mathbf {f}_1\Vert } \right),\)

where \(\mathbf {f}_0\) and \(\mathbf {f}_1\) are descriptor vectors from \(\mathcal {F}_0\) and \(\mathcal {F}_1\), respectively.
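Under the assumption that the normalization above is an L2 normalization of each descriptor (so that \(\tau \) behaves like a cosine score with negative values zeroed), the map can be computed in a few lines of PyTorch; this is a sketch of our reading of the text, not the authors' code.

```python
import torch
import torch.nn.functional as F

def correlation_map(f0: torch.Tensor, f1: torch.Tensor) -> torch.Tensor:
    """f0, f1: (B, d, h, w) feature maps -> (B, h*w, h, w) geometric correlation map."""
    b, d, h, w = f0.shape
    # Normalize each d-dimensional local descriptor to unit length.
    f0 = F.normalize(f0, dim=1)
    f1 = F.normalize(f1, dim=1)
    # Score every descriptor (i, j) of f0 against every descriptor (k, l) of f1.
    scores = torch.einsum("bdij,bdkl->bijkl", f0, f1)
    # Zero out negative correlations and stack the h*w score maps S^{i,j}.
    return scores.clamp(min=0).reshape(b, h * w, h, w)
```

This function can serve directly as the `correlation` component in the earlier architecture sketch.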
3.3 Binary Classifier
The geometric correlation map is expected to show specific patterns according to the longitudinal change. Figure 2 visualizes the map in 3-dimensional space as a stack of score maps. Blank spaces in the map indicate no positive correlation, while vivid blue indicates a strong correlation. When the input images contain no change (first two rows), the correlation map shows high scores concentrated along the diagonal. Otherwise (last two rows), the map tends to present relatively scattered scores.
We attach a binary classifier designed to detect such patterns. We employ a fully convolutional network (FCN) [6] composed of three convolutional layers with no padding and a stride of one.
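One plausible instantiation of this classifier is sketched below; the kernel sizes and channel widths are assumptions, since the text only specifies three convolutional layers, no padding, and unit stride, applied to the \(8 \times 8\) correlation map.

```python
import torch
import torch.nn as nn

class ChangeClassifier(nn.Module):
    def __init__(self, in_ch: int = 64):  # 64 = h*w score maps for h = w = 8
        super().__init__()
        self.fcn = nn.Sequential(
            nn.Conv2d(in_ch, 128, kernel_size=3), nn.ReLU(inplace=True),  # 8x8 -> 6x6
            nn.Conv2d(128, 64, kernel_size=3), nn.ReLU(inplace=True),     # 6x6 -> 4x4
            nn.Conv2d(64, 1, kernel_size=4),                              # 4x4 -> 1x1
        )

    def forward(self, g: torch.Tensor) -> torch.Tensor:
        # g: (B, h*w, h, w) geometric correlation map -> (B, 1) change logit
        return self.fcn(g).flatten(1)
```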
4 Experiment
We randomly split the 5,472 image pairs in our dataset into three sets: training (4,370), validation (551), and test (551). The proposed network is trained from scratch in an end-to-end manner, and evaluation is conducted on the test set only. All images are normalized and resized to \(256 \times 256\). The algorithm parameters are empirically fixed at \(h=8\), \(w=8\), and \(d=256\) throughout the experiments.
We compare our technique with recent longitudinal analysis methods. We first construct a baseline method using a single-stream network: the input image pair is passed as a two-channel image directly to an FCN, without any matching module. To show the effect of the geometric correlation map, we implement another method that replaces the map module with channel-wise concatenation [12]. We also reproduce a method based on t-LSTM [11], modified to predict the presence of change instead of the disease class.
Table 1 shows the performance comparison evaluated by the area under the ROC curve (AUC) with 95% confidence intervals. The baseline method with a single stream presents the lowest performance, as it does not contain any explicit design for image matching. The modified LSTM [11] provides almost the same AUC as the baseline, probably because a sequence of two feature maps is too short for the LSTM algorithm; however, we rarely consider more than two previous examinations in routine practice. Channel-wise concatenation [12] shows a substantial improvement over the baseline: separate feature maps from the two identical network streams may provide an implicit matching effect even with a simple concatenation strategy. Finally, the proposed approach with the geometric correlation map yields outstanding performance, with AUC = 0.89 (sensitivity = 0.83 and specificity = 0.82 at Youden's index), as it employs a proper algorithm to tackle image matching.
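For reference, the operating point reported above can be obtained from the ROC curve with Youden's index as in the following sketch; `y_true` and `y_score` are hypothetical arrays of ground-truth labels and predicted change scores on the test set.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def evaluate(y_true: np.ndarray, y_score: np.ndarray):
    """Return AUC plus sensitivity/specificity at the Youden-optimal threshold."""
    auc = roc_auc_score(y_true, y_score)
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    best = np.argmax(tpr - fpr)  # Youden's J = sensitivity + specificity - 1
    sensitivity, specificity = tpr[best], 1.0 - fpr[best]
    return auc, sensitivity, specificity, thresholds[best]
```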
Figure 2 shows examples of the geometric correlation map generated by our algorithm. The first and second columns present the input images of the previous and current examinations, respectively, with the lesion in each image located by a red box. The third column visualizes the geometric correlation maps. As mentioned earlier, the maps show specific patterns according to the longitudinal change: concentrated (first two rows, no-change) vs. scattered (last two rows, change). A detailed description is provided for each case with the actual report sentences. The first row presents images of a normal chest with "no significant interval change since last study". The case in the second row contains "pulmonary TB (tuberculosis), stable state." The third row shows an "improving state of consolidation" in the sequential examinations. In the last row, an "increased amount of left pleural effusion" is demonstrated, as the lesion appears aggravated in the current image.
5 Discussion
We have presented a novel approach to detect the change in a lesion between two longitudinal CXR examinations of a patient. Our method is not limited to pre-defined disease classes but is designed to sense any substantial change in the consecutive image pair. This property is an important factor for building a triage system to help radiologists with massive CXR reading assignments.
In future work, we will try to improve the detection performance through several approaches. First, we plan to apply a spatial regularization prior to the correlation matching; since our current algorithm theoretically allows infeasible, twisted matchings, we expect such a regularization constraint to improve registration quality and help detect subtle or smaller changes. Next, we hope to enrich our database in general: the labeling accuracy can be improved with more sophisticated natural language analysis tools [4], and the number of cases can be increased through a multi-center study. We also plan to release the database to the public, with a larger number of samples and more accurate labels.
References
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018
Irvin, J., et al.: CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. arXiv preprint arXiv:1901.07031 (2019)
Jaeger, S., Candemir, S., Antani, S., Wáng, Y.X.J., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant. Imaging Med. Surg. 4(6), 475 (2014)
Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pp. 427–431. Association for Computational Linguistics, April 2017
Lakhani, P., Sundaram, B.: Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks. Radiology 284(2), 574–582 (2017)
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
Nam, J.G., et al.: Development and validation of deep learning-based automatic detection algorithm for malignant pulmonary nodules on chest radiographs. Radiology 180237 (2018)
Rajpurkar, P., et al.: CheXnet: radiologist-level pneumonia detection on chest X-rays with deep learning. arXiv preprint arXiv:1711.05225 (2017)
Rocco, I., Arandjelovic, R., Sivic, J.: Convolutional neural network architecture for geometric matching. In: Proceedings of the CVPR, vol. 2 (2017)
Rocco, I., Arandjelovic, R., Sivic, J.: End-to-end weakly-supervised semantic alignment. In: Proceedings of the CVPR (2018)
Santeramo, R., Withey, S., Montana, G.: Longitudinal detection of radiological abnormalities with time-modulated LSTM. In: Stoyanov, D., et al. (eds.) DLMIA/ML-CDS -2018. LNCS, vol. 11045, pp. 326–333. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00889-5_37
Setio, A.A.A., et al.: Pulmonary nodule detection in CT images: false positive reduction using multi-view convolutional networks. IEEE Trans. Med. Imaging 35(5), 1160–1169 (2016)
Singh, R., et al.: Deep learning in chest radiography: detection of findings and presence of change. PloS One 13(10), e0204155 (2018)
Wang, F., et al.: Residual attention network for image classification. arXiv preprint arXiv:1704.06904 (2017)
Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, p. 4 (2018)
Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., Summers, R.M.: ChestX-ray8: hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2097–2106 (2017)
Acknowledgement
This work was supported by the Industrial Strategic technology development program (10072064) funded by the Ministry of Trade, Industry and Energy (MI, Korea) and by grant (no. 13-2019-006) from the SNUBH Research Fund.