Abstract
Creating large-scale, high-quality annotations is a known challenge in medical imaging. In this work, based on the CycleGAN algorithm, we propose leveraging annotations from one modality for use in other modalities. More specifically, the proposed algorithm creates highly realistic synthetic CT images (SynCT) from prostate MR images using unpaired data sets. By using SynCT images (without segmentation labels) and MR images (with segmentation labels available), we have trained a deep segmentation network for precise delineation of the prostate from real CT scans. For the generator in our CycleGAN, the cycle-consistency term is used to guarantee that each SynCT shares the identical manually drawn, high-quality masks originally delineated on the MR images. Further, we introduce a cost function based on the structural similarity index (SSIM) to improve the anatomical similarity between real and synthetic images. For segmentation of the SynCT generated by CycleGAN, automatic delineation is achieved with a 2.5D Residual U-Net. Quantitative evaluation demonstrates comparable segmentation results between our SynCT and radiologist-drawn masks on real CT images, addressing an important problem in the medical image segmentation field when ground-truth annotations are not available for the modality of interest.
1 Introduction
Prostate segmentation from radiology scans is often necessary for radiotherapy, prostatectomy, and calculation of prostate-specific antigen (PSA) density [1]. Among imaging modalities, magnetic resonance imaging (MRI) provides the best soft-tissue contrast and yields the most accurate estimation of prostate volume, consistent with prostatectomy specimen volumes [2]. Unlike MRI, computed tomography (CT) scans make it difficult to distinguish the boundaries of the prostate and adjacent tissues during segmentation [3]. Despite this, in current clinical practice, prostate radiation therapy dose calculation is primarily based on CT scans, as CT is the only modality from which the electron density needed for dosimetry calculations can be derived [4]. Therefore, planning systems generally require anatomical information to be delineated on CT scans.
In this study, we address a practical yet still very challenging problem: prostate segmentation from CT images when there are no ground-truth CT annotations to supervise the segmentation algorithm. Instead, we utilize segmentation labels from widely available MRI data sets and propose a two-step knowledge transfer algorithm to map the segmentation labels from MRI to CT scans. The correspondence between MRI and CT is established through a CycleGAN algorithm [5] with a structural-similarity-preserving cost function. Highly realistic synthetic CT scans generated in the first step are then used to supervise a deep segmentation network in the second step. Training of the segmentation network is performed only on the synthetic images, while testing is done on both synthetic and real CT scans for evaluation. While our framework does not enforce the use of any specific segmentation network to finalize the delineation process, we choose a 2.5D Res-U-Net to accomplish this task with faster convergence and higher accuracy.
2 Methods
The proposed workflow includes two main steps, as shown in Fig. 1. The first step is to generate high-quality and reliable synthetic CT images (SynCT) from MR images. Previous work [6] has shown that domain adaptation from MR images to CT images is feasible using the CycleGAN architecture. We used a similar CycleGAN approach as a baseline to create high-quality knowledge transfer between unpaired MRI and CT.
The second step is automatic segmentation of the prostate. We trained a U-Net based segmentation network to delineate the whole prostate area, but with two main differences from the existing literature: (i) we used SynCT for training and real CT scans for testing, and (ii) we modified the U-Net [7] to increase segmentation performance by adding residual blocks to the segmentation network. For better 3D information fusion, we also modified the segmentation architecture to use two additional adjacent slices in its input (i.e., a 3-channel input).
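As an illustration of this modification, the following is a minimal sketch (in PyTorch; not the authors' code, and all layer choices are assumptions) of a residual block with a short skip connection that could replace a plain convolutional block inside the U-Net; the long skip connections between encoder and decoder remain unchanged.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with a short (residual) skip connection."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # 1x1 projection so the skip path matches the output channel count
        self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1) if in_ch != out_ch else nn.Identity()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = self.skip(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)  # short skip connection
```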
2.1 Data
We used a total of three different data sets for our experiments and evaluations. For CycleGAN training, 346 T2-weighted MRI scans from the publicly available PROSTATEx challenge data [9] were used. The T2-weighted images were acquired using a turbo spin echo sequence with an in-plane resolution of 0.4–0.6 mm, a slice thickness of 3.6 mm, and zero gap. Second, the testing data set for CycleGAN included 60 prostate MRI cases, along with their high-quality delineations, obtained from the publicly available NCI-ISBI 2013 challenge data [10]. These data were used for generating the synthetic CT scans. We used 6-fold stratified cross-validation for evaluation of the algorithms. Third, for real CT scans, as part of a retrospective IRB-approved study, we acquired prostate CT data from 120 anonymized patients at our institution with a resolution of \(0.8\times 0.8\times 1\) mm\(^3\). CT intensities were clipped to the range \(-500\) HU to 500 HU to reveal more soft-tissue contrast, similar to a soft-tissue CT window. The prostate MRI and CT data are completely different from each other, i.e., unpaired. Among the in-house collected CT data, 19 cases were manually segmented by a board-certified radiologist for Dice score (DSC) comparison with our automatic segmentation method.
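A minimal sketch of the intensity preprocessing described above (the clipping window is taken from the text; rescaling to [0, 1] is an added assumption for network input normalization):

```python
import numpy as np

def soft_tissue_window(ct_volume: np.ndarray,
                       low_hu: float = -500.0,
                       high_hu: float = 500.0) -> np.ndarray:
    """Clip CT intensities to a soft-tissue-like window and rescale to [0, 1]."""
    clipped = np.clip(ct_volume, low_hu, high_hu)
    return (clipped - low_hu) / (high_hu - low_hu)
```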
2.2 Synthetic CT Network: CycleGAN
The synthetic CT images were generated by the CycleGAN model [5], which consists of two generative adversarial networks (GANs); each generator also converts the data generated by the other back to its original domain, enforcing cycle consistency. In our study, the forward-direction GAN has a generator, \(G_{CT}(MR)\), that generates synthetic CT images as realistic as possible, such that a discriminator \(D_{CT}\) cannot distinguish them from real CT. The discriminator ensures the likeness of the generated data to the original data; hence, the reliability of the generated data depends heavily on the discriminator's performance. The discriminator loss is described by Eq. 1.
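A plausible form of Eq. 1, assuming the least-squares adversarial objective of the original CycleGAN [5] and averaging over the \(N_{CT}\) real CT slices and \(N_{MR}\) MR slices (the averaging constants are not defined in the text and are introduced here for illustration), is:

$$\mathcal{L}_{D_{CT}} = \frac{1}{N_{CT}}\sum_{j}\big(D_{CT}(I_{CT}^{j})-1\big)^{2} + \frac{1}{N_{MR}}\sum_{i}\big(D_{CT}(G_{CT}(I_{MR}^{i}))\big)^{2} \qquad (1)$$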
where \(I_{CT}^{j}\) denotes the j-th true CT slice, \(I_{MR}^{i}\) the i-th MRI slice, and \(G_{CT}(I_{MR}^{i})\) the image produced by the generator \(G_{CT}(MR)\) from \(I_{MR}^{i}\). \(D_{CT}\) is the discriminator that tries to differentiate generated images from real CT images: if the discriminator cannot distinguish a generated image, it assigns a label of 1, meaning it recognizes the generated image as a true CT image; otherwise, a label of 0 is given.
The generator \(G_{MR}(SynCT)\) translates the SynCT back to its original data domain (the MR domain). By minimizing the difference between the reconstructed data and the original data (the cycle-consistency loss), a powerful constraint is enforced on the model to prevent the generated data from deviating from the ground truth. The cycle-consistency loss is expressed in Eq. 2.
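A plausible reconstruction of Eq. 2, assuming the SSIM dissimilarity between the original MR slice and its reconstruction is averaged over the pixels of an image patch (the exact arguments of the SSIM term are an assumption), is:

$$\mathcal{L}_{cyc} = \frac{1}{N}\sum_{p\in P}\Big[1-\mathrm{SSIM}\big(I_{MR}^{i}(p),\,G_{MR}\big(G_{CT}(I_{MR}^{i})\big)(p)\big)\Big] \qquad (2)$$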
where P is the image patch, N is the number of pixels in P, and p is the pixel index. The cycle loss compares the reconstructed MRI with the true MRI slices. In our new formulation, instead of computing a pixel-wise mean squared error (MSE), we propose to use the structural similarity index (SSIM), which takes the context of the images into account at a higher level than pixel-level MSE [11]. For a pixel p, SSIM is defined as in Eq. 3, where \(\mu _{x}, \mu _{y}\) and \(\sigma _{x}, \sigma _{y}\) denote the mean and standard deviation of the pixel intensities in a local image patch centered at x or y, and \(C_{1}\) and \(C_{2}\) are small constants added for stability.
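Assuming Eq. 3 is the standard structural similarity index, with \(\sigma_{xy}\) denoting the covariance of the two local patches (a symbol not defined in the text above), it takes the form:

$$\mathrm{SSIM}(p) = \frac{(2\mu_{x}\mu_{y}+C_{1})(2\sigma_{xy}+C_{2})}{(\mu_{x}^{2}+\mu_{y}^{2}+C_{1})(\sigma_{x}^{2}+\sigma_{y}^{2}+C_{2})} \qquad (3)$$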
2.3 Segmentation Network: 2.5D Res-U-Net
The U-Net architecture [7] has long skip connections to preserve spatial information during down-sampling. Besides the long skip connections, short skip connections were also added to form residual blocks, preventing vanishing gradients and increasing convergence speed; the U-Net with short skip connections is called Res-U-Net [8]. In addition, the proposed 2.5D input technique loads multiple slices simultaneously: one central slice and its adjacent slices in the out-of-plane direction. The number of channels is the sum of the central slice and the adjacent slices (\(channel\,No. = central\,slice + adjacent\,slices\)). The number of adjacent slices is defined through a designated context number, which selects adjacent slices in both the positive and negative directions (\(adjacent\,slices = 2 \times context\,No.\)). For instance, if the context number is set to 1, the selected adjacent slices will be the +1 and \(-1\) slices adjacent to the central slice. The context number can be adjusted to optimize the segmentation results.
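The following is a minimal sketch (an assumed implementation, not the authors' code) of how such a 2.5D input can be assembled; clamping at the volume boundaries is an assumption not stated in the text.

```python
import numpy as np

def make_25d_input(volume: np.ndarray, center_idx: int, context_no: int = 1) -> np.ndarray:
    """Stack the central slice and its neighbours as channels.

    volume: array of shape (num_slices, H, W)
    returns: array of shape (2 * context_no + 1, H, W)
    """
    num_slices = volume.shape[0]
    channels = []
    for offset in range(-context_no, context_no + 1):
        idx = min(max(center_idx + offset, 0), num_slices - 1)  # clamp at volume edges
        channels.append(volume[idx])
    return np.stack(channels, axis=0)

# context_no = 1 yields a 3-channel input (slices -1, 0, +1), matching the example above.
```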
3 Results
The CycleGAN model was trained using the Adam optimizer for 200 epochs with an initial learning rate of 0.0002; the 2.5D Res-U-Net model was trained using the Adam optimizer for 300 epochs with a binary cross-entropy loss function, since there are only two classes (mask and non-mask). Training took about 24 h for the CycleGAN to generate SynCT and about 12 h for the 2.5D Res-U-Net on a DGX Station with 4 Tesla V100 GPUs, each with 32 GB RAM. The segmentation results are displayed in Fig. 2. For data augmentation, rotation, flipping, and random crops with ratios from 1 (no crop) to 0.5 (half crop) of the original images were performed during training.
The 2.5D Res-U-Net trained and tested on MRI data illustrates the upper bound of performance; networks trained on CT/SynCT data will intuitively score lower than 0.9 (Table 1). SynCTs paired with the MRI segmentations were used to train the automatic segmentation network. For SynCT generated with the default CycleGAN settings (MSE loss, random crop with fixed ratio, 284 to 256 pixels) and no intensity clipping, we achieved \(0.83\pm 0.13\) and \(0.45\pm 0.29\) DSC on the SynCT and CT testing sets, respectively; for soft-tissue SynCT (intensity clipped from \(-500\) HU to 500 HU), we achieved \(0.82\pm 0.12\) and \(0.62\pm 0.15\) DSC on the SynCT and CT testing sets, respectively. More aggressive data augmentation (random crop with random ratio, rotation, flipping) was also adopted to generate higher-quality SynCT from the CycleGAN, which achieved \(0.65\pm 0.09\) and \(0.68\pm 0.09\) DSC on the SynCT and CT segmentation testing sets, respectively. To increase structural accuracy, the MSE in the cycle loss was replaced with the structural similarity index (SSIM); the 2.5D Res-U-Net trained with SynCT-SSIM achieved \(0.80\pm 0.12\) and \(0.73\pm 0.09\) DSC on the SynCT and CT testing sets, respectively. Note that the SynCT DSC decreases and the CT DSC increases until they reach a comparable point with no statistically significant difference (\(p>0.05\)); the standard deviations also converge. This tendency indicates that our SynCT gradually reached a point where, from the perspective of the 2.5D Res-U-Net, there was no difference from true CT.
4 Discussion and Concluding Remarks
Automatic prostate segmentation from CT has been studied intensively. Recently, the highest reported DSC is \(0.88 \pm 0.03\), by Liu et al. [12], using a U-Net and 1114 true CT cases. Our average result of \(0.73 \pm 0.09\) is comparable with that of Burgos et al. [13], who used multi-atlas-based SynCT (0.73 DSC). We have shown that the SynCT and CT testing results have no statistically significant difference, indicating the feasibility of using SynCT to train a neural network for a very challenging segmentation task. In some cases the DSC is low, but not because of poor performance of the proposed network; the low DSC is sometimes due to noise in the contouring of the hand-drawn CT ground-truth segmentation and to large anatomical and pathological variations (see Fig. 2).
Data Augmentation: We used MRI and CT scans from different data sources; the MRI scans have a smaller field of view (FOV) than the CT scans. The inconsistent FOV encouraged the CycleGAN to shift the anatomy without focusing on anatomical details. To generate high-quality SynCT, we centrally cropped the CT images by 50% to remove the surrounding air and the scanning table. We then augmented the data with random crops of random ratio (1–0.5), rotation, and flipping to reduce geometric biases affecting the learning process.
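A sketch of this augmentation pipeline, under the assumption of a NumPy implementation (rotations are restricted to 90-degree steps here for simplicity; the rotation angles are not specified in the text):

```python
import random
import numpy as np

def central_crop(img: np.ndarray, keep: float = 0.5) -> np.ndarray:
    """Keep the central `keep` fraction of the image to remove air and table."""
    h, w = img.shape[-2:]
    ch, cw = int(h * keep), int(w * keep)
    top, left = (h - ch) // 2, (w - cw) // 2
    return img[..., top:top + ch, left:left + cw]

def random_augment(img: np.ndarray, min_ratio: float = 0.5) -> np.ndarray:
    """Random-ratio crop, random 90-degree rotation, and random horizontal flip."""
    h, w = img.shape[-2:]
    ratio = random.uniform(min_ratio, 1.0)
    ch, cw = int(h * ratio), int(w * ratio)
    top, left = random.randint(0, h - ch), random.randint(0, w - cw)
    img = img[..., top:top + ch, left:left + cw]
    img = np.rot90(img, k=random.randint(0, 3), axes=(-2, -1))
    if random.random() < 0.5:
        img = np.flip(img, axis=-1)
    return img
```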
2.5D Technique: The 2.5D multi-slice input technique can affect segmentation performance, as shown in Fig. 3. For SynCT, the DSC increases significantly (\(p<0.05\)) by \(19.11\%\) from a single slice to 3 slices, shows no significant difference from 3 to 5 slices, and decreases by \(12.5\%\) from 5 to 7 slices. For CT, the DSC increases significantly by \(24.17\%\) from a single slice to 3 slices, shows no significant difference from 3 to 5 slices, and drops significantly by \(40.93\%\) from 5 to 7 slices. Therefore, to optimize the performance of the 2.5D Res-U-Net and also save training time, a context number of 1 (3-slice input) was used for all experiments.
In summary, we proposed a novel approach to segment the prostate from CT scans when ground truth is absent. Synthetic CT scans that share high-quality segmentations with MRI were used to train a deep-learning-based automatic segmentation network (2.5D Res-U-Net). The testing results on true CT achieved 0.73 DSC, which is comparable with SynCT. We also examined and identified the optimal number of input slices, which is 3 or 5. Future steps will include 3D volume assessment and continued improvement of the quality of synthetic CT generation.
References
Nordström, T., et al.: Prostate-specific antigen (PSA) density in the diagnostic algorithm of prostate cancer. Prostate Cancer Prostatic Dis. 21(1), 57–63 (2017)
Smith, W.L., et al.: Prostate volume contouring: a 3D analysis of segmentation using 3DTRUS, CT, and MR. Int. J. Radiat. Oncol. Biol. Phys. 67(4), 1238–1247 (2007)
Rasch, C., et al.: Definition of the prostate in CT and MRI: a multi-observer study. Int. J. Radiat. Oncol. Biol. Phys. 43(1), 57–66 (1999)
Chowdhury, N., et al.: Concurrent segmentation of the prostate on MRI and CT via linked statistical shape models for radiotherapy planning. Med. Phys. 39(4), 2214–2228 (2012)
Zhu, J.-Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv:1703.10593 (2017)
Wolterink, J.M., Dinkla, A.M., Savenije, M.H.F., Seevinck, P.R., van den Berg, C.A.T., Išgum, I.: Deep MR to CT synthesis using unpaired data. In: Tsaftaris, S.A., Gooya, A., Frangi, A.F., Prince, J.L. (eds.) SASHIMI 2017. LNCS, vol. 10557, pp. 14–23. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68127-6_2. https://arxiv.org/abs/1708.01155
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28. arXiv:1505.04597
Drozdzal, M., Vorontsov, E., Chartrand, G., Kadoury, S., Pal, C.: The importance of skip connections in biomedical image segmentation. In: Carneiro, G., et al. (eds.) LABELS/DLMIA -2016. LNCS, vol. 10008, pp. 179–187. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46976-8_19
Litjens, G., Debats, O., Barentsz, J., Karssemeijer, N., Huisman, H.: SPIE-AAPM PROSTATEx Challenge Data (2017). https://doi.org/10.7937/K9TCIA.2017.MURS5CL
Bloch, N., et al.: NCI-ISBI 2013 Challenge: Automated Segmentation of Prostate Structures. The Cancer Imaging Archive (2015). https://doi.org/10.7937/K9/TCIA.2015.zF0vlOPv
Zhao, H., et al.: Loss functions for image restoration with neural networks. IEEE Trans. Comput. Imaging 3(1), 47–57 (2017)
Liu, C., et al.: Automatic segmentation of the prostate on CT images using deep neural networks (DNN). Int. J. Radiat. Oncol. Biol. Phys. 104(4), 924–932 (2019)
Burgos, N., et al.: Iterative framework for the joint segmentation and CT synthesis of MR images: application to MRI-only radiotherapy treatment planning. Phys. Med. Biol. 62, 4237–4253 (2017)