1 Introduction

MRI-guided radiotherapy is an emerging technology for improving treatment accuracy over conventional CT-based radiotherapy, owing to the better soft-tissue contrast of MR compared with CT images. Real-time, accurate tumor segmentation on MRI can help deliver a high dose to tumors while reducing the dose to normal tissue. However, because MRI-guided radiotherapy is not yet part of standard of care, very few MRIs are available for training. We therefore developed an adversarial domain adaptation approach that leverages large CT datasets for tumor segmentation on MRI.

Although deep neural networks excel at learning from large amounts of (labeled) data, their accuracy drops when they are applied to novel datasets or domains [1]. The difference between the source and target domain distributions is called domain shift. Commonly used fine-tuning methods require prohibitively large amounts of labeled data in the target domain. As an alternative, domain adaptation methods attempt to minimize domain shift either by sharing features [2] or by learning to reconstruct the target domain from the source domain [3, 4]. In essence, domain adaptation methods learn the marginal distributions [5] to transform the source into the target domain.

The problems of domain shift are exacerbated in medical images, where different imaging modalities capture the physical properties of the underlying anatomy differently (e.g., CT vs. MRI). For example, whereas bones appear hyper-dense on CT and dark on MRI, tumors appear with contrast similar to normal soft tissue on CT but have a distinct appearance on MRI (Fig. 1(a) and (b)). Consequently, learning the marginal distributions of the domains alone may not be sufficient.

Fig. 1. MRI synthesized from a representative (a) CT image using (c) cycle-GAN [5] and (d) the proposed method. The corresponding MRI scan for (a) is shown in (b). As shown, the proposed method (d), which uses the tumor-aware loss, fully preserves the tumor in the synthesized MRI, unlike cycle-GAN (c).

Cross-domain adaptation of highly different modalities has been applied in medical image analysis for image synthesis using paired [6] and unpaired [7] images, as well as for segmentation [8, 9]. However, all of these approaches aim to synthesize images that match only the marginal distribution, not structure-specific conditional distributions such as that of tumors. Segmentation or classification using such synthetic images therefore leads to lower accuracy.

To address this, we introduce a novel target-specific loss, called the tumor-aware loss, for unsupervised cross-domain adaptation that preserves tumors in MRIs synthesized from CT images (Fig. 1(d)), which cannot be achieved with the cycle loss alone (Fig. 1(c)).

2 Method

Our objective is to learn to segment tumors in MR images through domain adaptation from CT to MRI, where we have access to a reasonably sized labeled dataset in the source domain \((X_{CT}, y_{CT})\) but only a very limited number of target samples \(X_{MRI} \ll X_{CT}\) and even fewer labels \(y_{MRI}\). Our solution first employs tumor-aware unsupervised cross-domain adaptation to synthesize a reasonably large number of MRIs from CT through adversarial training. Second, we combine the synthesized MRIs with a small fraction of labeled real MRIs and train a U-net [10] to generate tumor segmentations, as outlined in Fig. 2.

Fig. 2. Approach overview. \(X_{CT}\) and \(X_{MRI}\) are the real CT and MRI; \(X_{CT}^{MRI}\) and \(X_{MRI}^{CT}\) are the synthesized MRI and CT images; \(y_{CT}\) is the CT image label; \(G_{CT \rightarrow MRI}\) and \(G_{MRI \rightarrow CT}\) are the CT and MRI transfer networks; \(\tilde{X}_{MRI}\) and \(\tilde{y}_{MRI}\) are a small sample set from the real MRI, used to train semi-supervised segmentation.

2.1 Step 1: MRI Synthesis Using Tumor-Aware Unsupervised Cross-Domain Adaptation

The first step is to learn a mapping \(G_{CT \rightarrow MRI}\) that synthesizes MRI from CT images to fool a discriminator \(D_{MRI}\) through adversarial training [11]. Simultaneously, we train a second network that learns the reverse mapping \(G_{MRI \rightarrow CT}\). The adversarial losses \(L^{MRI}_{adv}\), for synthesizing MRI from CT, and \(L^{CT}_{adv}\), for synthesizing CT from MRI, are computed as:

$$\begin{aligned} \begin{aligned} L^{MRI}_{adv}(G_{CT \rightarrow MRI}, D_{MRI}, X_{MRI}, X_{CT})&= \mathbb {E}_{x_{m} \sim X_{MRI}} [\log (D_{MRI}(x_{m}))] \\&\quad + \mathbb {E}_{x_{c} \sim X_{CT}} [\log (1-D_{MRI}(G_{CT \rightarrow MRI}(x_{c})))] \\ L^{CT}_{adv}(G_{MRI \rightarrow CT}, D_{CT}, X_{CT}, X_{MRI})&= \mathbb {E}_{x_{c} \sim X_{CT}} [\log (D_{CT}(x_{c}))] \\&\quad + \mathbb {E}_{x_{m} \sim X_{MRI}} [\log (1-D_{CT}(G_{MRI \rightarrow CT}(x_{m})))] \end{aligned} \end{aligned}$$
(1)

where \(x_{c}\) and \(x_{m}\) are real images sampled from the CT (\(X_{CT}\)) and MRI (\(X_{MRI}\)) domains, respectively. The total adversarial loss (Fig. 2 (purple ellipse)) is then the sum of the two losses, \(L_{adv}=L^{MRI}_{adv}+L^{CT}_{adv}\). We also compute a cycle-consistency loss [5] to regularize the images synthesized through independent training of the two networks. Letting the synthesized images be \(x^{'}_{m}=G_{CT \rightarrow MRI}(x_{c})\) and \(x^{'}_{c}=G_{MRI \rightarrow CT}(x_{m})\), the cycle-consistency loss \(L_{cyc}\) is calculated as:

$$\begin{aligned} \begin{aligned} L_{cyc}(G_{CT \rightarrow MRI}, G_{MRI \rightarrow CT}, X_{CT}, X_{MRI}&) = \mathbb {E}_{x_{c} \sim X_{CT}}\left[ \left\| G_{MRI \rightarrow CT}(x^{'}_{m}) - x_{c}\right\| _{1}\right] \\&+ \mathbb {E}_{x_{m} \sim X_{MRI}}\left[ \left\| G_{CT \rightarrow MRI}(x^{'}_{c}) - x_{m}\right\| _{1}\right] . \end{aligned} \end{aligned}$$
(2)
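
To make Eqs. (1) and (2) concrete, below is a minimal PyTorch sketch of the adversarial and cycle-consistency losses. The binary cross-entropy formulation of the log terms and all function and variable names (e.g., `G_ct2mri`, `D_mri`) are our illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def adversarial_loss(D, real, fake):
    """One direction of Eq. (1): log D(real) + log(1 - D(fake)),
    written as binary cross-entropy on the discriminator logits."""
    real_logits, fake_logits = D(real), D(fake)
    loss_real = F.binary_cross_entropy_with_logits(
        real_logits, torch.ones_like(real_logits))
    loss_fake = F.binary_cross_entropy_with_logits(
        fake_logits, torch.zeros_like(fake_logits))
    return loss_real + loss_fake

def cycle_loss(G_ct2mri, G_mri2ct, x_ct, x_mri):
    """Eq. (2): L1 distance between each real image and its
    round-trip translation through both generators."""
    x_mri_fake = G_ct2mri(x_ct)   # x'_m = G_CT->MRI(x_c)
    x_ct_fake = G_mri2ct(x_mri)   # x'_c = G_MRI->CT(x_m)
    return (F.l1_loss(G_mri2ct(x_mri_fake), x_ct)
            + F.l1_loss(G_ct2mri(x_ct_fake), x_mri))

# L_adv = adversarial_loss(D_mri, x_mri, G_ct2mri(x_ct)) \
#       + adversarial_loss(D_ct, x_ct, G_mri2ct(x_mri))
```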

The cycle-consistency and adversarial losses only constrain the model to learn a global mapping that matches the marginal distribution, not the conditional distribution pertaining to individual structures such as tumors. A model trained with these losses alone is therefore not required to preserve tumors, which can lead to deterioration or total loss of tumors in the synthesized MRIs (Fig. 1(c)). Hence, we introduce a tumor-aware loss that forces the network to preserve tumors. Specifically, the tumor-aware loss is composed of a tumor loss (Fig. 2 (red ellipse)) and a feature loss (Fig. 2 (orange ellipse)). We compute the tumor loss by training two parallel tumor detection networks, implemented as simplified U-nets [10], for CT (\(U_{CT}\)) and synthesized MRI (\(U_{MRI}\)). The tumor loss constrains the CT- and synthetic MRI-based U-nets to produce similar tumor segmentations, thereby preserving tumors; it is computed as:

$$\begin{aligned} \begin{aligned} L_{tumor}&= \mathbb {E}_{x_{c} \sim X_{CT}, y_{c} \sim y_{CT}}[\log P(y_{c}\,|\,G_{CT \rightarrow MRI}(x_{c}))] \\&\quad +\mathbb {E}_{x_{c} \sim X_{CT}, y_{c} \sim y_{CT}}[\log P(y_{c}\,|\,x_{c})]. \end{aligned} \end{aligned}$$
(3)

The tumor feature loss \(L_{feat}\), in turn, forces the high-level features of \(X_{CT}\) and \(X_{CT}^{MRI}\) to be similar, using a constraint inspired by [12]:

$$\begin{aligned} L_{feat}=\mathbb {E}_{x_{c} \sim X_{CT}}\left[ \frac{1}{C \times H \times W }\left\| \phi _{CT}(x_{c})-\phi _{MRI}(G_{CT \rightarrow MRI}(x_{c}))\right\| _{2}^{2}\right] , \end{aligned}$$
(4)

where \(\phi _{CT}\) and \(\phi _{MRI}\) are the high-level features extracted from \(U_{CT}\) and \(U_{MRI}\), respectively, and C, H, and W denote the channel, height, and width dimensions of the feature maps. The total loss is then expressed as:

$$\begin{aligned} L_{total}=L_{adv}+\lambda _{cyc}{L_{cyc}}+\lambda _{tumor}{L_{tumor}}+\lambda _{feat}{L_{feat}}, \end{aligned}$$
(5)

where \(\lambda _{cyc}\), \(\lambda _{tumor}\) and \(\lambda _{feat}\) are the weighting coefficients for each loss. During training, we alternately update the domain transfer (generator) network G, the discriminator D, and the tumor-constraint network U with the following gradients: \(-\varDelta _{\theta _{G}}(L_{adv}+ \lambda _{cyc}{L_{cyc}}+ \lambda _{tumor}{L_{tumor}}+ \lambda _{feat}L_{feat})\), \(-\varDelta _{\theta _{D}}(L_{adv})\), and \(-\varDelta _{\theta _{U}}(L_{tumor}+\lambda _{feat}L_{feat})\).
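
A minimal sketch of the tumor, feature, and total losses (Eqs. (3)-(5)) follows, assuming the log-likelihood terms of Eq. (3) are implemented as (negated) cross-entropy and that the high-level feature maps of \(U_{CT}\) and \(U_{MRI}\) are available; all names are illustrative.

```python
import torch
import torch.nn.functional as F

def tumor_loss(U_ct, U_mri, G_ct2mri, x_ct, y_ct):
    """Eq. (3): both U-nets must recover the CT tumor label y_ct
    (shape (N, H, W), long), one from the real CT and one from the
    synthesized MRI. The log-likelihoods are written as cross-entropy."""
    return (F.cross_entropy(U_mri(G_ct2mri(x_ct)), y_ct)
            + F.cross_entropy(U_ct(x_ct), y_ct))

def feature_loss(phi_ct, phi_mri):
    """Eq. (4): mean squared distance between the high-level features
    of U_CT and U_MRI; mse_loss averages over C x H x W."""
    return F.mse_loss(phi_mri, phi_ct)

def total_loss(l_adv, l_cyc, l_tumor, l_feat,
               lam_cyc=10.0, lam_tumor=5.0, lam_feat=1.0):
    """Eq. (5), with the weighting coefficients reported in Sect. 2.3."""
    return l_adv + lam_cyc * l_cyc + lam_tumor * l_tumor + lam_feat * l_feat
```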

2.2 Step 2: Semi-supervised Tumor Segmentation from MRI

The MRIs synthesized in the first step were combined with a small set of labeled real MRIs (\(\tilde{X}_{MRI}\) and \(\tilde{y}_{MRI}\) in Fig. 2) to train a U-net [10] with the Dice loss [13] (Fig. 2 (blue ellipse)) to generate tumor segmentations. The adversarial MRI-synthesis networks were frozen prior to semi-supervised segmentation training to prevent leakage of MRI label information into the synthesis step.
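
A minimal sketch of the soft Dice loss [13] used to train the segmentation U-net; the smoothing constant `eps` is an assumption.

```python
import torch

def dice_loss(pred, target, eps=1.0):
    """Soft Dice loss. pred holds sigmoid probabilities and target the
    binary tumor mask, both of shape (N, 1, H, W)."""
    intersection = (pred * target).sum(dim=(1, 2, 3))
    union = pred.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return 1.0 - ((2.0 * intersection + eps) / (union + eps)).mean()
```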

2.3 Network Structure and Implementation

The generators G and discriminators D of the CT and MRI synthesis networks were implemented similarly to those in [5]. We tied the penultimate layers of \(U_{MRI}\) and \(U_{CT}\). The details of all networks are provided in the supplementary documents. The networks were implemented with the PyTorch library [14] and trained on an Nvidia GTX 1080Ti GPU with 12 GB of memory, using a batch size of 1 for image transfer and 10 for semi-supervised segmentation. The ADAM algorithm [15] with an initial learning rate of 1e-4 was used during training. We set \(\lambda _{cyc}=10\), \(\lambda _{tumor}=5\) and \(\lambda _{feat}=1\).
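
The following optimizer setup is a sketch consistent with the reported training details (ADAM, initial learning rate 1e-4, alternating updates of G, D, and U); the parameter grouping and the (default) beta values are assumptions, and all network names are illustrative.

```python
import itertools
import torch

def make_optimizers(G_ct2mri, G_mri2ct, D_ct, D_mri, U_ct, U_mri, lr=1e-4):
    """One optimizer per alternately-updated group (Sect. 2.1):
    generators G, discriminators D, and tumor-constraint U-nets U."""
    opt_G = torch.optim.Adam(
        itertools.chain(G_ct2mri.parameters(), G_mri2ct.parameters()), lr=lr)
    opt_D = torch.optim.Adam(
        itertools.chain(D_ct.parameters(), D_mri.parameters()), lr=lr)
    opt_U = torch.optim.Adam(
        itertools.chain(U_ct.parameters(), U_mri.parameters()), lr=lr)
    return opt_G, opt_D, opt_U
```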

3 Experiments and Results

3.1 Ablation Tests

We tested the impact of adding the tumor-aware loss to the cycle loss (proposed vs. cycle-GAN [5] vs. masked cycle-GAN [8]). Segmentation networks were trained on the images synthesized by each of these methods combined with a limited number of real MRIs (semi-supervised learning). We refer to adversarial synthesis [8], which concatenates the tumor labels to the original images as an additional channel, as masked cycle-GAN. We also evaluated the effect of adding a limited number of original MRIs to the synthesized MRIs on segmentation accuracy (tumor-aware with semi-supervised vs. tumor-aware with unsupervised training). Finally, we benchmarked the lowest achievable segmentation accuracy by training a network with only the pre-treatment (week one) MRIs.

3.2 Datasets

The image synthesis networks were trained using contrast-enhanced CT images with expert-delineated tumors from 377 patients with non-small cell lung cancer (NSCLC) [16], available from The Cancer Imaging Archive (TCIA) [17], and an unrelated cohort of 6 patients from our clinic, each scanned with T2-weighted (T2w) MRI before and then weekly during radiation therapy (n = 7 scans per patient). The masked cycle-GAN used both the tumor labels and the images as additional channels even for image synthesis training. Image regions enclosing the tumors were extracted and rescaled to \(256\times {256}\), yielding 32000 CT and 9696 T2w MR image slices. Only 1536 MR images from the pre-treatment MRIs were used for the semi-supervised segmentation training of all networks. Segmentation validation was performed on the subsequent on-treatment MRIs (n = 36) from the same 6 patients. Testing was performed on 28 MRIs consisting of longitudinal scans (7, 7, and 6) from 3 patients and pre-treatment scans from 8 patients not used in training. Tumor segmentation accuracy was evaluated against expert delineations using the Dice similarity coefficient (DSC) and the 95th-percentile Hausdorff distance (HD95).
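
A sketch of the two evaluation metrics on boolean NumPy masks follows; computing HD95 on surface voxels and reporting distances in voxel units (to be scaled by the voxel spacing) are our assumptions.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dsc(a, b):
    """Dice similarity coefficient between two boolean masks."""
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def _surface(mask):
    """Boundary voxels: the mask minus its erosion."""
    return mask & ~binary_erosion(mask)

def hd95(a, b):
    """95th-percentile symmetric surface distance, in voxel units."""
    sa, sb = _surface(a), _surface(b)
    d_ab = distance_transform_edt(~sb)[sa]  # surface of a -> surface of b
    d_ba = distance_transform_edt(~sa)[sb]  # surface of b -> surface of a
    return np.percentile(np.hstack([d_ab, d_ba]), 95)
```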

3.3 MR Image Synthesis Results

Figure 3 shows representative qualitative results: MRIs synthesized using cycle-GAN alone (Fig. 3(b)), masked cycle-GAN (Fig. 3(c)), and our method (Fig. 3(d)). As seen, our method best preserves the anatomical details between CT and MRI. Quantitative evaluation using the Kullback-Leibler (KL) divergence, computed over the tumor regions of the synthesized and the original training MRIs, confirmed that our method produced the best match of the tumor distribution, with the lowest KL divergence of 0.069 compared with cycle-GAN (1.69) and masked cycle-GAN (0.32).
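
This KL-divergence comparison can be reproduced in spirit with a histogram-based estimate over tumor voxels, as sketched below; the bin count and smoothing constant are assumptions, as the paper does not specify them.

```python
import numpy as np

def tumor_kl(synth, real, synth_mask, real_mask, bins=100, eps=1e-8):
    """KL divergence between intensity histograms restricted to the
    tumor regions of synthesized and real MRI."""
    s, r = synth[synth_mask], real[real_mask]
    lo, hi = min(s.min(), r.min()), max(s.max(), r.max())
    p, _ = np.histogram(s, bins=bins, range=(lo, hi))
    q, _ = np.histogram(r, bins=bins, range=(lo, hi))
    p = (p + eps) / (p + eps).sum()  # smoothed, normalized histograms
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))
```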

Fig. 3. MRI synthesized from CT using different deep learning methods. The red contour indicates the manually delineated tumor region in the NSCLC datasets [16]. (a) CT image; (b) cycle-GAN [5]; (c) masked cycle-GAN [8]; (d) proposed method.

3.4 Segmentation Results

Figure 4 shows the segmentations generated by the various methods (yellow contours) for three representative cases from the test and validation sets, together with the expert delineations (red contours). As shown in Table 1, our approach outperformed cycle-GAN irrespective of whether it was trained without (unsupervised) or with (semi-supervised) labeled target data. Tumor-aware semi-supervised training outperformed all other methods on both the test and validation datasets.

Table 1. Segmentation accuracy
Fig. 4. Segmentation results of the different methods on representative examples from the validation and test sets. Red contours indicate expert delineations; yellow contours indicate algorithm segmentations. (a) Segmentation trained with only week-one MRI; (b) segmentation using MRI synthesized by cycle-GAN [5]; (c) segmentation using MRI synthesized by masked cycle-GAN [8]; (d) tumor-aware unsupervised learning; (e) tumor-aware semi-supervised learning.

4 Discussion

In this work, we introduced a novel target-specific, tumor-aware loss for synthesizing MR images from unpaired CT datasets using unsupervised cross-domain adaptation. The tumor-aware loss forces the network to retain tumors that are typically lost when using the cycle loss alone, and leads to accurate tumor segmentation. Although applied to lung tumors, our method is applicable to other structures and organs. The segmentation accuracy of our approach trained with only synthesized MRIs exceeded that of the other methods even when they were trained in a semi-supervised manner, and adding a small set of labeled target-domain data further boosted accuracy. The validation set produced lower but not significantly different (p = 0.1) DSC accuracy than the test set, owing to significantly smaller (p = 0.0004) tumor volumes in the validation set (mean 37.66 cc) than in the test set (mean 68.2 cc). Our results also showed that masked cycle-GAN produced lower test performance than the basic cycle-GAN, possibly due to poor modeling from highly unbalanced CT and MR datasets. As a limitation, our approach only forces the synthesized MRIs to preserve tumors, not the MR intensity distribution within tumors. Such modeling would require learning a mapping for individual scan manufacturers, magnet strengths, and coil placements, which was outside the scope of this work. Additionally, irrespective of the chosen method, the synthesized images do not provide a one-to-one pixel mapping from CT to MRI, similar to [8]. There is also room for improving segmentation accuracy by exploring more advanced segmentation models, e.g., boundary-aware fully convolutional networks (FCN) [18].

5 Conclusions

In this work, we proposed a tumor-aware adversarial domain adaptation method that uses unpaired CT and MR images to generate tumor segmentations from MRI. Our approach preserved tumors on the synthesized MRIs and yielded the best segmentation performance compared with state-of-the-art adversarial cross-domain adaptation methods. Our results suggest that lung tumor segmentation from MRI, trained primarily on MRIs synthesized from CT, is feasible.