1 Introduction

Generating a precise three-dimensional (3D) skeletal model is an essential step in craniomaxillofacial (CMF) surgical planning. Traditionally, computed tomography (CT) images are used in CMF surgery; however, CT exposes the patient to radiation [1]. Magnetic resonance imaging (MRI), on the other hand, provides a radiation-free, non-invasive way to image CMF anatomy. However, it is extremely difficult to accurately segment CMF bony structures from MRI due to the ambiguous boundaries between bone and air (both appearing dark in MRI), the low signal-to-noise ratio, and the partial volume effect.

Recently, deep learning has demonstrated outstanding performance in a wide range of computer vision and image analysis applications. With a properly designed loss function, deep learning methods can automatically learn complex hierarchical features for a specific task. In particular, the fully convolutional network (FCN) [2] was proposed to perform image segmentation with down-sampling and up-sampling streams. U-Net-based methods [3] further introduced skip connections that concatenate fine feature maps from lower layers with coarse feature maps from higher layers. Nie et al. [4] proposed a 3D deep-learning-based cascade framework, in which a 3D U-Net is trained for coarse segmentation and a CNN is then cascaded for fine-grained segmentation. However, most previous works perform segmentation directly on the original MRI, which has low contrast for bony structures. Inspired by the great success of the Generative Adversarial Network (GAN) [5] in generating realistic images, we hypothesize that the segmentation problem can also be treated as an estimation problem, i.e., generating realistic CT images from MRI and then performing segmentation on the generated CT images. In this paper, we propose a deep-supervision adversarial learning framework for CMF structure segmentation from MR images. Our proposed framework consists of two major steps: (1) a simulation GAN that estimates a CT image from an MR image, and (2) a segmentation GAN that segments CMF bony structures based on both the original MR image and the generated CT image. Specifically, a CT image is first generated from a given MR image by a deep-supervision discriminative GAN, where a perceptual loss strategy is developed to capture knowledge from the real CT image in terms of both local details and global structures. Furthermore, in the segmentation task, with the proposed perceptual loss strategy, the discriminative GAN evaluates the segmentation results using feature maps at different layers and feeds back structure information from both the original MR image and the generated CT image.

2 Method

In this section, we present a cascaded generative adversarial network with deep-supervision discriminators (Deep-supGAN) that segments CMF bony structures from the MR image and the generated CT image. The proposed framework is shown in Fig. 1. It includes two parts: (1) a simulation GAN that estimates a CT image from an MR image, and (2) a segmentation GAN that segments the CMF bony structures based on both the original MR image and the generated CT image. The simulation GAN employs deep-supervision discriminators attached to each convolution layer to evaluate the quality of the generated image. In the segmentation GAN, a deep-supervision perceptual loss is employed to evaluate the segmentation at multiple levels. Note that, for the discriminators in both parts, we utilize the first four convolution layers of a VGG-16 network [6] pre-trained on the ImageNet dataset to extract the feature maps.
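For concreteness, the sketch below (TensorFlow/Keras, our own illustration rather than the authors' code) shows one way to tap feature maps from the first four convolution layers of an ImageNet-pre-trained VGG-16; the 152 × 184 × 3 input shape matches the patch size used in Sect. 3.1, and the helper name is ours.

```python
import tensorflow as tf

def build_vgg_feature_extractor(input_shape=(152, 184, 3)):
    """Return a model that outputs the feature maps of the first four
    convolution layers of an ImageNet-pre-trained VGG-16 (kept frozen)."""
    vgg = tf.keras.applications.VGG16(include_top=False,
                                      weights="imagenet",
                                      input_shape=input_shape)
    vgg.trainable = False  # the pre-trained trunk is used only for feature extraction
    layer_names = ["block1_conv1", "block1_conv2",
                   "block2_conv1", "block2_conv2"]  # first four conv layers
    outputs = [vgg.get_layer(name).output for name in layer_names]
    return tf.keras.Model(inputs=vgg.input, outputs=outputs)

# Usage: feats is a list of four feature maps, from shallow to deeper layers.
# extractor = build_vgg_feature_extractor()
# feats = extractor(tf.random.uniform((1, 152, 184, 3)))
```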

Fig. 1.

The overview of the proposed Deep-supGAN. Top: CT generation net, where the generator \( G_{c} \) takes MRI patch \( z \) as input and generates the corresponding CT patch \( x^{{\prime }} \), while the discriminator \( D_{c}^{l} \) takes generated CT patch \( x^{{\prime }} \) and ground-truth CT patch \( x \) as input and produces classification (ground-truth = 1, generated = 0). Bottom: segmentation net, where \( G_{s} \) takes MRI patch \( z \) and generated CT patch \( x^{{\prime }} \) as input and then generates the segmentation \( y^{{\prime }} \), while \( D_{s} \) takes the generated segmentation \( y^{{\prime }} \) and the ground-truth segmentation \( y \) as input and produces classification (ground-truth = 1, generated = 0).

2.1 Simulation GAN

The simulation GAN for generating CT from MRI is shown in the upper portion of Fig. 1. Denoting by \( z \) a ground-truth MRI patch, by \( x \) a ground-truth CT patch, and by \( x^{{\prime }} \) a generated CT patch, we design a generator \( G_{c} \left( z \right) \) to map a given MR image patch into a CT image patch. To make the generated CT patch similar to the ground-truth CT in terms of both local details and global structures, we design multiple deep-supervision discriminators \( D_{c}^{l} \left( x \right),\,\left( {l = 1,2,3, \cdots } \right) \). Here, \( D_{c}^{l} \left( x \right) \) is a discriminator attached to the l-th layer of a pre-trained VGG-16 network, where each layer extracts features at a different scale, from local details to global structures. Thus, each discriminator compares the generated CT with the ground-truth CT at a different scale, resulting in an accurate simulation. To match the generated CT with the ground-truth CT, an adversarial game is played between \( G_{c} \left( z \right) \) and \( D_{c}^{l} \left( x \right) \). The loss function for this game is:

$$ \begin{aligned} \mathop {\hbox{min} }\limits_{{G_{c} }} \,\mathop {\hbox{max} }\limits_{{D_{c}^{l} }} \; & {\mathbb{E}}_{{x{\sim}p\left( x \right)}} \left[ {\sum\nolimits_{l} {\sum\nolimits_{i,\,j} {\log \left( {\left[ {D_{c}^{l} \left( x \right)} \right]_{i,\,j} } \right)} } } \right] \\ & + {\mathbb{E}}_{{z{\sim}q\left( z \right)}} \left[ {\sum\nolimits_{l} {\sum\nolimits_{i,\,j} {\log \left( {1 - \left[ {D_{c}^{l} \left( {G_{c} \left( z \right)} \right)} \right]_{i,\,j} } \right)} } } \right] \\ \end{aligned} $$
(1)

where \( p\left( x \right) \) is the distribution of the real CT data, \( q\left( z \right) \) is the distribution of the original MRI data, \( \left[ { D_{c}^{l} \left( x \right)} \right]_{i,\,j} \) is the \( \left( {i,\,j} \right) \)-th element of the matrix \( D_{c}^{l} \left( x \right) \), and \( L \) is the number of VGG-16 layers connected with discriminators, i.e., \( l = 1, \cdots ,L \).
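As a rough illustration of Eq. (1), the sketch below (TensorFlow, our own simplification rather than the authors' code) attaches a hypothetical 1 × 1 convolution head with a sigmoid to each tapped VGG-16 feature map, so that \( \left[ {D_{c}^{l} \left( x \right)} \right]_{i,\,j} \) is a per-position real/fake probability; the sums over positions are replaced by means for numerical stability, and the generator term uses the common non-saturating form \( -\log D \) in place of \( \log \left( 1 - D \right) \).

```python
import tensorflow as tf

# Hypothetical per-layer discriminator heads: a 1x1 convolution with sigmoid
# on top of each VGG-16 feature map, giving a matrix of real/fake
# probabilities [D_c^l(.)]_{i,j} per spatial position.
def make_disc_heads(num_layers=4):
    return [tf.keras.layers.Conv2D(1, 1, activation="sigmoid")
            for _ in range(num_layers)]

def simulation_gan_losses(extractor, disc_heads, x_real, x_fake, eps=1e-7):
    """Discriminator and generator terms corresponding to Eq. (1),
    aggregated over layers and spatial positions."""
    d_loss, g_loss = 0.0, 0.0
    feats_real = extractor(x_real)   # features of ground-truth CT patch x
    feats_fake = extractor(x_fake)   # features of generated CT patch x' = G_c(z)
    for head, f_real, f_fake in zip(disc_heads, feats_real, feats_fake):
        p_real = head(f_real)
        p_fake = head(f_fake)
        d_loss += -tf.reduce_mean(tf.math.log(p_real + eps)
                                  + tf.math.log(1.0 - p_fake + eps))
        g_loss += -tf.reduce_mean(tf.math.log(p_fake + eps))  # non-saturating form
    return d_loss, g_loss
```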

2.2 Segmentation GAN

Similarly, given the generated CT \( x^{{\prime }} \) from \( G_{c} \left( z \right) \), we construct a segmentation generator \( G_{s} \left( {z,\,x^{'} } \right) \), which learns to predict a bony structure segmentation \( y^{{\prime }} \). Then, the ground-truth segmentation \( y \) and the predicted segmentation \( y^{{\prime }} \) are forwarded to the discriminator \( D_{s} \left( y \right) \) for evaluation. Note that, different from the discriminators \( D_{c}^{l} \) in the simulation GAN, the discriminator \( D_{s} \left( y \right) \) is designed only for the feature map at the last layer of the pre-trained VGG-16 net. The adversarial game for segmentation is as follows:

$$ \mathop {\hbox{min} }\nolimits_{{G_{s} }} \,\mathop {\hbox{max} }\nolimits_{{D_{s} }} \,{\mathbb{E}}_{{y{\sim}p\left( y \right)}} \left[ {\log D_{s} \left( y \right)} \right] + {\mathbb{E}}_{{z,x^{{\prime }} {\sim}q\left( {z,x^{{\prime }} } \right)}} \left[ {\log \left( {1 - D_{s} \left( {G_{s} \left( {z,\,x^{{\prime }} } \right)} \right)} \right)} \right] $$
(2)

where \( p\left( y \right) \) is the distribution of the ground-truth segmentation images, and \( q\left( {z,\,x^{{\prime }} } \right) \) is the joint distribution of the original MRI and the generated CT data. For the segmentation results, a voxel-wise loss is naturally considered as follows:

$$ {\mathcal{L}}_{vox} = {\mathbb{E}}_{{z,\,x^{{\prime }} {\sim}q\left( {z,\,x^{{\prime }} } \right)}} \left\| {G_{s} \left( {z,\,x^{{\prime }} } \right) - y} \right\|^{2} $$
(3)

Moreover, we also consider a perceptual loss \( {\mathcal{L}}_{percp}^{l} \) to encourage consistency between the feature maps of the generated segmentation and those of the ground-truth segmentation. To this end, the pre-trained part of the discriminator is utilized to extract multi-layer feature maps from both the generated and the ground-truth segmentations. Denoting by \( \varphi_{l} \left( y \right) \) the feature map of input y at the l-th layer of the feature extraction network, and by \( N_{l} \) the number of voxels in feature map \( \varphi_{l} \left( y \right) \), the perceptual loss for the l-th layer is defined as:

$$ {\mathcal{L}}_{percp}^{l} = {\mathbb{E}}_{{z,\,x^{{\prime }} {\sim}q\left( {z,\,x^{{\prime }} } \right)}} \left[ {\frac{1}{{N_{l} }}\left\| {\varphi_{l} \left( {G_{s} \left( {z,\,x^{{\prime }} } \right)} \right) - \varphi_{l} \left( y \right)} \right\|^{2} } \right] $$
(4)
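A minimal sketch of Eqs. (3) and (4) is given below (TensorFlow, our own illustration), assuming the predicted and ground-truth segmentations are image-like 152 × 184 × 3 patches that can be passed through the VGG-16 feature extractor (`extractor`) sketched above; the mean over elements implements the \( 1/N_{l} \) normalization.

```python
import tensorflow as tf

def voxel_loss(y_pred, y_true):
    """Voxel-wise L2 loss of Eq. (3), averaged over the patch."""
    return tf.reduce_mean(tf.square(y_pred - y_true))

def perceptual_losses(extractor, y_pred, y_true):
    """Per-layer perceptual losses of Eq. (4): squared distances between
    the feature maps of the predicted and ground-truth segmentations,
    normalized by the number of elements N_l in each feature map."""
    feats_pred = extractor(y_pred)
    feats_true = extractor(y_true)
    return [tf.reduce_mean(tf.square(fp - ft))      # (1 / N_l) * ||.||^2
            for fp, ft in zip(feats_pred, feats_true)]
```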

In summary, the total loss function with respect to the generator is:

$$ \mathop {\hbox{min} }\nolimits_{{G_{s} }} \,{\mathbb{E}}_{{z,\,x^{{\prime }} {\sim}q\left( {z,\,x^{{\prime }} } \right)}} \left[ { - \log D_{s} \left( {G_{s} \left( {z,\,x^{{\prime }} } \right)} \right)} \right] + \lambda_{1} {\mathcal{L}}_{vox} + \lambda_{2} \sum\nolimits_{l = 1}^{L} {{\mathcal{L}}_{percp}^{l} } $$
(5)

where \( \lambda_{1} \) and \( \lambda_{2} \) balance the three loss terms (adversarial, voxel-wise, and perceptual).
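Putting the pieces together, the sketch below (our own illustration, reusing the helpers from the previous sketch) assembles the generator objective of Eq. (5) with \( \lambda_{1} = \lambda_{2} = 1 \) as in Sect. 3.1; `d_s` is assumed to be a callable discriminator head that returns a probability for a segmentation patch based on the last tapped VGG-16 layer.

```python
import tensorflow as tf

def segmentation_generator_loss(d_s, extractor, y_pred, y_true,
                                lam1=1.0, lam2=1.0, eps=1e-7):
    """Total loss of Eq. (5) for the segmentation generator G_s."""
    adv = -tf.reduce_mean(tf.math.log(d_s(y_pred) + eps))   # -log D_s(G_s(z, x'))
    vox = voxel_loss(y_pred, y_true)                         # Eq. (3)
    percp = tf.add_n(perceptual_losses(extractor, y_pred, y_true))  # sum over l of Eq. (4)
    return adv + lam1 * vox + lam2 * percp
```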

3 Experimental Results

3.1 Dataset

The experiments were conducted on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database [7], which contains 16 subjects with paired MRI and CT scans. The MRI scans were acquired on a Siemens TrioTim scanner with a voxel size of 1.2 × 1.2 × 1 mm3, TE 2.95 ms, TR 2300 ms, and flip angle 9°. The CT scans were acquired on a Siemens Somatom scanner with a voxel size of 0.59 × 0.59 × 3 mm3.

The preprocessing was conducted as follows. Both MRI and CT scans were resampled to a size of 152 × 184 × 149 with a voxel size of 1 × 1 × 1 mm3, and each CT was aligned to its corresponding MRI. All MRI and CT intensities were rescaled to [−1, 1]. To be compatible with the VGG-16 net, both MRI and CT data were cropped into patches of size 152 × 184 × 3 for training. The experiments were conducted on the 16 subjects using leave-one-out cross-validation. To measure the quality of the generated CT, we used the mean absolute error (MAE) and the peak signal-to-noise ratio (PSNR); to measure the segmentation accuracy, we used the Dice similarity coefficient (DSC). The proposed framework was implemented in TensorFlow. The network was trained using Adam with a learning rate of 1e−4 and a momentum of 0.9. In the experiments, we empirically set the parameters of the proposed method as \( L = 4,\,\,\lambda_{1} = 1 \), and \( \lambda_{2} = 1 \).
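For reproducibility, a minimal sketch of the intensity rescaling and the three evaluation metrics is given below (NumPy, our own implementation); the PSNR peak value of 2.0 reflects the [−1, 1] intensity range and is an assumption, as the paper does not state how the peak is defined.

```python
import numpy as np

def rescale_to_unit_range(vol):
    """Linearly rescale a volume's intensities into [-1, 1]."""
    vmin, vmax = vol.min(), vol.max()
    return 2.0 * (vol - vmin) / (vmax - vmin) - 1.0

def mae(ct_gen, ct_true):
    """Mean absolute error between generated and ground-truth CT."""
    return np.mean(np.abs(ct_gen - ct_true))

def psnr(ct_gen, ct_true, peak=2.0):
    """Peak signal-to-noise ratio; peak=2.0 assumes intensities in [-1, 1]."""
    mse = np.mean((ct_gen - ct_true) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def dsc(seg_pred, seg_true):
    """Dice similarity coefficient between two binary masks."""
    p, t = seg_pred.astype(bool), seg_true.astype(bool)
    return 2.0 * np.logical_and(p, t).sum() / (p.sum() + t.sum())
```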

3.2 Impact of Deep-Supervision Feature Maps

To evaluate the effectiveness of the deep-supervision strategy in the simulation GAN, we train the network with the discriminator attached to different layers of the pre-trained VGG-16 network. The results are shown in Fig. 2: the lower the layer to which the discriminator is attached, the clearer the generated CT. A quantitative comparison is given in Table 1, showing that when a lower layer is connected with the discriminator, the PSNR is higher and the MAE is smaller.

Fig. 2.

CT images generated by the proposed Deep-supGAN with different layers connected with the discriminator. Left to right: original MRI; four CT images generated with the fourth, third, second, and first layer, respectively, connected with the discriminator; and the ground-truth CT.

Table 1. PSNR and MAE with different layers connected with the discriminator.

To evaluate the effectiveness of the deep-supervision strategy in the segmentation GAN, we train the network with the perceptual reconstruction loss applied at different layers of the pre-trained VGG-16 network. As shown in Fig. 3, the results with higher layers connected with the perceptual loss, i.e., \( {\mathcal{L}}_{percp}^{2} \) and \( {\mathcal{L}}_{percp}^{3} \), are smoother and more accurate for thin structures, as highlighted by the yellow rectangles. The DSC for different layers connected with the perceptual loss is provided in Table 2, which again indicates that the deep-supervision perceptual loss greatly enhances the performance.

Fig. 3.

Segmentation results of the proposed Deep-supGAN with different layers connected with the perceptual loss.

Table 2. DSC (%) of the proposed Deep-supGAN with different layers connected with the perceptual loss.

3.3 Impact of Generated CT

To evaluate the contribution of the generated CT to the segmentation, the result obtained with only MRI as input (denoted as with MRI) is shown in Fig. 4. The result obtained with both the original MRI and the generated CT as input (denoted as with MRI + CT) is smoother and more complete for thin structures, especially in the regions indicated by the yellow rectangles. The quantitative comparison in terms of DSC is shown in Table 3; it can be seen that the performance is significantly improved with the generated CT.
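One plausible way to realize the "with MRI + CT" input, not detailed in the paper, is channel-wise concatenation of the MRI patch and the generated CT patch before the first layer of \( G_{s} \); the sketch below is an assumption on our part.

```python
import tensorflow as tf

def fuse_inputs(mri_patch, ct_gen_patch):
    """Concatenate the MRI patch and the generated CT patch along the
    channel axis, e.g. (batch, 152, 184, 3) + (batch, 152, 184, 3)
    -> (batch, 152, 184, 6), before feeding the segmentation generator."""
    return tf.concat([mri_patch, ct_gen_patch], axis=-1)
```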

Fig. 4.

Segmentation results of the compared methods. Left to right: original MRI, U-Net, GanSeg, our method with MRI, our method with MRI + CT, and ground truth.

Table 3. DSC (%) of the compared methods on 16 subjects using leave-one-out cross-validation.

3.4 Impact of Pre-trained VGG-16 Network

Here we compare the CT images generated under two different training settings: (1) learning the discriminator from scratch (denoted as Scratch) and (2) utilizing a pre-trained VGG-16 network (denoted as VGG-16) for the discriminator. As shown in Fig. 5, the CT generated with the pre-trained VGG-16 is much clearer and more realistic than that generated with the discriminator trained from scratch.
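The two settings can be reproduced by simply toggling the initialization of the VGG-16 trunk, as in the sketch below (our own illustration); whether the Scratch trunk is trained jointly with the rest of the discriminator is our assumption.

```python
import tensorflow as tf

def build_discriminator_trunk(pretrained=True, input_shape=(152, 184, 3)):
    """Feature-extraction trunk of the discriminator: either an ImageNet
    pre-trained VGG-16 ('VGG-16' setting) or the same architecture with
    random initialization ('Scratch' setting)."""
    weights = "imagenet" if pretrained else None
    vgg = tf.keras.applications.VGG16(include_top=False,
                                      weights=weights,
                                      input_shape=input_shape)
    vgg.trainable = not pretrained  # assumption: the scratch trunk is trained jointly
    return vgg
```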

Fig. 5.

The impact of the pre-trained VGG-16 on CT generation. Left to right: original MRI; two CT images generated by our method with the discriminator (1) trained from scratch and (2) using a pre-trained VGG-16; and the ground-truth CT.

3.5 Comparison with State-of-the-Art Segmentation Methods

To illustrate the advantage of our method for bony structure segmentation, we also compared it with two widely used deep learning methods, i.e., the U-Net-based segmentation method [3] and the GAN-based semantic segmentation method [8] (denoted as GanSeg, a traditional GAN whose generator is designed as a segmentation network). Comparison results on a typical subject are shown in Fig. 4. Both U-Net and GanSeg fail to accurately segment the bony structures, as indicated by the yellow rectangles, whereas our proposed method achieves more accurate segmentation. The quantitative comparison in terms of DSC is shown in Table 3, which clearly demonstrates the advantage of our proposed method in terms of segmentation accuracy.

4 Conclusion

In this paper, we proposed a cascaded GAN framework, Deep-supGAN, to segment CMF bony structures from the combination of an original MRI and a generated CT image. A GAN with deep-supervision discriminators is designed to generate a CT image from an MRI. With the generated CT image, a GAN with a deep-supervision perceptual loss is designed to perform bony structure segmentation using both the original MRI and the generated CT image. The combination of MRI and CT provides complementary information about bony structures for the segmentation task. Comparisons with state-of-the-art methods demonstrate the advantage of our proposed method in terms of segmentation accuracy.