Abstract
This paper presents a novel unsupervised segmentation method for the three-dimensional microstructure of lung cancer specimens in micro-computed tomography (micro-CT) images. Micro-CT scanning can nondestructively capture detailed histopathological components of resected lung cancer specimens. However, it is difficult to manually annotate cancer components on micro-CT images. Moreover, since most of the recent segmentation methods using deep neural networks have relied on supervised learning, it is also difficult to cope with unlabeled micro-CT images. In this paper, we propose an unsupervised segmentation method using a deep generative model. Our method consists of two phases. In the first phase, we train our model by iterating two steps: (1) inferring pairs of continuous and categorical latent variables of image patches randomly extracted from an unlabeled image and (2) reconstructing image patches from the inferred pairs of latent variables. In the second phase, our trained model estimates the probabilities of belonging to each category and assigns labels to patches from an entire image in order to obtain the segmented image. We apply our method to seven micro-CT images of resected lung cancer specimens. The original sizes of the micro-CT images were \(1024 \times 1024 \times (544{-}2185)\) voxels, and their resolutions were 25–30 \(\upmu \)m/voxel. Our aim was to automatically divide each image into three regions: invasive carcinoma, noninvasive carcinoma, and normal tissue. In quantitative evaluation, the mean normalized mutual information score of our results is 0.437. In qualitative evaluation, our segmentation results prove helpful for observing the anatomical extent of cancer components. Moreover, we visualize the degree of certainty of the segmentation results using the values of the categorical latent variables.
1 Introduction
Recent research on deep generative models, such as variational auto-encoders (VAEs) [1] and generative adversarial networks (GANs) [2], has accelerated studies of unsupervised representation learning using neural networks. However, most segmentation methods using neural networks still rely on supervised learning with manually labeled data. In this study, we present a deep generative model that can be used as a segmentation method by setting the number of categories in a dataset.
The proposed deep generative model has two major advantages. The first is that we can learn representations while taking the number of categories into account. In previous unsupervised segmentation methods using deep neural networks, the number of clusters had to be set to a value larger than the number of actual categories in a dataset [3, 4]; the learned features are therefore not necessarily suitable for segmenting into the desired number of categories. Our deep generative model overcomes this problem. The second advantage is that we can assign labels to images without an external clustering algorithm such as k-means [5]. Previous unsupervised segmentation methods have adopted separate techniques for feature learning and for segmentation (e.g., [3]); in such cases, the features learned by the unsupervised method are not necessarily suitable for the clustering algorithm used in segmentation. Our deep generative model eliminates the need to choose an appropriate clustering algorithm.
In this work, we evaluate our method on micro-computed tomography (micro-CT) images of lung cancer specimens. Micro-CT scanning of resected lung specimens can nondestructively capture their detailed structures. If cancer components could be observed in 3D on micro-CT images, histopathological diagnosis based on micro-CT images could support the current diagnosis based on microscopic images [6]. However, because manually annotating cancer components on micro-CT images is difficult, it is not easy to acquire a dataset large enough to train supervised segmentation methods using deep neural networks. Micro-CT images of lung cancer specimens are therefore well suited to our proposed unsupervised segmentation method.
2 Methods
2.1 Overview
The underlying idea of our method is that a deep generative model that allows inference of the posterior category probabilities is applicable to image segmentation. Our method consists of a training phase and a segmentation phase. In the training phase, our method iterates two steps: (1) inferring pairs of continuous and categorical latent variables of image patches randomly extracted from unlabeled 3D images and (2) reconstructing image patches from the inferred pairs of latent variables. In the segmentation phase, our trained network assigns labels to patches from a target image.
2.2 Deep Generative Model with Categorical Latent Variables
The proposed deep generative model learns feature representation for reconstructing an input image with categorical latent variables representing posterior category probabilities. Our deep generative model consists of three networks: (1) the encoder, (2) the generator, and (3) the code discriminator. The training process of our deep generative model is illustrated in Fig. 1.
The encoder estimates both continuous latent variables \(\varvec{z} \in \mathbb {R}^{d}\) and categorical latent variables \(\varvec{y} \in \mathbb {R}^{k}\) from an observed input \(\varvec{x}\). In the training phase, the categorical variables are intended to control which category of image the generator creates, while in the segmentation phase they become the posterior probabilities that determine the segmentation labels. Note that the dimensionality of the categorical variables equals the desired number of categories k. The generator reconstructs images from both the continuous latent variables \(\varvec{z}\) and the categorical latent variables \(\varvec{y}\) of the corresponding inputs.
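To make the two outputs of the encoder concrete, here is a minimal PyTorch sketch of an encoder that maps a \(32^3\) patch to both latent variables; the convolutional trunk, layer sizes, and softmax head are our illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of an encoder producing continuous latent variables z
# and categorical latent variables y (posterior category probabilities).
# Trunk and layer sizes are assumptions, not the paper's exact network.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, d=10, k=3):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv3d(1, 32, 4, stride=2, padding=1), nn.ReLU(),   # 32^3 -> 16^3
            nn.Conv3d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 16^3 -> 8^3
            nn.Conv3d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 8^3 -> 4^3
            nn.Flatten(),
        )
        self.fc_z = nn.Linear(128 * 4 * 4 * 4, d)  # continuous latent z
        self.fc_y = nn.Linear(128 * 4 * 4 * 4, k)  # categorical latent y

    def forward(self, x):
        h = self.trunk(x)
        z = self.fc_z(h)
        y = torch.softmax(self.fc_y(h), dim=1)  # sums to 1 per patch
        return y, z
```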
The code discriminator distinguishes between the estimated latent variables and samples from a prior distribution. It is used to constrain the latent variables to follow a distribution more tractable than the true one. In our method, the prior of a continuous latent variable \(\varvec{z}\) is Gaussian noise, and the prior of a categorical latent variable \(\varvec{y}\) is a one-hot vector drawn from the uniform distribution. Unlike the previous code discriminators used in [7] and [8], our code discriminator copes with both continuous and categorical latent variables. To distinguish our code discriminator from the previous ones, we call it a unified code discriminator. Its architecture is based on a projection discriminator [9], which distinguishes between real and generated images given both the image and the corresponding label. In the unified code discriminator, we take the inner product between the embedded categorical latent variables and the intermediate features of the continuous latent variables. There are two reasons why we cope with both latent variables simultaneously in one network. One is that we can reduce the number of parameters of our model. The other is to learn a feature space in which features belonging to different categories can be transformed into each other smoothly and continuously. Since an explicit constraint on both latent variables is difficult to impose, we adopt adversarial training.
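As a concrete illustration of the projection-style inner product described above, the following is a hedged sketch of a unified code discriminator; the layer sizes and the linear embedding of \(\varvec{y}\) are our assumptions.

```python
# Sketch of a unified code discriminator scoring (y, z) pairs. The score is
# an unconditional term psi(phi(z)) plus a projection term: the inner product
# of an embedding of y with intermediate features of z. Sizes are assumptions.
import torch.nn as nn

class UnifiedCodeDiscriminator(nn.Module):
    def __init__(self, d=10, k=3, hidden=128):
        super().__init__()
        self.phi = nn.Sequential(                      # features of z
            nn.Linear(d, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.psi = nn.Linear(hidden, 1)                # unconditional score
        self.embed = nn.Linear(k, hidden, bias=False)  # embedding of y

    def forward(self, y, z):
        h = self.phi(z)
        proj = (self.embed(y) * h).sum(dim=1, keepdim=True)  # projection term
        return self.psi(h) + proj                      # real/fake logit
```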
2.3 Objective Function
Let \(q_{\varvec{\eta }}(\varvec{y}, \varvec{z}|\varvec{x})\), \(G_{\varvec{\theta }}(\varvec{y}, \varvec{z})\), and \(C_{\varvec{\omega }}(\varvec{y}, \varvec{z})\) denote the encoder, the generator, and the unified code discriminator, respectively. Note that \(q_{\varvec{\eta }}(\varvec{y}, \varvec{z}|\varvec{x})\) is an approximation of the true posterior \(p(\varvec{y}, \varvec{z}|\varvec{x})\). Moreover, let the distribution of the observed data be \(p(\varvec{x})\). Training our deep generative model optimizes the parameters of the encoder \(\varvec{\eta }\), the generator \(\varvec{\theta }\), and the unified code discriminator \(\varvec{\omega }\). An objective function \(\mathcal {L}\) of our deep generative model is defined as

$$\mathcal {L} = \lambda \, \mathbb {E}_{p(\varvec{x})}\, \mathbb {E}_{q_{\varvec{\eta }}(\varvec{y}, \varvec{z}|\varvec{x})}\left[ \left\| \varvec{x} - G_{\varvec{\theta }}(\varvec{y}, \varvec{z})\right\| _{1}\right] + D\left( q_{\varvec{\eta }}(\varvec{y}, \varvec{z}|\varvec{x}),\, p(\varvec{y}, \varvec{z})\right) ,$$

where D is a divergence between \(q_{\varvec{\eta }}(\varvec{y}, \varvec{z}|\varvec{x})\) and \(p(\varvec{y}, \varvec{z})\), and \(\lambda \) is a hyperparameter weighting the \(L_{1}\)-norm reconstruction loss. Because the VAE framework is not theoretically suited to handling two types of latent variables, our objective function instead follows that of the Wasserstein auto-encoder (WAE) [10], that is, it minimizes the optimal transport cost between the distributions of real and generated images. Our deep generative model can be viewed as an extension of the WAE designed to cope with categorical latent variables. To optimize the objective function, we alternately update \(\varvec{\theta }\), \(\varvec{\eta }\), and \(\varvec{\omega }\). As a training dataset, we extract N patches of \(w\times w \times w\) voxels from the unlabeled target image to which we apply segmentation.
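A schematic training step consistent with this objective is sketched below; the non-saturating GAN losses used to estimate D, the optimizer setup, and the value of \(\lambda \) are assumptions, not the authors' exact training code.

```python
# One alternating update: (1) train the unified code discriminator to tell
# prior samples from encoder outputs, (2) train encoder + generator with the
# lambda-weighted L1 reconstruction loss plus the adversarial term.
import torch
import torch.nn.functional as F

def sample_prior(n, d=10, k=3, device="cpu"):
    z = torch.randn(n, d, device=device)                             # Gaussian prior on z
    y = F.one_hot(torch.randint(k, (n,), device=device), k).float()  # uniform one-hot prior on y
    return y, z

def train_step(x, enc, gen, disc, opt_ae, opt_d, lam=10.0):
    # (1) discriminator update: prior samples are "real", encoder codes "fake"
    y_q, z_q = enc(x)
    y_p, z_p = sample_prior(x.size(0), device=x.device)
    d_loss = (F.softplus(-disc(y_p, z_p)).mean()
              + F.softplus(disc(y_q.detach(), z_q.detach())).mean())
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # (2) encoder/generator update: reconstruct x and push q(y, z|x)
    #     toward the prior by fooling the discriminator
    y_q, z_q = enc(x)
    recon = gen(y_q, z_q)
    ae_loss = (lam * (x - recon).abs().mean()
               + F.softplus(-disc(y_q, z_q)).mean())
    opt_ae.zero_grad(); ae_loss.backward(); opt_ae.step()
    return d_loss.item(), ae_loss.item()
```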
2.4 Segmentation
After training the deep generative model, we conduct labeling to obtain a segmentation result using the trained encoder. We first extract patches of \(w\times w \times w\) voxels from the target image at a stride of s voxels and input them into the trained encoder. Given each patch, the trained encoder produces the categorical latent variable \(\varvec{y}\), which represents the posterior category probability. We then choose the index \(h ~ (1 \le h \le k)\) of the maximum value of \(\varvec{y} = \left( y_{1}, \dots , y_{k}\right) ^{\text {T}}\), which is used as the segmentation label. We project each label onto a subpatch of \(s\times s \times s\) voxels centered at the corresponding patch to obtain a segmented image. Additionally, we can obtain probability maps for each cluster: the values of the dimensions \(y_{1}, \dots , y_{k}\) of the categorical latent variable represent the probabilities of belonging to the corresponding clusters. Using the value of \(y_{h}\), we produce maps that visualize, for each voxel, the probability of belonging to a certain cluster.
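Assuming the volume is held as a NumPy array and reusing the hypothetical Encoder sketched earlier, the labeling pass might look like the following; border handling is simplified, and the labels here are 0-based while the text uses \(1 \le h \le k\).

```python
# Sketch of the labeling pass: encode patches on a stride-s grid, take the
# argmax of the categorical latent y, and paint the s x s x s central
# subpatch of each patch with that label. Margins keep the value -1.
import numpy as np
import torch

def segment(volume, enc, w=32, s=5, device="cpu"):
    enc.eval()
    labels = np.full(volume.shape, -1, dtype=np.int64)
    half = (w - s) // 2  # offset of the central s^3 subpatch
    with torch.no_grad():
        for i in range(0, volume.shape[0] - w + 1, s):
            for j in range(0, volume.shape[1] - w + 1, s):
                for m in range(0, volume.shape[2] - w + 1, s):
                    patch = torch.from_numpy(
                        volume[i:i+w, j:j+w, m:m+w]
                    ).float()[None, None].to(device)  # shape (1, 1, w, w, w)
                    y, _ = enc(patch)                 # posterior probabilities
                    h = int(y.argmax(dim=1))          # 0-based label index
                    labels[i+half:i+half+s,
                           j+half:j+half+s,
                           m+half:m+half+s] = h
    return labels
```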
3 Experiments
3.1 Datasets
We used seven specimens of resected lung cancer tissue scanned with a micro-CT scanner (inspeXio SMX-90CT Plus, Shimadzu Corporation, Kyoto, Japan) to evaluate our proposed method. The lung cancer specimens, each from a different patient, were scanned at similar isotropic resolutions in the range of 25–30 \(\upmu \)m/voxel. Each micro-CT volume consists of 544 to 2185 slices of \(1024\times 1024\) pixels. To eliminate background regions, which are not relevant to our task, and to reduce computational cost, we used cropped images. Detailed information for each image is shown in Table 1. We attempted to divide the images into three histopathological regions: (a) invasive carcinoma, (b) noninvasive carcinoma, and (c) normal tissue.
3.2 Parameter Settings
For training the deep generative model, we used 10,000 randomly extracted patches, setting N to 10,000 and w to 32. We set the dimensions of the continuous and categorical latent variables, d and k, to 10 and 3, respectively. After the training, we extracted \(32 \times 32 \times 32\)-voxel patches from the unlabeled micro-CT images, setting the stride s to 5. We input them into the trained encoder and then assigned labels based on the values of the categorical latent variables to obtain a segmented image. We also obtained probability maps directly from the values of the categorical latent variables.
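For illustration, the random training-patch extraction (N = 10,000, w = 32) might look like the sketch below, assuming the cropped volume fits in memory as a NumPy array; the RNG seed is an arbitrary choice.

```python
# Sketch of random patch sampling for the training set. In-memory extraction
# and the seed are assumptions for illustration.
import numpy as np

def sample_patches(volume, n=10000, w=32, seed=0):
    rng = np.random.default_rng(seed)
    D, H, W = volume.shape
    patches = np.empty((n, w, w, w), dtype=volume.dtype)
    for idx in range(n):
        i = rng.integers(D - w + 1)  # random corner, patch stays in bounds
        j = rng.integers(H - w + 1)
        m = rng.integers(W - w + 1)
        patches[idx] = volume[i:i+w, j:j+w, m:m+w]
    return patches
```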
3.3 Evaluations
For quantitative evaluation, we used normalized mutual information (NMI). The range of NMI is 0 to 1, where a larger value means a better segmentation result. We measured the mean NMI over several manually annotated slices: five annotated slices for Lung-A, six for Lung-B, seven each for Lung-C and Lung-D, one for Lung-E, and 10 each for Lung-F and Lung-G. We compared the proposed method with k-means segmentation and the multithreshold Otsu method. Table 2 shows the comparison of NMIs. Our proposed method achieved higher NMIs for all specimens.
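For reference, NMI between a manually annotated slice and the corresponding predicted slice can be computed with scikit-learn; the two variables below are placeholder label arrays.

```python
# NMI between annotation and prediction for one slice (placeholder arrays).
from sklearn.metrics import normalized_mutual_info_score

nmi = normalized_mutual_info_score(annotation_slice.ravel(),
                                   predicted_slice.ravel())
print(f"NMI: {nmi:.3f}")  # range 0 to 1; higher is better
```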
Figure 2 shows the segmentation results and probability maps of Lung-F. In the segmentation results, the colors red, green, and blue are assigned to invasive carcinoma, noninvasive carcinoma, and normal tissue, respectively; because the method is unsupervised, cluster colors were initially assigned at random and relabeled afterward. In the probability maps, voxel values represent probabilities in the range 0 to 1, with red parts representing higher probability and dark blue parts representing lower probability. The segmentation results show that our method effectively divided the images into the three histopathological regions. Moreover, the probability map of each histopathological region shows high probabilities within that region.
We also visually evaluated the segmentation results by 3D volume rendering. Figure 3 shows a volume rendering of Lung-F overlaid with the segmentation labels produced by the proposed method. We overlaid only the labels of invasive carcinoma and noninvasive carcinoma to emphasize the cancer regions. In the volume rendering, we can observe the anatomical extent of the lung cancer: regions labeled as invasive carcinoma appear as solid components, while regions labeled as noninvasive carcinoma appear as thickened alveolar walls.
4 Discussion
The segmentation results show that our method succeeded in learning features suitable for separating histopathological regions. Furthermore, they suggest that the features learned by our model are suitable not only for image reconstruction but also for unsupervised segmentation. The probability maps, moreover, show that the result of our representation learning is close to human visual recognition. While the highest values of the probability maps of invasive carcinoma and normal tissue exceeded 0.9, those of noninvasive carcinoma were about 0.8. Generally, distinguishing noninvasive carcinoma from other regions on micro-CT images is more difficult for humans, even medical experts, than distinguishing invasive carcinoma from normal tissue. Furthermore, in the probability maps of each histopathological region, the probability decreases toward the boundary of the region. These results suggest that regions with high probability largely coincide with regions that are easy for humans to identify. Interestingly, our deep generative model mimics human visual recognition on micro-CT images without comparing image patches using an external clustering algorithm or using annotated labels.
5 Conclusion
We proposed a deep generative model specifically designed for unsupervised segmentation. Our method produced promising segmentation results for micro-CT images of lung cancer specimens. Moreover, the probability maps of the three histopathological regions closely mirrored human visual recognition.
References
Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: International Conference on Learning Representations (2014)
Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)
Moriya, T., et al.: Unsupervised segmentation of 3D medical images based on clustering and deep representation learning. In: Medical Imaging 2018: Biomedical Applications in Molecular, Structural, and Functional Imaging, vol. 10578, p. 1057820. International Society for Optics and Photonics (2018)
Bulten, W., Litjens, G.: Unsupervised prostate cancer detection on H&E using convolutional adversarial autoencoders. In: Medical Imaging with Deep Learning (2018)
MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Oakland, CA, USA, vol. 1, pp. 281–297 (1967)
Nakamura, S., et al.: Micro-computed tomography of the lung: imaging of alveolar duct and alveolus in human lung. In: D55. Lab Methodology and Bioengineering: Just Do It, pp. A7411–A7411. American Thoracic Society (2016)
Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I.: Adversarial autoencoders. In: International Conference on Learning Representations (2016)
Rosca, M., Lakshminarayanan, B., Warde-Farley, D., Mohamed, S.: Variational approaches for auto-encoding generative adversarial networks. CoRR abs/1706.04987 (2017)
Miyato, T., Koyama, M.: cGANs with projection discriminator. In: International Conference on Learning Representations (2018)
Tolstikhin, I., Bousquet, O., Gelly, S., Schoelkopf, B.: Wasserstein auto-encoders. In: International Conference on Learning Representations (2018)
Acknowledgements
Parts of this work were supported by MEXT/JSPS KAKENHI (26108006, 17H00867, 17K20099, 26560255, 15H01116, 15K19933), the JSPS Bilateral Joint Research Project, AMED (19lk1010036h0001), and the Hori Sciences and Arts Foundation.