Abstract
This paper presents a novel unsupervised segmentation method for the three-dimensional microstructure of lung cancer specimens in micro-computed tomography (micro-CT) images. Micro-CT scanning can nondestructively capture detailed histopathological components of resected lung cancer specimens. However, it is difficult to manually annotate cancer components on micro-CT images. Moreover, since most of the recent segmentation methods using deep neural networks have relied on supervised learning, it is also difficult to cope with unlabeled micro-CT images. In this paper, we propose an unsupervised segmentation method using a deep generative model. Our method consists of two phases. In the first phase, we train our model by iterating two steps: (1) inferring pairs of continuous and categorical latent variables of image patches randomly extracted from an unlabeled image and (2) reconstructing image patches from the inferred pairs of latent variables. In the second phase, our trained model estimates the probabilities of belonging to each category and assigns labels to patches from an entire image in order to obtain the segmented image. We apply our method to seven micro-CT images of resected lung cancer specimens. The original sizes of the micro-CT images were \(1024 \times 1024 \times (544{-}2185)\) voxels, and their resolutions were 25–30 \(\upmu \)m/voxel. Our aim was to automatically divide each image into three regions: invasive carcinoma, noninvasive carcinoma, and normal tissue. In quantitative evaluation, the mean normalized mutual information score of our results is 0.437. In qualitative evaluation, our segmentation results prove helpful for observing the anatomical extent of cancer components. Moreover, we visualize the degree of certainty of the segmentation results using the values of the categorical latent variables.
1 Introduction
Recent research on deep generative models, such as variational auto-encoders (VAEs) [1] and generative adversarial networks (GANs) [2], has accelerated studies of unsupervised representation learning using neural networks. However, most segmentation methods using neural networks still rely on supervised learning with manually labeled data. In this study, we present a deep generative model that can be used as a segmentation method by setting the number of categories in a dataset.
The proposed deep generative model has two major advantages. The first is that we can learn representations while taking the number of categories into account. In previous unsupervised segmentation methods using deep neural networks, the number of clusters had to be set to a value larger than the number of actual categories in a dataset [3, 4]; the learned features are therefore not necessarily suitable for segmenting into the desired number of categories. Our deep generative model overcomes this problem. The second advantage is that we can assign labels to images without an external clustering algorithm such as k-means [5]. Previous unsupervised segmentation methods have adopted separate techniques for feature learning and for segmentation (e.g., [3]); in such cases, the features learned by the unsupervised method are not necessarily suitable for the clustering algorithm used in segmentation. Our deep generative model eliminates the need to choose an appropriate clustering algorithm.
In this work, we evaluate our method on micro-computed tomography (micro-CT) images of lung cancer specimens. Micro-CT scanning of resected lung specimens can nondestructively capture their detailed structures. If cancer components could be observed in 3D on micro-CT images, histopathological diagnosis based on micro-CT images could support the current diagnosis based on microscopic images [6]. However, because manually annotating cancer components on micro-CT images is difficult, it is not easy to acquire a dataset large enough to train supervised segmentation methods using deep neural networks. Micro-CT images of lung cancer specimens are therefore well suited to our proposed unsupervised segmentation method.
2 Methods
2.1 Overview
The underlying idea of our method is that a deep generative model that allows inference of the posterior category probabilities is applicable to image segmentation. Our method consists of a training phase and a segmentation phase. In the training phase, our method iterates two steps: (1) inferring pairs of continuous and categorical latent variables of image patches randomly extracted from unlabeled 3D images and (2) reconstructing image patches from the inferred pairs of latent variables. In the segmentation phase, our trained network assigns labels to patches from a target image.
2.2 Deep Generative Model with Categorical Latent Variables
The proposed deep generative model learns feature representation for reconstructing an input image with categorical latent variables representing posterior category probabilities. Our deep generative model consists of three networks: (1) the encoder, (2) the generator, and (3) the code discriminator. The training process of our deep generative model is illustrated in Fig. 1.
The encoder estimates both continuous latent variables \(\varvec{z} \in \mathbb {R}^{d}\) and categorical latent variables \(\varvec{y} \in \mathbb {R}^{k}\) from an observed input \(\varvec{x}\). In the training phase, the categorical variables are intended to control which category of image the generator creates, while in the segmentation phase they become the posterior probabilities that determine the segmentation labels. Note that the dimensionality of the categorical variables equals the desired number of categories k. The generator reconstructs images from both the continuous latent variables \(\varvec{z}\) and the categorical latent variables \(\varvec{y}\) of the corresponding inputs.
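To make the two outputs of the encoder concrete, here is a minimal PyTorch sketch of an encoder that maps a \(32^3\) patch to both latent variables; the convolutional trunk, layer sizes, and softmax head are our illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of an encoder producing continuous latent variables z
# and categorical latent variables y (posterior category probabilities).
# Trunk and layer sizes are assumptions, not the paper's exact network.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, d=10, k=3):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv3d(1, 32, 4, stride=2, padding=1), nn.ReLU(),   # 32^3 -> 16^3
            nn.Conv3d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 16^3 -> 8^3
            nn.Conv3d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 8^3 -> 4^3
            nn.Flatten(),
        )
        self.fc_z = nn.Linear(128 * 4 * 4 * 4, d)  # continuous latent z
        self.fc_y = nn.Linear(128 * 4 * 4 * 4, k)  # categorical latent y

    def forward(self, x):
        h = self.trunk(x)
        z = self.fc_z(h)
        y = torch.softmax(self.fc_y(h), dim=1)  # sums to 1 per patch
        return y, z
```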
The code discriminator distinguishes between the estimated latent variables and samples from a prior distribution. It is used to constrain the latent variables to follow a distribution more tractable than the true one. In our method, the prior of a continuous latent variable \(\varvec{z}\) is Gaussian noise, and the prior of a categorical latent variable \(\varvec{y}\) is a one-hot vector drawn from the uniform distribution. Unlike the previous code discriminators used in [7] and [8], our code discriminator copes with both continuous and categorical latent variables. To distinguish our code discriminator from the previous ones, we call it a unified code discriminator. Its architecture is based on a projection discriminator [9], which distinguishes between real and generated images given both the image and the corresponding label. In the unified code discriminator, we take the inner product between the embedded categorical latent variables and the intermediate features of the continuous latent variables. There are two reasons why we cope with both latent variables simultaneously in one network. One is that we can reduce the number of parameters of our model. The other is to learn a feature space in which features belonging to different categories can be transformed into each other smoothly and continuously. Since an explicit constraint on both latent variables is difficult to impose, we adopt adversarial training.
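As a concrete illustration of the projection-style inner product described above, the following is a hedged sketch of a unified code discriminator; the layer sizes and the linear embedding of \(\varvec{y}\) are our assumptions.

```python
# Sketch of a unified code discriminator scoring (y, z) pairs. The score is
# an unconditional term psi(phi(z)) plus a projection term: the inner product
# of an embedding of y with intermediate features of z. Sizes are assumptions.
import torch.nn as nn

class UnifiedCodeDiscriminator(nn.Module):
    def __init__(self, d=10, k=3, hidden=128):
        super().__init__()
        self.phi = nn.Sequential(                      # features of z
            nn.Linear(d, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.psi = nn.Linear(hidden, 1)                # unconditional score
        self.embed = nn.Linear(k, hidden, bias=False)  # embedding of y

    def forward(self, y, z):
        h = self.phi(z)
        proj = (self.embed(y) * h).sum(dim=1, keepdim=True)  # projection term
        return self.psi(h) + proj                      # real/fake logit
```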
2.3 Objective Function
Let \(q_{\varvec{\eta }}(\varvec{y}, \varvec{z}|\varvec{x})\), \(G_{\varvec{\theta }}(\varvec{y}, \varvec{z})\), and \(C_{\varvec{\omega }}(\varvec{y}, \varvec{z})\) denote the encoder, the generator, and the unified code discriminator, respectively. Note that \(q_{\varvec{\eta }}(\varvec{y}, \varvec{z}|\varvec{x})\) is an approximation of the true posterior \(p(\varvec{y}, \varvec{z}|\varvec{x})\). Moreover, let the distribution of the observed data be \(p(\varvec{x})\). Training our deep generative model optimizes the parameters of the encoder \(\varvec{\eta }\), the generator \(\varvec{\theta }\), and the unified code discriminator \(\varvec{\omega }\). An objective function \(\mathcal {L}\) of our deep generative model is defined as

$$\mathcal {L} = \lambda \, \mathbb {E}_{p(\varvec{x})}\, \mathbb {E}_{q_{\varvec{\eta }}(\varvec{y}, \varvec{z}|\varvec{x})}\left[ \left\| \varvec{x} - G_{\varvec{\theta }}(\varvec{y}, \varvec{z})\right\| _{1}\right] + D\left( q_{\varvec{\eta }}(\varvec{y}, \varvec{z}|\varvec{x}),\, p(\varvec{y}, \varvec{z})\right) ,$$

where D is a divergence between \(q_{\varvec{\eta }}(\varvec{y}, \varvec{z}|\varvec{x})\) and \(p(\varvec{y}, \varvec{z})\), and \(\lambda \) is a hyperparameter weighting the \(L_{1}\)-norm reconstruction loss. Because the VAE framework is not theoretically suited to handling two types of latent variables, our objective function instead follows that of the Wasserstein auto-encoder (WAE) [10], that is, it minimizes the optimal transport cost between the distributions of real and generated images. Our deep generative model can be viewed as an extension of the WAE designed to cope with categorical latent variables. To optimize the objective function, we alternately update \(\varvec{\theta }\), \(\varvec{\eta }\), and \(\varvec{\omega }\). As a training dataset, we extract N patches of \(w\times w \times w\) voxels from the unlabeled target image to which we apply segmentation.
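A schematic training step consistent with this objective is sketched below; the non-saturating GAN losses used to estimate D, the optimizer setup, and the value of \(\lambda \) are assumptions, not the authors' exact training code.

```python
# One alternating update: (1) train the unified code discriminator to tell
# prior samples from encoder outputs, (2) train encoder + generator with the
# lambda-weighted L1 reconstruction loss plus the adversarial term.
import torch
import torch.nn.functional as F

def sample_prior(n, d=10, k=3, device="cpu"):
    z = torch.randn(n, d, device=device)                             # Gaussian prior on z
    y = F.one_hot(torch.randint(k, (n,), device=device), k).float()  # uniform one-hot prior on y
    return y, z

def train_step(x, enc, gen, disc, opt_ae, opt_d, lam=10.0):
    # (1) discriminator update: prior samples are "real", encoder codes "fake"
    y_q, z_q = enc(x)
    y_p, z_p = sample_prior(x.size(0), device=x.device)
    d_loss = (F.softplus(-disc(y_p, z_p)).mean()
              + F.softplus(disc(y_q.detach(), z_q.detach())).mean())
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # (2) encoder/generator update: reconstruct x and push q(y, z|x)
    #     toward the prior by fooling the discriminator
    y_q, z_q = enc(x)
    recon = gen(y_q, z_q)
    ae_loss = (lam * (x - recon).abs().mean()
               + F.softplus(-disc(y_q, z_q)).mean())
    opt_ae.zero_grad(); ae_loss.backward(); opt_ae.step()
    return d_loss.item(), ae_loss.item()
```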
2.4 Segmentation
After training the deep generative model, we conduct labeling to obtain a segmentation result using the trained encoder. We first extract patches of \(w\times w \times w\) voxels from the target image at a stride of s voxels and input them into the trained encoder. Given each patch, the trained encoder produces the categorical latent variable \(\varvec{y}\), which represents the posterior category probability. We then choose the index \(h ~ (1 \le h \le k)\) of the maximum value of \(\varvec{y} = \left( y_{1}, \dots , y_{k}\right) ^{\text {T}}\), which is used as the segmentation label. We project each label onto a subpatch of \(s\times s \times s\) voxels centered at the corresponding patch to obtain a segmented image. Additionally, we can obtain probability maps for each cluster: the values of the dimensions \(y_{1}, \dots , y_{k}\) of the categorical latent variable represent the probabilities of belonging to the corresponding clusters. Using the value of \(y_{h}\), we produce maps that visualize, for each voxel, the probability of belonging to a certain cluster.
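Assuming the volume is held as a NumPy array and reusing the hypothetical Encoder sketched earlier, the labeling pass might look like the following; border handling is simplified, and the labels here are 0-based while the text uses \(1 \le h \le k\).

```python
# Sketch of the labeling pass: encode patches on a stride-s grid, take the
# argmax of the categorical latent y, and paint the s x s x s central
# subpatch of each patch with that label. Margins keep the value -1.
import numpy as np
import torch

def segment(volume, enc, w=32, s=5, device="cpu"):
    enc.eval()
    labels = np.full(volume.shape, -1, dtype=np.int64)
    half = (w - s) // 2  # offset of the central s^3 subpatch
    with torch.no_grad():
        for i in range(0, volume.shape[0] - w + 1, s):
            for j in range(0, volume.shape[1] - w + 1, s):
                for m in range(0, volume.shape[2] - w + 1, s):
                    patch = torch.from_numpy(
                        volume[i:i+w, j:j+w, m:m+w]
                    ).float()[None, None].to(device)  # shape (1, 1, w, w, w)
                    y, _ = enc(patch)                 # posterior probabilities
                    h = int(y.argmax(dim=1))          # 0-based label index
                    labels[i+half:i+half+s,
                           j+half:j+half+s,
                           m+half:m+half+s] = h
    return labels
```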
3 Experiments
3.1 Datasets
We used seven specimens of resected lung cancer tissue scanned with a micro-CT scanner (inspeXio SMX-90CT Plus, Shimadzu Corporation, Kyoto, Japan) to evaluate our proposed method. The lung cancer specimens, each from a different patient, were scanned at similar isotropic resolutions in the range of 25–30 \(\upmu \)m/voxel. Each micro-CT volume consists of 544 to 2185 slices of \(1024\times 1024\) pixels. To eliminate background regions, which are not relevant to our task, and to reduce computational cost, we used cropped images. Detailed information for each image is shown in Table 1. We attempted to divide the images into three histopathological regions: (a) invasive carcinoma, (b) noninvasive carcinoma, and (c) normal tissue.
3.2 Parameter Settings
For training the deep generative model, we used 10,000 randomly extracted patches, setting N to 10,000 and w to 32. We set the dimensions of the continuous and categorical latent variables, d and k, to 10 and 3, respectively. After the training, we extracted \(32 \times 32 \times 32\)-voxel patches from the unlabeled micro-CT images, setting the stride s to 5. We input them into the trained encoder and then assigned labels based on the values of the categorical latent variables to obtain a segmented image. We also obtained probability maps directly from the values of the categorical latent variables.
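For illustration, the random training-patch extraction (N = 10,000, w = 32) might look like the sketch below, assuming the cropped volume fits in memory as a NumPy array; the RNG seed is an arbitrary choice.

```python
# Sketch of random patch sampling for the training set. In-memory extraction
# and the seed are assumptions for illustration.
import numpy as np

def sample_patches(volume, n=10000, w=32, seed=0):
    rng = np.random.default_rng(seed)
    D, H, W = volume.shape
    patches = np.empty((n, w, w, w), dtype=volume.dtype)
    for idx in range(n):
        i = rng.integers(D - w + 1)  # random corner, patch stays in bounds
        j = rng.integers(H - w + 1)
        m = rng.integers(W - w + 1)
        patches[idx] = volume[i:i+w, j:j+w, m:m+w]
    return patches
```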
3.3 Evaluations
For quantitative evaluation, we used normalized mutual information (NMI). The range of NMI is 0 to 1, where a larger value means a better segmentation result. We measured the mean NMI over several manually annotated slices: five annotated slices for Lung-A, six for Lung-B, seven each for Lung-C and Lung-D, one for Lung-E, and 10 each for Lung-F and Lung-G. We compared the proposed method with k-means segmentation and the multithreshold Otsu method. Table 2 shows the comparison of NMIs. Our proposed method achieved higher NMIs for all specimens.
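For reference, NMI between a manually annotated slice and the corresponding predicted slice can be computed with scikit-learn; the two variables below are placeholder label arrays.

```python
# NMI between annotation and prediction for one slice (placeholder arrays).
from sklearn.metrics import normalized_mutual_info_score

nmi = normalized_mutual_info_score(annotation_slice.ravel(),
                                   predicted_slice.ravel())
print(f"NMI: {nmi:.3f}")  # range 0 to 1; higher is better
```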
Figure 2 shows the segmentation results and probability maps of Lung-F. In the segmentation results, the colors red, green, and blue are assigned to invasive carcinoma, noninvasive carcinoma, and normal tissue, respectively; because the method is unsupervised, cluster colors were initially assigned at random and relabeled afterward. In the probability maps, voxel values represent probabilities in the range 0 to 1, with red parts representing higher probability and dark blue parts representing lower probability. The segmentation results show that our method effectively divided the images into the three histopathological regions. Moreover, the probability map of each histopathological region shows high probabilities within that region.
We also visually evaluated the segmentation results by 3D volume rendering. Figure 3 shows a volume rendering of Lung-F overlaid with the segmentation labels produced by the proposed method. We overlaid only the labels of invasive carcinoma and noninvasive carcinoma to emphasize the cancer regions. In the volume rendering, we can observe the anatomical extent of the lung cancer: regions labeled as invasive carcinoma appear as solid components, while regions labeled as noninvasive carcinoma appear as thickened alveolar walls.
4 Discussion
The segmentation results show that our method succeeded in learning features suitable for separating histopathological regions. Furthermore, they suggest that the features learned by our model are suitable not only for image reconstruction but also for unsupervised segmentation. The probability maps, moreover, show that the result of our representation learning is close to human visual recognition. While the highest values of the probability maps of invasive carcinoma and normal tissue exceeded 0.9, those of noninvasive carcinoma were about 0.8. Generally, distinguishing noninvasive carcinoma from other regions on micro-CT images is more difficult for humans, even medical experts, than distinguishing invasive carcinoma from normal tissue. Furthermore, in the probability maps of each histopathological region, the probability decreases toward the boundary of the region. These results suggest that regions with high probability largely coincide with regions that are easy for humans to identify. Interestingly, our deep generative model mimics human visual recognition on micro-CT images without comparing image patches using an external clustering algorithm or using annotated labels.
5 Conclusion
We proposed a deep generative model specifically designed for unsupervised segmentation. Our method produced promising segmentation results for micro-CT images of lung cancer specimens. Moreover, the probability maps of the three histopathological regions closely mirrored human visual recognition.
References
Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: International Conference on Learning Representations (2014)
Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)
Moriya, T., et al.: Unsupervised segmentation of 3D medical images based on clustering and deep representation learning. In: Medical Imaging 2018: Biomedical Applications in Molecular, Structural, and Functional Imaging, vol. 10578, p. 1057820. International Society for Optics and Photonics (2018)
Bulten, W., Litjens, G.: Unsupervised prostate cancer detection on H&E using convolutional adversarial autoencoders. In: Medical Imaging with Deep Learning (2018)
MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Oakland, CA, USA, vol. 1, pp. 281–297 (1967)
Nakamura, S., et al.: Micro-computed tomography of the lung: imaging of alveolar duct and alveolus in human lung. In: D55. Lab Methodology and Bioengineering: Just Do It, pp. A7411–A7411. American Thoracic Society (2016)
Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I.: Adversarial autoencoders. In: International Conference on Learning Representations (2016)
Rosca, M., Lakshminarayanan, B., Warde-Farley, D., Mohamed, S.: Variational approaches for auto-encoding generative adversarial networks. CoRR abs/1706.04987 (2017)
Miyato, T., Koyama, M.: cGANs with projection discriminator. In: International Conference on Learning Representations (2018)
Tolstikhin, I., Bousquet, O., Gelly, S., Schoelkopf, B.: Wasserstein auto-encoders. In: International Conference on Learning Representations (2018)
Acknowledgements
Parts of this work were supported by MEXT/JSPS KAKENHI (26108006, 17H00867, 17K20099, 26560255, 15H01116, 15K19933), the JSPS Bilateral Joint Research Project, AMED (19lk1010036h0001), and the Hori Sciences and Arts Foundation.