
1 Introduction

The prostate is an important reproductive organ in men. A young man's prostate is about the size of a walnut, and it slowly grows larger with age. Excessive enlargement can cause problems, which is very common after age 50; older men are more prone to prostate diseases. There are three major prostate diseases: prostatitis, non-cancerous enlargement of the prostate (BPH), and prostate cancer. In particular, prostate cancer is the second leading cause of cancer death in American men. Earlier detection of prostate cancer leads to a higher probability of cure, so reliable computer-aided diagnosis solutions for prostate diseases are important. Prostate segmentation from MRI is an essential prerequisite for detecting prostate diseases, especially prostate cancer [1]. It can be used to make radiotherapy plans that protect surrounding tissues and to estimate prostate volume [2]. Traditional manual segmentation has clear limitations: it relies on professional radiologists, and it is inefficient.

Although much progress has been made, some challenges have not yet been fully addressed, including the variation in scale and shape of prostates and boundary blur caused by lesions, which leads to a gap between clinical needs and automatic segmentation performance. In particular, the scale of the prostate gland varies greatly within the same patient's MRI volume, so the prostate regions presented in slices at different locations of the volume differ greatly in size, as shown in Fig. 1. This scale variation largely limits the accuracy of prostate segmentation. In this paper, we propose a cascaded scale normalization network to mitigate the impact of scale variation on segmentation accuracy. The details of the model are given in Sect. 3.

Fig. 1.

Example of prostate MR images exhibiting large variations

2 Related Works

Automatic segmentation of different types of medical images can provide important information for the diagnosis and treatment of diseases. With the rapid development of deep learning in recent years, medical image segmentation has also made breakthroughs. Convolutional neural networks have been shown to provide robust feature representations for tasks such as classification and segmentation, and many medical image segmentation algorithms based on deep neural networks have emerged [3,4,5,6,7]. Ronneberger et al. [3] proposed u-net for biomedical image segmentation, which introduces skip connections between the down-sampling path and the up-sampling path to increase feature transmission, building on fully convolutional networks (FCN). Li et al. [5] proposed a hybrid model for liver and tumor segmentation from CT volumes, which consists of a 2D DenseUNet for efficiently extracting intra-slice features and a 3D counterpart for hierarchically aggregating volumetric contexts.

Many automatic prostate segmentation methods have been proposed for MR images. As interest in neural networks has been rekindled in recent years, many computer vision research fields, including image segmentation, have shown dramatic performance improvements using deep neural networks. We divide automatic prostate segmentation methods into non-deep-learning methods and deep-learning-based methods. Non-deep-learning methods can be further divided into deformable-model-based methods [8,9,10], atlas-based methods [11,12,13,14], and graph-based methods [15,16,17]. Pasquier et al. [9] presented a method using an SSM in a Bayesian classification framework, obtaining contextual information and prior knowledge using Markov fields. To eliminate the restriction of landmarks, Toth et al. [10] proposed a landmark-free AAM model that captures shape information with a level set, which eases the difficulty of setting landmarks in ASMs. Klein et al. [11] employed multi-atlas matching and localized mutual information for prostate segmentation in 3D MR images. Ou et al. [13] proposed a "zooming process" for multi-atlas-based prostate segmentation which overcomes many limitations of the datasets. Zouqi et al. [15] proposed a method for prostate segmentation from ultrasound images that combines the advantages of graph cuts and domain-knowledge-based Fuzzy Inference Systems. In the era of deep learning, many deep neural network models have been proposed [18,19,20,21,22]. Guo et al. [18] proposed a stacked sparse auto-encoder model. Milletari et al. [19] utilized a 3D convolutional neural network for prostate segmentation. Yu et al. [20] introduced mixed residual connections into a 3D convolutional neural network. Jia et al. [21] employed an ensemble technique for fine prostate segmentation.

3 Method

The goal of the prostate segmentation task is to obtain a segmentation result for each slice image of the input MR volume V = {v1, v2, …, vn}. The proposed method consists of three stages. The first stage is coarse segmentation: the MRI volume is divided into single slice images, and all slices are fed into the first dense-unet segmentation network to obtain a preliminary segmentation result for each slice. The second stage is morphology-based segmentation result refinement, which recombines the slices segmented in the first stage into volumes and employs prior knowledge and morphological methods to obtain refined segmentation results for each MRI volume. The third stage is scale normalization segmentation: based on the refined coarse segmentation result, the slice images containing prostate are resized to normalize the scale of the prostate regions. The resized slices are fed into the second dense-unet segmentation network to obtain the final scale normalization segmentation results. An overview of the proposed method is shown in Fig. 2.

Fig. 2.

An overview of the proposed model

3.1 Coarse Segmentation Network

The coarse segmentation stage and the scale normalization segmentation stage utilize the same dense-unet architecture proposed by Li et al. [23], illustrated in Fig. 3. It can be seen as an extension of the densenet [24] architecture to u-net for semantic segmentation. The architecture follows the encoder-decoder fashion, composed of a down-sampling path and an up-sampling path. The down-sampling path extracts semantic features of the input images layer by layer, from low-level to high-level, through continuous convolution and down-sampling operations. The up-sampling path expands the resolution of the feature maps by deconvolution until the resolution of the input images is completely restored. Both paths are composed of dense blocks and transition layers, which connect adjacent dense blocks; their structure is shown in Fig. 4.

Fig. 3.

The dense-unet architecture

Fig. 4.

Dense block and transition layer

Both the up-sampling path and the down-sampling path are composed of 4 dense blocks and 4 transition layers. A dense block consists of n consecutive convolution layers at the same resolution, each followed by batch normalization (BN), a rectified linear unit (ReLU), and a dropout layer. The Lth convolution layer takes the feature maps of all previous layers as input.
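This dense connectivity pattern can be sketched as follows (shapes only; the conv_layer stub stands in for the BN-ReLU-convolution-dropout sequence, and the growth rate of 16 channels per layer is an assumed value, not taken from the paper):

```python
import numpy as np

def conv_layer(x, growth):
    """Stand-in for BN-ReLU-conv-dropout; emits `growth` new feature maps."""
    h, w, _ = x.shape
    return np.zeros((h, w, growth))

def dense_block(x, n_layers=4, growth=16):
    feats = [x]
    for _ in range(n_layers):
        # the L-th layer takes the feature maps of all previous layers as input
        inp = np.concatenate(feats, axis=-1)
        feats.append(conv_layer(inp, growth))
    return np.concatenate(feats, axis=-1)
```

Under this scheme the output of a block carries the input channels plus n_layers × growth new channels, which is what the transition layers then compress before the next down- or up-sampling step.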

The MRI volume V = {v1, v2, …, vn} is split into individual slice images vi, and all slices are fed into the first dense-unet segmentation network to obtain the preliminary segmentation results Sc = {sc1, sc2, …, scn}.

3.2 Segmentation Result Refinement

The goal of this stage is to refine the coarse segmentation results of the previous stage using morphological methods and prior knowledge. Human anatomy reveals that the prostate is chestnut-shaped. Thus, in the MRI volume, the middle slices present a larger prostate area, while slices near the ends of the volume present a smaller prostate area. Inspired by these characteristics, we design the following refinement process:

  1.

    First, the coarse segmentation results of the first stage are recombined into a volume Sc = {sc1, sc2, …, scn} according to the original slice order. Each segmentation result is a binary image containing foreground and background.

  2.

    A morphological closing operation is conducted separately on each sci. Morphological closing is defined as a dilation followed by an erosion; it can remove small dark spots and connect small bright cracks. Then, each segmentation result retains only its largest connected area, and the remaining pixel values are set to background. After this step, the segmentation result volume is referred to as So = {so1, so2, …, son}.

  3.

    The image som with the largest connected area is selected from So and used as the seed slice for the entire volume. A bounding box Bm is generated for som, which contains the entire largest connected area of som with a small margin. Starting from som, forward refinement (m → m + 1) and backward refinement (m → m − 1) are performed slice by slice. Taking forward refinement as an example, to refine som+1, the region of som+1 corresponding to the bounding box Bm of som is selected; only this region of som+1 is retained, and the rest is set to background. Then the bounding box Bm+1 of som+1 is generated, and the next segmentation result som+2 can be refined using Bm+1. Backward refinement works in the same way. Through this process, the refined segmentation results Sr = {sr1, sr2, …, srn} are obtained. The process is illustrated in Fig. 5.

    Fig. 5.

    Forward and backward refinement

  4.

    Finally, the segmentation results that contain only background are eliminated from the volume, and the final refined segmentation results R = {rs, rs+1, …, re−1, re} (e − s + 1 ≤ n) are obtained as the input of the third stage.
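Step 2 of the refinement can be sketched per slice with SciPy's morphology routines (a minimal sketch; the 3 × 3 structuring element is an assumption, since the paper does not specify one):

```python
import numpy as np
from scipy import ndimage

def refine_slice(mask):
    """Closing (dilation then erosion), then keep only the largest component."""
    closed = ndimage.binary_closing(mask, structure=np.ones((3, 3)))
    labeled, n = ndimage.label(closed)
    if n == 0:
        return np.zeros_like(mask)
    sizes = ndimage.sum(closed, labeled, index=range(1, n + 1))
    keep = int(np.argmax(sizes)) + 1
    return (labeled == keep).astype(mask.dtype)
```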
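The sliding refinement of step 3 can be sketched in NumPy as follows (the 5-pixel margin is an assumed value; boxes are tracked as row/column index ranges):

```python
import numpy as np

def bbox(mask, margin=5):
    """Bounding box of the foreground, expanded by a small margin."""
    ys, xs = np.nonzero(mask)
    h, w = mask.shape
    return (max(ys.min() - margin, 0), min(ys.max() + margin + 1, h),
            max(xs.min() - margin, 0), min(xs.max() + margin + 1, w))

def refine_volume(slices, m, margin=5):
    """Propagate the seed slice's box forward (m -> m+1) and backward."""
    refined = [s.copy() for s in slices]
    for step in (1, -1):                        # forward pass, then backward
        box = bbox(refined[m], margin)
        i = m + step
        while 0 <= i < len(refined):
            y0, y1, x0, x1 = box
            keep = np.zeros_like(refined[i])
            keep[y0:y1, x0:x1] = refined[i][y0:y1, x0:x1]
            refined[i] = keep
            if keep.any():                      # update box for the next slice
                box = bbox(keep, margin)
            i += step
    return refined
```

Each slice only keeps foreground inside its neighbor's bounding box, so spurious regions far from the prostate are progressively discarded toward both ends of the volume.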

3.3 Scale Normalization Segmentation

After the coarse segmentation and refinement process, we obtain the refined segmentation results R = {rs, rs+1, …, re−1, re}. Guided by the bounding box of each ri, the foreground regions of the corresponding MRI slices are cropped and resized to the same scale (256 × 256 in the experiments of this paper) to obtain the scale-normalized input. These scale-normalized MR slices are then fed into the scale normalization segmentation network described above to obtain the final segmentation results.
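The crop-and-resize step can be sketched as follows (nearest-neighbour resizing is used for brevity; the paper does not state its interpolation method):

```python
import numpy as np

def resize_nn(img, out_h=256, out_w=256):
    """Nearest-neighbour resize via index mapping."""
    ys = np.arange(out_h) * img.shape[0] // out_h
    xs = np.arange(out_w) * img.shape[1] // out_w
    return img[np.ix_(ys, xs)]

def scale_normalize(mri_slice, box, out=256):
    """Crop the slice to the refined bounding box, then resize to out x out."""
    y0, y1, x0, x1 = box
    return resize_nn(mri_slice[y0:y1, x0:x1], out, out)
```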

3.4 Training

The coarse segmentation network and the scale normalization segmentation network are trained separately in the same way. Both networks were trained with the Adam solver, using a mini-batch size of 8 due to limited GPU memory, for 500 epochs. The learning rate was set to 0.001 initially and divided by 10 after 50% of the total training epochs.
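The step schedule above amounts to (a sketch; zero-based epoch indexing is an assumption):

```python
def learning_rate(epoch, total_epochs=500, base_lr=1e-3):
    """Base LR for the first half of training, divided by 10 afterwards."""
    return base_lr / 10 if epoch >= total_epochs // 2 else base_lr
```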

4 Experiments

4.1 Dataset and Pre-processing

We validate our method on the MICCAI Prostate MR Image Segmentation (PROMISE12) challenge dataset [25]. This open dataset contains 50 training cases, each including a transversal T2-weighted MR image volume of the prostate and the corresponding expert-annotated segmentation. We use 5-fold cross validation, randomly dividing the dataset into 5 folds of 10 volumes each, using one fold for testing and the rest for training.
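One way to generate such a split (a sketch; the random seed is arbitrary and not specified in the paper):

```python
import random

def five_fold_splits(cases, k=5, seed=0):
    """Shuffle cases, partition into k folds, yield (train, test) pairs."""
    order = list(cases)
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [c for j in range(k) if j != i for c in folds[j]]
        yield train, test
```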

The PROMISE12 data are multi-center and multi-vendor, collected in clinical settings, resulting in large differences in slice thickness, presence or absence of an endorectal coil, dynamic range, voxel size, field of view, and position, all of which directly affect model performance.

In data pre-processing, the first step is to sort the pixel values of each slice in order to remove noise. A pixel interval [min, max] is then set for clipping, where min is the boundary of the smallest 2% of pixel values in the slice and max the boundary of the largest 2%. Pixel values outside the interval are clipped to the interval boundaries. Finally, all slices are resized to 256 × 256, and their pixel values are mapped to 0–255.
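This clipping and normalization can be sketched with percentiles matching the 2% boundaries above (nearest-neighbour resizing is used for brevity; the paper does not state its interpolation method):

```python
import numpy as np

def preprocess_slice(img, out=256):
    """Clip to the [2nd, 98th] percentile interval, rescale to 0-255, resize."""
    lo, hi = np.percentile(img, [2, 98])
    clipped = np.clip(img.astype(np.float64), lo, hi)
    scaled = (clipped - lo) / max(hi - lo, 1e-8) * 255.0
    ys = np.arange(out) * img.shape[0] // out
    xs = np.arange(out) * img.shape[1] // out
    return scaled[np.ix_(ys, xs)]
```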

4.2 Implementation

The proposed model was implemented in TensorFlow. All experiments were performed on a server equipped with an NVIDIA TITAN Xp GPU.

4.3 Quantitative Analysis

To evaluate the performance of our model, we use the Dice score as the metric to measure the overall region similarity of the segmentation results, consistent with other prostate segmentation methods. The Dice score is calculated via Eq. 1, where X denotes the volumetric ground truth and Y the volumetric prediction.

$$ {\text{dice}} = \frac{{2\left| {X \cap Y} \right|}}{\left| X \right| + \left| Y \right|} $$
(1)
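Eq. 1 translates directly to code over binary volumes:

```python
import numpy as np

def dice_score(x, y):
    """Dice coefficient 2|X ∩ Y| / (|X| + |Y|) between binary volumes."""
    x = x.astype(bool)
    y = y.astype(bool)
    intersection = np.logical_and(x, y).sum()
    return 2.0 * intersection / (x.sum() + y.sum())
```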

Quantitative results of the proposed method, obtained through cross validation, are shown in Table 1. The full model achieves more accurate segmentation than a single segmentation network. This is because the refinement process can eliminate erroneous segmentations: as shown in Fig. 6, unreasonable foreground regions are eliminated by our segmentation result refinement. In addition, the refinement process provides scale-normalized input to the second segmentation network, so the second network does not need to handle the extreme scale variation between slice images. Some segmentation results of our model are shown in Fig. 7.

Table 1. Quantitative results
Fig. 6.

An example of elimination of error segmentation

Fig. 7.

The results of the experiments

5 Conclusion

In this paper, we propose a cascaded prostate segmentation model to address scale variation in automatic prostate segmentation from MR images. By refining coarse segmentation results, the proposed method greatly alleviates the scale variation problem in the prostate segmentation task. Future work includes integrating the framework into an end-to-end model for further performance improvement.