Multi-label semantic segmentation of magnetic resonance images of the prostate gland

Mark Locherer¹,
Christopher Bonenberger¹,
Wolfgang Ertel¹,
Boris Hadaschik²,
Kristina Stumm²,
Markus Schneider¹^na1 &
…
Jan Philipp Radtke^2,3,4^na1


Abstract

Prostate segmentation is a substantial factor in the diagnostic pathway of suspicious prostate lesions. Medical doctors are assisted by computer-aided detection and diagnosis systems and by methods derived from artificial intelligence. Deep learning-based systems have to be trained on existing data, and freely available labeled prostate magnetic resonance imaging data in particular is rare. With regard to this problem, we show a method to combine two existing small datasets to form a bigger one. We present a data processing pipeline consisting of a cascaded network architecture that is able to perform multi-label semantic segmentation and is hence capable of assigning each pixel in a \(\textrm{T}_{2}\)-weighted image not only to one class but to a subset of the prostate zones and classes of interest. This delivers richer information, such as overlapping zones, that is key to medical radiological examination. Additionally, we describe how to integrate expert knowledge into our deep learning system. To increase data variety for training and evaluation we use image augmentation on our two datasets: a freely available dataset and our new open-source dataset. To combine the datasets we denoise the contourings in our dataset using a simple yet effective algorithm based on standard computer vision methods only. The performance of the presented methodology is compared and evaluated using the Dice score metric and 5-fold cross-validation on all datasets. Although we trained on tiny datasets, our method achieves excellent segmentation quality and is even able to detect prostate cancer. Our method to combine the two datasets reduces segmentation errors and increases data variety. The proposed architecture significantly improves performance by including expert knowledge via feature-map concatenation. 
On the Initiative for Collaborative Computer Vision Benchmarking (I2CVB) dataset we achieve average Dice scores of approximately 91% for the whole prostate gland, 67% for the peripheral zone and 75% for the prostate central gland. We find that image augmentation, except for contrast limited adaptive histogram equalisation, did not have much influence on the segmentation quality. Derived and enhanced from existing methods, we present an approach that is able to deliver multi-label semantic segmentation results for prostate magnetic resonance imaging. This approach is simple and could be applied to other applications of deep learning as well. It improves the segmentation results by a large margin. Once tweaked to the data, our denoising and combination algorithm delivers robust and accurate results even on data with segmentation errors.


1 Introduction

With the emergence of artificial intelligence (AI) methodologies in medical image processing, especially from 2015 onward, machine learning models have become increasingly capable of supporting medical personnel. They help in daily work to recognize and categorize tissue and body parts according to specific needs, such as semantic segmentation and Gleason score (GS) prediction for the prostate, and they support the training of less experienced radiologists who study prostate anatomy, cancer aggressiveness and its characteristic features [1]. Current machine learning (ML)-based methods [2,3,4,5,6,7,8,9,10,11,12,13,14,15] for fully automated segmentation of the prostate achieve Dice score (DSC) values of about 92% for the whole prostate gland, which is as accurate as a radiologist [2]. However, the results are hard to compare because the studies use different performance metrics, such as the DSC on individual slices or on 3D volumes, as well as different datasets and different training and augmentation techniques. For example, [3] used 19 \(\textrm{T}_{2}\)-weighted image (\(\textrm{T}_{2}\)WI) exams while [2] used data from 188 patients of the PROSTATEx challenge [16].

Prostate segmentation on magnetic resonance imaging (MRI) scans can be done manually or semi-automatically utilizing computer vision tools. This labor-intensive process has led to efforts to fully automate it using AI algorithms. Manual segmentation, typically performed slice-by-slice on axial \(\textrm{T}_{2}\)WI, is the common approach but is time-consuming and is affected by the radiologist’s segmentation ability [17,18,19].

In this study, we introduce:

1.
A new fully automated cascaded network configuration called Casc-UNetS that is capable of multi-label semantic segmentation, as shown in Fig. 1. It is based on a parameter-reduced version of UNet (a convolutional neural network (CNN) for biomedical image segmentation) [20], uses \(\textrm{T}_{2}\)WI only, and is inspired by existing methods [6, 8, 9] and computer vision algorithms (i.e., image moments, contours, contrast limited adaptive histogram equalisation (CLAHE), etc.). Trained on two tiny datasets, the first stage contours the whole gland (WG). The second stage segments the peripheral zone (PZ) and the central gland (CG), and the third stage outlines prostate cancer (PCa), which most frequently appears in the PZ [21], as shown in Figs. 1 and 2. To incorporate expert knowledge (e.g. the location of the prostate WG and the other associated regions), the model uses a filtering layer and a-priori feature-map concatenations between subsequent stages so that training focuses on the individual region of interest (ROI).
2.
A new multiparametric MRI (mp-MRI) prostate dataset called UDE-prostate (UDEp) whose raw data was collected and segmented by our partner at the University of Duisburg-Essen. It contains exams in the DICOM^{Footnote 1} format [22] and segmentation labels in the MITK^{Footnote 2} format [23] for apparent diffusion coefficient (ADC), \(\textrm{T}_{2}\)WI and dynamic contrast enhanced (DCE) imaging with 37, 29 and 30 exams respectively. For 22 of the exams all the aforementioned modalities are available.
3.
In order to provide our network with a larger dataset we combine the 29 \(\textrm{T}_{2}\)WI exams of our data with the 19 exams of the freely available I2CVB^{Footnote 3} dataset and produce a third dataset with 48 exams. Initially both datasets appear to be incompatible since the I2CVB contains the CG class, which is missing in the UDEp dataset. Therefore, we present a denoising and approximation method that automatically generates this class for the UDEp dataset and produces good CG contours even on imprecise ground truth labelling. To our knowledge, this is the first research to combine two small prostate datasets.
4.
Casc-UNetS is statistically evaluated using the DSC on the three datasets with different configurations (e.g. image augmentation, filtering layers, etc.) to find out how the model changes with additional data and training variations. The DSC is reported for the prostate regions base, midgland and apex as well as for the complete prostate.
5.
We open-source the UDEp dataset and Casc-UNetS.

2 Related works

Comprehensive meta-analyses on semantic segmentation of the prostate and PCa detection are given in [1, 4, 24]. While [4] focuses on algorithms drawn from standard ML techniques for computer-aided detection and diagnosis (CAD), [24] reviews ML- and deep learning (DL)-approaches on MRI and [1] also outlines both, but with the main perspective on DL-based solutions. Semantic segmentation of the prostate and PCa detection can be divided into two general pillars. The first one is the imaging technique, which is most frequently either ultrasound (US) or MRI with its different modalities (e.g. ADC, DCE, diffusion weighted imaging (DWI), magnetic resonance spectroscopy imaging (MRSI), \(\textrm{T}_{2}\)WI, etc.). While monoparametric approaches only use a single modality, biparametric MRI (bp-MRI) and mp-MRI use multiple MRI techniques (e.g. ADC, DCE and \(\textrm{T}_{2}\)WI). Hence, a dataset has to be at hand that matches the foreseen approach. The second pillar is the ML method, which ranges from standard algorithms such as k-means-clustering (k-means), support vector machine (SVM), ensemble classifier (EC), multi layer perceptron (MLP) and many others to DL-based CNNs with modifications that originate in transformers [25]. The CNNs used in [2, 3, 6,7,8, 26] are encoder-decoder networks, some with skip connections between the two and residual connections in the encoder. After that the standard ML pipeline for validation, performance evaluation and visualization follows.

Alkadi et al. [3, 9] use the VGG16 architecture and propose a 2.5D method realized by a 2D multi channel input to incorporate volumetric information for the semantic segmentation of the prostate into the four different classes PCa, non-prostate, PZ and CG. The authors showed that their pseudo-volume method is superior to a single slice input in the I2CVB dataset. Zhu et al. [6] propose a method to segment the prostate into the WG and the PZ. Their dataset holds bp-MRI consisting of 163 exams on which they evaluate a three-stage method. Firstly, they select the WG as the ROI by utilizing k-means and other methods on a DWI slice to obtain an approximate segmentation of the prostate. This segmentation is then transferred to the corresponding \(\textrm{T}_{2}\)WI to preselect the prostate area by cropping it which serves as an input to the first UNet that extracts the WG. Thirdly, the WG is fed into the second UNet to identify the PZ region. Aldoj et al. [2] used a CNN that is inspired by DenseNet and UNet to automatically segment the prostate into WG, the CZ and the PZ and use data from 188 patients from the ProstateX challenge with in-house segmented masks. The authors found an increase in performance which they claim results from the UNet architecture in combination with dense blocks. Nai et al. [7] compared and evaluated the usage of mono- and multimodal algorithms for the segmentation of mp-MRI data of 160 patients. While their multimodal network created better boundary segmentations of the CG and PZ, the WG segmentations did not improve. Duran et al. [8] introduced a multi-class attention network to segment the PZ and PZ lesions with additional GS group grading of the same using a bp-MRI approach utilizing \(\textrm{T}_{2}\)WI and ADC of their in-house dataset with 98 subjects. The network is inspired by UNet and consists of a single encoding path and two decoding paths that share feature-maps. 
The first one segments the PZ and the second one segments PZ lesions and grades them according to five different GS groups. Hung et al. [10] investigate a transformer-inspired cross-slice attention module called CAT to learn inter-slice dependencies at different levels. This mechanism is designed to be used in networks with shortcut connections between the encoding and decoding modules. The authors use their CAT module inside a nnUNet and nnUNet++ to segment the TZ and PZ and showed experimentally on an in-house dataset and the ProstateX dataset that meaningful data can be learned from adjacent \(\textrm{T}_{2}\)WI slices. Isaksson et al. [25] compare two proprietary segmentation software systems against four DL architectures, including V-net, which is a 3D version of standard UNet, and two different 2D UNets, to segment the prostate WG on a dataset with 100 patients. While their EfficientDet reached the highest DSC, they argue that UNet still performs very well on small datasets. Additionally, they investigated whether clinical medical parameters have an effect on the segmentation; however, they did not find impactful correlations. All models were trained on two dataset splits on \(\textrm{T}_{2}\)WI and evaluated. Compared to the established medical software tools the DL networks achieved better results.

Cascaded architectures are used in [6, 8, 27, 28], however, [6] uses DWI and \(\textrm{T}_{2}\)WI to segment the WG and the PZ only, [27] and [28] use it for image registration and localization in ADC and \(\textrm{T}_{2}\)WI and [8] uses it for PZ segmentation for PZ GS group grading.

3 Methods

This section introduces the pipeline used to process the datasets for this study. The CNN architecture is based on UNet [20], which was originally developed for biomedical image segmentation. We use the \(\textrm{T}_{2}\)WI for training and evaluation from our new open-source dataset UDEp (University of Duisburg-Essen review board approval, 19-TEMP579281-BO) and from the openly available I2CVB dataset [4]. This modality is optimal to study the anatomy of the prostate and to distinguish between the PZ with high signal intensities and the CG with rather low signal intensities [4, 29].

The pipeline consists of standard computer vision algorithms and DL-based methods. The \(\textrm{T}_{2}\)WI are preprocessed using histogram equalization. In order to obtain more data we join both datasets. Before merging the datasets, the CG class is created in the UDEp dataset via our generation algorithm to obtain the same number of classes in both datasets and to remove artifacts. After that the data is fed into our cascaded network architecture. Finally, the output is evaluated.

3.1 Model

The cascaded architecture Casc-UNetS is a concatenation of individual UNetSlim (UNetS) networks (c.f. Fig. 2) and standard computer vision processing. UNetS1 is used to predict the WG, UNetS2 the PZ and CG and finally UNetS3 the PCa class. We also integrate volumetric information into the input of the network by using the sliding window of [3, 9], which avoids the need to resample the MRI to coherent volumes.
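The sliding-window input can be illustrated with a minimal NumPy sketch. It assumes a window of three consecutive slices (as stated for UNetS in this section) and assumes that border slices reuse themselves as the missing neighbour; the function name `three_slice_windows` and the border handling are illustrative and not taken from [3, 9].

```python
import numpy as np


def three_slice_windows(volume: np.ndarray) -> np.ndarray:
    """Stack every slice with its previous and next slice as channels.

    volume: array of shape (num_slices, H, W).
    Returns an array of shape (num_slices, 3, H, W).
    Assumption: the first and last slice are repeated at the borders.
    """
    padded = np.concatenate([volume[:1], volume, volume[-1:]], axis=0)
    # channel 0: previous slice, channel 1: current slice, channel 2: next slice
    return np.stack([padded[:-2], padded[1:-1], padded[2:]], axis=1)


# Toy volume with 4 slices of 2 x 2 pixels (synthetic values).
volume = np.arange(4 * 2 * 2, dtype=np.float32).reshape(4, 2, 2)
windows = three_slice_windows(volume)
```

Each window keeps the original slice in the middle channel, so no resampling of the volume is required.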

Following the initial stage, the expert knowledge of the prostate WG location in \(\textrm{T}_{2}\)WI is employed to eliminate artifacts that may be erroneously generated by UNetS1. This is achieved by a filter layer between UNetS1 and UNetS2 that works as follows: a) Calculate the MRI center point. b) Find the contours of all predicted WGs and calculate their area and center points using the contour moments as described in Sect. 3.4. c) Calculate the distance of each contour’s center point to the MRI center. d) If the contour has at least 5% of the possible maximum WG contour area (the value that produced the highest metric results on the training set), select it as the correct WG; otherwise, choose the contour with the largest area. e) Because the selected prostate contour is sometimes very erratic, the convex hull can optionally be calculated to smooth the outer contour. This option is only used to calculate the inputs to UNetS2 for the selected contour, since this information is solely used to preselect the ROI.
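The following sketch shows one possible reading of this filter. It is not the authors' implementation: connected components from `scipy.ndimage.label` stand in for contours, the centroid is the mean pixel coordinate (equivalent to the first-order moments divided by the area), and the selection rule is assumed to be "nearest to the image center among components with at least 5% of a reference area, otherwise the largest component". The parameter `max_wg_area` is a hypothetical name for the possible maximum WG contour area, and the optional convex hull step is omitted.

```python
import numpy as np
from scipy import ndimage


def filter_wg(mask: np.ndarray, max_wg_area: float, min_frac: float = 0.05) -> np.ndarray:
    """Keep a single connected component of a binary WG prediction."""
    labels, n = ndimage.label(mask)
    if n == 0:
        return np.zeros_like(mask)
    ids = np.arange(1, n + 1)
    # Area (pixel count) and centroid of every component.
    areas = np.bincount(labels.ravel(), minlength=n + 1)[1:]
    centroids = np.array([np.argwhere(labels == k).mean(axis=0) for k in ids])
    center = (np.array(mask.shape) - 1) / 2.0
    dists = np.linalg.norm(centroids - center, axis=1)
    large = areas >= min_frac * max_wg_area
    if large.any():
        # Assumed rule: nearest to the center among sufficiently large components.
        chosen = ids[large][np.argmin(dists[large])]
    else:
        # Fallback: the component with the largest area.
        chosen = ids[np.argmax(areas)]
    return (labels == chosen).astype(mask.dtype)


# Synthetic prediction with three blobs on a 20 x 20 slice.
pred = np.zeros((20, 20), dtype=np.uint8)
pred[0:6, 0:6] = 1      # large artifact in the corner (36 px)
pred[9:11, 9:11] = 1    # tiny central artifact (4 px)
pred[12:17, 12:17] = 1  # plausible gland near the center (25 px)
kept = filter_wg(pred, max_wg_area=100)
kept_fallback = filter_wg(pred, max_wg_area=10_000)
```

In the toy example the 25-pixel blob is kept because the tiny central blob falls below the area threshold; with an unrealistically large reference area no blob passes the threshold and the largest one is returned.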

Feature-map concatenation is used to preselect the ROI which is based on the expert knowledge of the location of the prostate zones PZ and CG which both lie within the WG. This is done by providing UNetS2 with the WG segmentation and UNetS3 with the WG, PZ and CG from the previous stages. The idea of providing additional information to the network is tested with the I2CVB-dataset to predict the PZ and CG classes without prior hyperparameter tuning.

The general UNet architecture is modified and in the following referred to as UNetS, a parameter-reduced UNet (c.f. Fig. 3). The model can be divided into two basic blocks, namely a down- and an up-block. The down-block consists of a 2D convolution layer followed by batch normalization [30] and the rectified linear unit (ReLU). This sequence is repeated twice and followed by a 2D max-pooling layer, except for the bottleneck. The up-block is composed of 2D up-scaling using bilinear interpolation, followed by feature-map concatenation with the corresponding down-block and again two sequences of a 2D convolution layer followed by batch normalization and ReLU.

Compared to standard UNet as described by [20], UNetS uses the following modifications: a) Batch normalization is used after each 2D convolution layer. b) The deepest stage with 1024 feature-map channels is omitted to reduce the number of parameters by more than 86%, from about \({{\mathrm{31.04}}\times {10}^{6}}\) parameters to approximately \(\mathrm{4.28}\times \textrm{10}^{6}\) parameters, which reduces the risk of overfitting on the small amount of image data and of vanishing gradients [31]. The final output convolution maps the channels of the last up-block to the desired number of prediction class channels. c) Softmax is used as a separate activation layer instead of the combination of softmax and cross-entropy loss. d) Up-sampling is achieved using bilinear interpolation as opposed to the transposed convolution in classical UNet. e) In contrast to a single image input, we use three consecutive MRI slices as input to make use of inter-slice spatial information and to mimic the behaviour of a radiologist when analyzing images. This method was shown to be superior to a single slice input, c.f. [3, 9, 10]. f) To incorporate expert knowledge of the location of the prostate into our network, we equip UNetS1 with a filtering layer to remove artifacts. The input to UNetS2 and 3 is extended by feature-map concatenations over the previous segmentation result(s) on the current image.

3.2 Training

Hyperparameter tuning is performed for each dataset and stage using grid search [31]. The data is split into training, validation, and testing sets with ratios of 50 %, 20 % and 30 % respectively. We train our model on the training set and use the validation set for evaluation. The best hyperparameters are then tested on unseen testing data. For Casc-UNetS, tuning is done separately for each stage. For UNetS1, hyperparameters are tuned with and without CLAHE. For UNetS2 and 3, only CLAHE MRI are used due to improved validation losses.

Ma et al. [32] found in their statistical evaluation of different loss functions for medical imaging datasets that dice loss (DLoss) performs well with only a diminishing margin to the top performers, hence, the model is trained using \(\text {DLoss} = 1 - {\text {DSC}}\) which is calculated for a batch of inputs over the prediction classes. For optimization, adaptive moment estimation (Adam) [33] is used with a step-size decay scheduler with \(\gamma = 0.1\) and a step-size of 30 epochs.
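As a minimal sketch of these two ingredients, the snippet below computes a soft Dice loss over a whole batch and all class channels and evaluates a step-decay learning rate. The aggregation over batch and classes, the smoothing constant `eps` and the base learning rate `1e-3` are assumptions made for illustration; the source only fixes \(\text {DLoss} = 1 - {\text {DSC}}\), \(\gamma = 0.1\) and the step size of 30 epochs.

```python
import numpy as np


def dice_loss(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Soft Dice loss, 1 - DSC, over a batch of shape (batch, classes, H, W).

    pred holds class probabilities, target holds binary masks.
    eps is an assumed smoothing term that avoids division by zero.
    """
    intersection = np.sum(pred * target)
    denominator = np.sum(pred) + np.sum(target)
    dsc = (2.0 * intersection + eps) / (denominator + eps)
    return float(1.0 - dsc)


def step_lr(base_lr: float, epoch: int, step_size: int = 30, gamma: float = 0.1) -> float:
    """Learning rate after `epoch` epochs with step-size decay."""
    return base_lr * gamma ** (epoch // step_size)


# Synthetic batch: one sample, one class, 4 x 4 pixels.
target = np.zeros((1, 1, 4, 4))
target[0, 0, :2, :] = 1.0
loss_perfect = dice_loss(target, target)          # prediction equals target
loss_disjoint = dice_loss(1.0 - target, target)   # prediction misses everything
lrs = [step_lr(1e-3, e) for e in (0, 29, 30, 65)]
```

For binary predictions the soft Dice score reduces to \(\frac{2 \cdot TP}{2 \cdot TP + FP + FN}\), so the loss agrees with the evaluation metric.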

3.3 Datasets

High quality prostate datasets with precise contouring are scarce. Hence, to create more training data, we combine the two small datasets UDEp and I2CVB. The UDEp dataset contains 29 \(\textrm{T}_{2}\)WI exams with 861 slices. The segmentations were created by a medicine PhD-student KS under the supervision of the expert urologist JPR^{Footnote 4} using the MITK [23] software. PCa was verified using systematic and guided needle biopsy to the lesion foci.

The UDEp dataset is segmented into WG, PZ and the multiple prostate lesion (PLES) classes that are joined to a single PCa class.

Additionally, we use the 19 \(\textrm{T}_{2}\)WI prostate exams from the I2CVB dataset with 468 slices in total that contain at least one of the following classes: PCa, PZ, CG and WG. In comparison to the UDEp dataset a segmentation for the CG is provided. The PCa class was proven through biopsy and found in 17 patients while 2 patients had negative biopsy results.

The third dataset, called the combined (COMB) dataset, holds the prostate exams of the UDEp and I2CVB datasets with \(29 + 19 = 48\) patients. Since the UDEp does not contain a contouring for the CG class, a method to approximate it is presented so that all classes are present in both datasets.

3.4 CG generation

The CG (consisting of central zone, transitional zone, and anterior fibromuscular stroma) can be estimated by finding the set difference of the binary segmentation maps of the WG and the PZ, as \(\text {CG} = \text {WG} {\setminus } \text {PZ}\). The contourings in the UDEp dataset are sometimes less precise compared to the labels of the I2CVB dataset. As a result, the difference between the WG and PZ classes produces various artifacts, such as shapes around the original PZ and more than one CG. The procedure to approximate the CG described in Fig. 4 is applied to remove these artifacts. It follows these steps: a) The first approximation for the \(\widetilde{\text {CG}}\) is calculated as \(\text {WG} {\setminus } \text {PZ}\). b) A 2D median filter is applied to remove almost all unwanted artifacts. c) Since the median filter slightly blurs the contours, the set difference of the current approximation of the \(\widetilde{\text {CG}}\) with the intersection of itself and the PZ is computed, to achieve separability of the contours for the next step. Furthermore, contour moments are computed to determine the distances of all possible CGs to the PZ centroid. d) Finally, the CG candidate with shortest distance and relative area of 40% of the maximum area is selected as shown in Fig. 4-e. Further remarks are given in Appendix A.
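Steps a) to d) can be sketched with NumPy and SciPy as follows. This is an illustrative approximation under several assumptions, not the authors' code: connected components replace contours, the median filter size of 5 is arbitrary, and step d) is read as "the candidate nearest to the PZ centroid among those with at least 40% of the largest candidate's area". The name `approximate_cg` and its parameters are hypothetical.

```python
import numpy as np
from scipy import ndimage


def approximate_cg(wg, pz, median_size=5, min_rel_area=0.4):
    """Approximate a binary CG mask from binary WG and PZ masks."""
    wg, pz = wg.astype(bool), pz.astype(bool)
    cg = wg & ~pz                                        # a) CG = WG \ PZ
    cg = ndimage.median_filter(cg.astype(np.uint8),      # b) remove thin artifacts
                               size=median_size).astype(bool)
    cg &= ~pz                                            # c) CG \ (CG ∩ PZ)
    labels, n = ndimage.label(cg)
    if n == 0:
        return cg
    areas = np.bincount(labels.ravel(), minlength=n + 1)[1:]
    if pz.any():
        pz_centroid = np.argwhere(pz).mean(axis=0)
        dists = np.array([
            np.linalg.norm(np.argwhere(labels == k).mean(axis=0) - pz_centroid)
            for k in range(1, n + 1)
        ])
    else:
        dists = np.zeros(n)                              # no PZ: distance is uninformative
    # d) assumed rule: nearest candidate with >= 40% of the largest candidate area
    ok = areas >= min_rel_area * areas.max()
    chosen = np.flatnonzero(ok)[np.argmin(dists[ok])] + 1
    return labels == chosen


# Synthetic slice: the PZ label leaves a one-pixel rim inside the WG label.
wg = np.zeros((40, 40), dtype=bool)
wg[5:35, 5:35] = True
pz = np.zeros((40, 40), dtype=bool)
pz[25:34, 6:34] = True
cg = approximate_cg(wg, pz)
```

In the synthetic slice the plain set difference contains a one-pixel rim around the PZ; the median filter removes it and a single compact CG region remains.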

Furthermore, the prostate \(\textrm{T}_{2}\)WI for each patient are grouped into the complete volume and three different regions or sub-volumes, apex, midgland and base, which comprise the 1st to 15th, the 16th to 84th and the 85th to 100th percentile of the MRI for which the targets include at least one of the classes WG, CG or PZ respectively [7]. This grouping helps to better determine the strengths and weaknesses of the DL model in the above mentioned regions. The sub-volume names, however, are not to be confused with the exact prostate biopsy regions [34].
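A small sketch of this grouping is given below. It assumes that the slices are ordered from apex to base and that the percentile of the i-th target-bearing slice (1-based) out of n is \(100 \cdot i / n\); both choices, as well as the function name `split_regions`, are illustrative assumptions.

```python
import numpy as np


def split_regions(has_target: np.ndarray) -> dict:
    """Group slice indices into apex, midgland and base.

    has_target: boolean array, True where a slice contains WG, CG or PZ.
    Assumption: slices are ordered from apex to base.
    """
    idx = np.flatnonzero(has_target)
    pct = 100.0 * np.arange(1, idx.size + 1) / idx.size
    return {
        "apex": idx[pct <= 15],
        "midgland": idx[(pct > 15) & (pct <= 84)],
        "base": idx[pct > 84],
    }


# Synthetic exam with 24 slices, of which slices 2..21 contain prostate labels.
has_target = np.zeros(24, dtype=bool)
has_target[2:22] = True
regions = split_regions(has_target)
```

With 20 labelled slices this yields 3 apex, 13 midgland and 4 base slices, and every labelled slice belongs to exactly one region.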

The distribution of the segmentation classes is shown in Table 1. The PCa class is underrepresented since only a few slices per patient contain PCa regions. Thus, this class is the hardest to detect. Furthermore, the UDEp dataset contains fewer PCa samples which makes it more difficult to train. The generated CG class is slightly larger in the UDEp dataset compared to the I2CVB set due to the automatic generation approach.

Table 1 The class pixel distribution is shown for each class over the complete prostate region of the three datasets. The background class is disregarded


3.5 Image augmentation and histogram equalization

Alkadi et al. [3] and Duran et al. [8] claim that image augmentation reduces overfitting while [2] could not find any improvements. The augmentation pipeline used for this work consists of the following linear transformations. Since the height and width of the patients’ MRI differ, all \(\textrm{T}_{2}\)WI are resized to \(320 \times 320\) pixels, which halves the height and width of most of the UDEp MRI and lies approximately in the middle of the I2CVB sizes. Additionally, the images are horizontally flipped with a probability of 50%. Moreover, the border of the images is cropped in a random manner and the cropped area is replaced by a reflection pad. Cropping slightly varies the location of the prostate gland. Furthermore, the MRI are rotated by up to \(\pm 2^{\circ }\). The empty parts that are created by the rotation are padded by the parts that were rotated out of the image. For training, the targets and the input \(\textrm{T}_{2}\)WI are augmented with the above mentioned transformations; however, only the inputs are additionally modified by adding Gaussian noise.
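The key property of such a pipeline is that image and target receive the same geometric transformation while noise is added to the image only. The sketch below illustrates this for the flip, the random border crop with reflection padding and the Gaussian noise. Resizing and rotation are omitted, and the maximum crop of 8 pixels, the noise level and the choice to pad on the opposite side of each crop (so that the content shifts slightly) are assumptions, since the source does not specify them.

```python
import numpy as np


def augment(image, mask, rng, max_crop=8, noise_std=0.01):
    """Apply the same flip and crop/pad to image and mask, noise to image only."""
    # Horizontal flip with a probability of 50%.
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]
    # Random border crop (assumed maximum: max_crop pixels per side).
    top, bottom, left, right = (int(v) for v in rng.integers(0, max_crop + 1, size=4))
    h, w = image.shape
    # Assumption: pad on the opposite sides so the content shifts slightly.
    pad = ((bottom, top), (right, left))
    image = np.pad(image[top:h - bottom, left:w - right], pad, mode="reflect")
    mask = np.pad(mask[top:h - bottom, left:w - right], pad, mode="reflect")
    # Gaussian noise is added to the input image only.
    image = image + rng.normal(0.0, noise_std, size=image.shape)
    return image, mask


rng = np.random.default_rng(0)
image = rng.random((32, 32))          # synthetic stand-in for a T2WI slice
mask = np.zeros((32, 32), dtype=np.int64)
mask[10:20, 12:22] = 1                # synthetic target
aug_image, aug_mask = augment(image, mask, rng)
```

Passing the mask itself as the image with `noise_std=0.0` is a simple way to check that both outputs stay aligned.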

To enhance image contrast and make shapes and textures mainly in the boundary zones more visible (c.f. [35, 36]), we employ CLAHE [37] which is shown in Fig. 5.

3.6 Evaluation

In order to comprehend the interconnections between the target (TGT) and the prediction (PRED) the metrics \(\text {PPV} = \frac{TP}{TP + FP}\), \(\text {TPR} = \frac{TP}{TP+FN}\) and \(\text {DSC} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}\) are introduced. The positive predictive value (PPV) is the fraction of properly rated positives over the predicted positive cases (PPC). The true positive rate (TPR) is the proportion of properly categorized positives over the reference positive cases (RPC). The DSC is the harmonic mean between PPV and TPR and compares the intersecting area of the TGT and the PRED to the sum of both areas [40]. The reader is referred to Appendix B for further information on the calculation of the metrics.
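The three metrics can be computed directly from binary masks, as in the following sketch. The function name `confusion_metrics` and the convention of returning NaN for an undefined ratio are assumptions; Appendix B may handle empty masks differently.

```python
import numpy as np


def confusion_metrics(pred: np.ndarray, target: np.ndarray):
    """Return (PPV, TPR, DSC) for two binary masks of equal shape."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = int(np.sum(pred & target))
    fp = int(np.sum(pred & ~target))
    fn = int(np.sum(~pred & target))

    def ratio(num, den):
        # Assumed convention: undefined ratios are reported as NaN.
        return num / den if den else float("nan")

    ppv = ratio(tp, tp + fp)
    tpr = ratio(tp, tp + fn)
    dsc = ratio(2 * tp, 2 * tp + fp + fn)
    return ppv, tpr, dsc


# Synthetic example: a 4 x 4 target and a shifted 4 x 5 prediction.
target = np.zeros((10, 10), dtype=bool)
target[2:6, 2:6] = True               # 16 reference positives
pred = np.zeros((10, 10), dtype=bool)
pred[2:6, 4:9] = True                 # 20 predicted positives, 8 overlap
ppv, tpr, dsc = confusion_metrics(pred, target)
harmonic_mean = 2 * ppv * tpr / (ppv + tpr)
```

Here \(TP = 8\), \(FP = 12\) and \(FN = 8\), so PPV is 0.4, TPR is 0.5 and the DSC is \(16/36 \approx 0.44\), which equals the harmonic mean of PPV and TPR.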

3.7 Cross-validation

The authors of [4] identified k-fold cross-validation (k-cv) with \(k = 3\) or \(5\) as common numbers in literature. While [3] use leave-one-patient-out cross-validation (LOPO CV) for the small I2CVB dataset, [2, 8, 28] use k-cv with either \(k=4\) or \(k=5\). While searching the hyperparameter space, the dataset is split randomly into training, validation and test sets. For the final performance evaluation 5-fold cross-validation (5-CV) is applied. Hence, the dataset is divided into the training set that is used for training and the test set on which the model is tested. After training and evaluation the highest performances for the DSC of each slice and each patient’s volume are averaged and the standard deviation is calculated. For hyperparameter tuning and performance evaluation, all images of a single patient always belong to the same split, which ensures that the model is never validated or tested on images of a patient it was trained on.
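A patient-wise 5-fold split can be sketched as follows. The generator `patient_folds`, the random seed and the toy layout of three slices per patient are illustrative assumptions; the only property taken from the text is that all slices of one patient stay in the same fold.

```python
import numpy as np


def patient_folds(patient_ids: np.ndarray, k: int = 5, seed: int = 0):
    """Yield (train_idx, test_idx) pairs where folds are formed per patient."""
    patients = np.unique(patient_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(patients)
    for test_patients in np.array_split(patients, k):
        test_mask = np.isin(patient_ids, test_patients)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# Synthetic layout: 19 patients with 3 slices each.
patient_ids = np.repeat(np.arange(19), 3)
folds = list(patient_folds(patient_ids, k=5))
```

Every slice index appears in exactly one test fold, and no patient contributes slices to both the training and the test side of a fold.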

4 Results

In Table 2 the slice- and 3D volume-based DSC results are shown. Figure 6 shows examples for Casc-UNetS and Fig. 7 shows PCa samples only. The findings are validated using the one-sided paired t-test with \(\alpha = 0.05\). Using CLAHE, fewer WG artifacts were predicted in stage 1, which results in more precise ROIs for the WG that serve as additional input to stages 2 and 3. Feature-map concatenation between individual stages results in an increase in 3D volume-based TPR (15%, \(p < 0.001\)), PPV (5%, \(p < 0.05\)) and DSC (12%, \(p < 0.001\)) for the PZ, which is highly significant. For the CG the increase in TPR (9%, \(p < 0.001\)), PPV (8%, \(p < 0.01\)) and DSC (6%, \(p < 0.01\)) is also highly significant. The effects of different training configurations of UNetS1 to 3 are discussed in the following subsections.
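The one-sided paired t-test used for these comparisons can be sketched as below. The DSC values are synthetic placeholders and are not results from this study; the function name `paired_t_one_sided` is hypothetical. The test statistic is \(t = \bar{d} / (s_d / \sqrt{n})\) for the paired differences \(d\), and the p-value is the upper tail of the t-distribution with \(n - 1\) degrees of freedom.

```python
import numpy as np
from scipy import stats


def paired_t_one_sided(a, b):
    """One-sided paired t-test for the alternative mean(a - b) > 0."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    p = stats.t.sf(t, df=n - 1)   # upper-tail probability
    return float(t), float(p)


# Synthetic per-patient DSC values for two configurations (placeholders only).
dsc_with = [0.80, 0.82, 0.78, 0.85, 0.81]
dsc_without = [0.70, 0.75, 0.72, 0.74, 0.73]
t_stat, p_value = paired_t_one_sided(dsc_with, dsc_without)
significant = p_value < 0.05
```

For a positive test statistic the one-sided p-value is half of the two-sided p-value reported by `scipy.stats.ttest_rel`.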

Table 2 Average and standard deviation for 3D-volume- and slice-based DSC results for Casc-UNetS in the different prostate regions and zones


4.1 WG prediction with UNetS1

A DL-model has been trained based on each dataset. Since the COMB dataset comprises the I2CVB and the UDEp dataset, it has to be evaluated whether the model trained on the COMB dataset outperforms the models trained on the individual ones. Hence, for each MRI the metric differences are compared.

On the complete prostate region the COMB model improves the average DSC performance by 0.1% on the I2CVB dataset and worsens it by 1.1% on the UDEp dataset. [7] also report that their network did not always benefit from an increase in training samples. In the prostate apex region it improves by 1.4% for the I2CVB dataset while it deteriorates by 2.2% for the UDEp set. In the midgland region it diminishes by 0.2% and 0.4% for the I2CVB and UDEp set respectively. In the base region there is an increase of 0.2% for the I2CVB and a decrease of 3.2% for the UDEp dataset. There might be several reasons for these findings. As mentioned in Sect. 3.4, the UDEp dataset contains segmentation errors. In many slices the WG extends beyond the PZ, which normally limits it. Thus, there is only a small improvement of 0.1% for the I2CVB set. Furthermore, the results for the WG are already very high (around 89% for I2CVB), which indicates that the model was able to learn the correct features based on the individual datasets.

Since the differences obtained with the COMB dataset are not significant under the one-sided paired t-test with \(\alpha = 0.05\) (\(p = 0.961\) for the I2CVB and \(p = 0.350\) for the UDEp), the models trained on the individual datasets are used in the following stages of Casc-UNetS.

By using the filter between UNetS1 and 2 the slice-based DSC mean is increased from 88.9% to 89.2% (\(p < 0.001\)), 81.6% to 81.8% (\(p < 0.05\)) and 84.3% to 84.5% (\(p < 0.01\)) on the I2CVB, UDEp and COMB dataset respectively. The DSC increase stems from an increase in PPV of 0.4%, 0.4% and 0.5% for the three datasets while TPR remains constant: the number of false positives (FP) is reduced, so the number of PPC decreases, which in turn increases PPV and DSC (c.f. Sect. 3.6). Sometimes the model predicts multiple smaller WGs within the target WG, and the filter removes all but one. The filter is used to reject unwanted artifacts outside the prostate area to provide an improved ROI selection to stages 2 and 3.

4.2 PZ and CG prediction with UNetS2

UNetS2 was trained and evaluated in two different ways using the additional WG contouring, which is intended to focus the model on the WG when predicting the PZ and CG. For training, either the target WGs or the predicted WGs can be used. The targets are used since they resulted in a 1% higher DSC. The WG segmentation is either concatenated with the MRI slices (concatenated input) or multiplied element-wise with the input MRI, i.e. the Hadamard product is formed (attention gate). The Hadamard product of the predicted WG with the MRI [8] focuses UNetS2 solely on the ROI. The results are similar to those of the concatenated input; however, the result for the base PZ region of the COMB dataset lies about 10% lower than with the concatenated input. The attention gate acts like a perforator and reduces the MRI slices to only the predicted WG region before the model evaluates them. Since the base of the prostate contains mostly small areas, which are hard for UNetS1 to detect, the model cannot detect anything outside the predicted WG. Thus, Casc-UNetS uses the concatenated input.
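The difference between the two input variants can be made concrete with a short NumPy sketch. It assumes an input of three slices with shape (3, H, W) and a binary WG mask of shape (H, W); the function names are illustrative.

```python
import numpy as np


def concatenated_input(slices: np.ndarray, wg: np.ndarray) -> np.ndarray:
    """Append the WG mask as an extra channel: (3, H, W) -> (4, H, W)."""
    return np.concatenate([slices, wg[None].astype(slices.dtype)], axis=0)


def attention_gate(slices: np.ndarray, wg: np.ndarray) -> np.ndarray:
    """Hadamard product of every slice with the WG mask: shape stays (3, H, W)."""
    return slices * wg[None]


rng = np.random.default_rng(0)
slices = rng.random((3, 8, 8)) + 0.1   # synthetic, strictly positive intensities
wg = np.zeros((8, 8))
wg[2:6, 2:6] = 1.0                     # synthetic predicted WG

x_concat = concatenated_input(slices, wg)
x_gated = attention_gate(slices, wg)
```

The concatenated input keeps all image intensities and only adds the mask as a hint, whereas the attention gate sets every pixel outside the predicted WG to zero, so a WG that UNetS1 predicts too small cannot be compensated by the next stage.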

4.3 PCa prediction with UNetS3

The last stage is formed by UNetS3 to predict the PCa class. Again two variants are tested, a filtered and an unfiltered one, since previous experiments revealed multiple PCa predictions scattered around the MRI slices. The filter is intended to remove all false predictions outside the segmentation contours of UNetS1 and 2. It is implemented as an element-wise multiplication of the predicted PCa class with a mask, which is created by merging the classes WG, PZ and CG to obtain the complete predicted prostate region and finding the convex hull for smoothing. The experimental results show that PCa outside the prostate region was no longer predicted once the predicted segmentations of UNetS1 and 2 were provided as an input. Hence, filtering is redundant and is rejected for UNetS3.

The highest DSC values are achieved for the I2CVB dataset. As expected the model learns the PCa class best in the I2CVB due to a higher PCa frequency (c.f. Table 1) followed by the COMB and finally the UDEp dataset. Although the percentages are rather low, the PCa class was not detected using a single stage UNetS.

5 Discussion

Looking closer at the cascaded network architecture of Casc-UNetS many advantages become evident. The architecture allows additional post-processing after each stage to incorporate the location of the prostate in \(\textrm{T}_{2}\)WI or the location of prostate lesions since they most frequently appear in the PZ [21]. Moreover, the output of each stage can be imagined as an a-priori residual connection of the last stage to the input of the next stage. The increase in DSC emphasizes that the network model can learn more precisely by limiting the model to the ROI as shown in Sect. 3.1. Each stage can be further optimized based on its individual segmentation task. Furthermore, Casc-UNetS solves the multi-label semantic segmentation task in stages using the softmax activation layer, i.e., one class (WG) per stage or if possible multiple classes (PZ and CG) if they do not superimpose anatomically. In order to solve the multi-label semantic segmentation problem multiple classes can be correct for a single pixel. If two classes are present in a ground truth pixel it is highly unlikely for softmax that the prediction probabilities split up in the same portions and sum up to one (even if both are correct). Additionally, the binary confusion matrix is calculated to compare the results which forces one single predicted class to one and all others to zero. Thus, for any pixel holding two or more ground truth classes the result would only be partially correct. Hence, the staged architecture has the advantage to utilize existing means and still realize multi-label classification in semantic segmentation. 
Although image augmentation slightly improves the performance results for the WG, CG and PCa (for the PZ they degrade by 0.1%), a paired t-test with \(\alpha =0.05\) yields p-values of 0.027, 0.439, 0.014 and 0.135 for the four classes WG, PZ, CG and PCa, respectively. The improvement is therefore significant only for the WG and for the CG with its lower signal intensities compared to the PZ [4, 29]. Given the extensive implementation effort of image augmentation compared with its effect, this is a remarkable finding and in accordance with [2, 41].

On the other hand, there are also challenges that have to be addressed. While the targets of the I2CVB dataset [4] are mostly accurate, those of the UDEp dataset contain segmentation errors, which required the development of the novel CG generation and noise-reduction algorithm (c.f. Sect. 3.4). While this algorithm works very well to generate the CG, the existing targets of WG, PZ and PCa are not affected by the noise reduction. In particular, the WG still contains many errors, which manifest as a WG that extends beyond the PZ that usually bounds it. A network trained on this data will learn the tissue features of the partially incorrectly labeled MRI as correct, which results in difficulties: a rather accurate model trained on the I2CVB dataset cannot be compared against incorrect targets from the UDEp dataset, and vice versa. Since some slices within the UDEp dataset contain segmentation errors, a combination of both datasets provides misleading information for training; hence, the additional data did not lead to better or more accurate results (c.f. Sect. 4.1). Nevertheless, we conjecture that a model trained on the additional data from the UDEp dataset will be more robust and generalize better. Additionally, each stage has to be trained and evaluated separately, which is time-consuming, and it is especially challenging to apply image augmentation for training UNetS2 and -3. Currently, the prostate filter selects the correct contour based on the location of the predicted contours. This assumption is made since the prostate lies in the center of the male pelvic cavity. In some cases, multiple contours will be correct if they lie within the ground truth prostate contour. However, the prostate filter is only based on distance and area. Thus, the prostate filter could be further improved to recognize more true positives while removing false positives.

To delineate the distinctions from the classical UNet, we examined our staged architecture, termed Casc-UNetS, which essentially constitutes a concatenation of modified UNets [c.f. 42]. The performance of traditional UNet architectures on the I2CVB dataset is documented in the literature (e.g., in the study “Attention Guided Deep Supervision Model for Prostate Segmentation” [43]) and found to be inferior to adapted versions of the same. The authors analyzed the performance disparities between different UNet versions (3D, attention guided) and the classical UNet. For instance, [43, 44] provide extensive insights into the performance of various UNet variants, which are consistent with the findings of [3, 9] that form the foundation of our work.

In contrast to most other works that incorporate CNNs for prostate segmentation, we only use two tiny datasets of 19 and 29 exams and, finally, the combination of both with 48 exams, evaluated with 5-CV. This is at the very bottom in terms of dataset size compared to the studies that utilize CNNs as outlined in the meta-analysis of Wildeboer et al. [c.f. [1], Table 4], which use a mean of 202 patient exams to achieve similar results.

The following limitations should be mentioned. In general, if a network modification or training option did not improve the evaluation results in one stage, it was disregarded, i.e., the effect of image augmentation was only evaluated for the model trained on the I2CVB data, and the effect of training the model on the COMB dataset and evaluating it on the individual datasets I2CVB and UDEp was only evaluated for UNetS1 (c.f. Sect. 4.1).

For further investigations on how to properly join two datasets a standardized data acquisition and labeling protocol must be established in collaboration with the medical imaging professionals. After that, the data must be relabeled to generate a highly accurate UDEp dataset that can be compared to the performance of the dataset used in this work.

6 Conclusions

In this manuscript, we proposed an innovative cascaded CNN architecture called Casc-UNetS, which is able to perform multi-label semantic segmentation to precisely contour the prostate in \(\textrm{T}_{2}\)WI. The model is inspired by current research methods [3, 6, 7, 8, 20] and was evaluated on three different datasets. The network architecture was optimized in an iterative process, which allows each stage to be tailored to its prediction task. Feature-map concatenation improved the overall prediction performance by a large margin, and PCa is detected based on \(\textrm{T}_{2}\)WI only.

Because the CG contours are missing in the UDEp dataset, we employ our algorithm, which uses median filtering and contour moments, to approximate and simultaneously denoise the CG for the UDEp \(\textrm{T}_{2}\)WI. This allows us to join the I2CVB and UDEp datasets into a dataset with more variety. The contrast of the MRI was improved using CLAHE, which led to more precise WG predictions. For each network stage, hyperparameter tuning was conducted and the performance was evaluated using 5-CV. The results are compared and analysed in Table 2. Image augmentation techniques, except for CLAHE, were found to be almost irrelevant (c.f. Sect. 5).

Data availability

The UDEp dataset is available in the Zenodo repository, https://doi.org/10.5281/zenodo.12817071. The project source code, in particular Casc-UNetS, is available in the Zenodo repository, https://doi.org/10.5281/zenodo.11108534. The I2CVB dataset [4] is available in the Zenodo repository, https://doi.org/10.5281/zenodo.162231.

Notes

Digital Imaging and Communications in Medicine (DICOM) file format.
Medical Imaging Interaction Toolkit (MITK) file format.
Initiative for Collaborative Computer Vision Benchmarking (I2CVB) dataset.
JPR was trained as prostate MRI radiologist. He received training from board-certified radiologists and prostate MRI experts in 2015 and 2016. He read approximately 800 prostate MR scans and contoured about 600 prostate MRIs for research purposes.

References

Wildeboer RR, van Sloun RJG, Wijkstra H, Mischi M. Artificial intelligence in multiparametric prostate cancer imaging with focus on deep-learning methods. Comput Methods Prog Biomed. 2020;189: 105316. https://doi.org/10.1016/j.cmpb.2020.105316.
Aldoj N, Biavati F, Michallek F, Stober S, Dewey M. Automatic prostate and prostate zones segmentation of magnetic resonance images using DenseNet-like U-net. Scientific Rep. 2020;10(1):14315. https://doi.org/10.1038/s41598-020-71080-0.
Alkadi R, Taher DF, El-Baz A, Werghi N. A deep learning-based approach for the detection and localization of prostate cancer in T2 magnetic resonance images. J Dig Imag. 2018. https://doi.org/10.1007/s10278-018-0160-1.
Lemaître G, Martí R, Freixenet J, Vilanova JC, Walker PM, Meriaudeau F. Computer-aided detection and diagnosis for prostate cancer based on mono and multi-parametric MRI: a review. Comput Biol Med. 2015;60:8–31. https://doi.org/10.1016/j.compbiomed.2015.02.009.
Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. Densely Connected Convolutional Networks. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR); 2017. pp. 2261–2269.
Zhu Y, Wei R, Gao G, Ding L, Zhang X, Wang X, et al. Fully automatic segmentation on prostate MR images based on cascaded fully convolution network. J Magn Reson Imag. 2019;49(4):1149–56. https://doi.org/10.1002/jmri.26337.
Nai YH, Teo BW, Tan NL, Chua KYW, Wong CK, O’Doherty S, et al. Evaluation of multimodal algorithms for the segmentation of multiparametric MRI prostate images. Comput Math Methods Med. 2020;2020:8861035–8861035. https://doi.org/10.1155/2020/8861035.
Duran A, Jodoin PM, Lartizien C. Prostate cancer semantic segmentation by Gleason Score group in bi-parametric MRI with self attention model on the peripheral zone. In: Arbel T, Ben Ayed I, de Bruijne M, Descoteaux M, Lombaert H, Pal C, editors. Proceedings of the Third Conference on Medical Imaging with Deep Learning. vol. 121 of Proceedings of Machine Learning Research. Montreal: PMLR; 2020. p. 193–204.
Alkadi R, El-Baz A, Taher F, Werghi N. A 2.5D deep learning-based approach for prostate cancer detection on T2-weighted magnetic resonance imaging. In: Leal-Taixé L, Roth S, editors. Computer vision—ECCV 2018 workshops. Cham: Springer International Publishing; 2019. p. 734–9.
Hung A, Zheng H, Miao Q, Raman S, Terzopoulos D, Sung K. CAT-net: a cross-slice attention transformer model for prostate zonal segmentation in MRI. IEEE Trans Med Imag. 2023;42(1):291–303. https://doi.org/10.1109/TMI.2022.3211764.
Ushinsky A, Bardis M, Glavis-Bloom J, Uchio E, Chantaduly C, Nguyentat M, Chow D, Chang P, Houshyar R. A 3D–2D hybrid U-net convolutional neural network approach to prostate organ segmentation of multiparametric MRI. Am J Roentgenol. 2020. https://doi.org/10.2214/AJR.19.22168.
Clark T, Wong A, Haider M, Khalvati F. Fully deep convolutional neural networks for segmentation of the prostate gland in diffusion-weighted MR images. Am J Roentgenol. 2017. https://doi.org/10.1007/978-3-319-59876-5_12.
Litjens G, Toth R, van de Ven W, Hoeks C, Kerkstra S, van Ginneken B, Vincent G, Guillard G, Birbeck N, Zhang J, Strand R, Malmberg F, Ou Y, Davatzikos C, Kirschner M, Jung F, Yuan J, Qiu W, Gao Q, Edwards P, Maan B, van der Heijden F, Ghose S, Mitra J, Dowling J, Barratt D, Huisman H, Madabhushi A. Evaluation of prostate segmentation algorithms for MRI: the PROMISE12 challenge. Med Image Analy. 2014;18(2):359–73. https://doi.org/10.1016/j.media.2013.12.002.
Ramacciotti L, Hershenhouse J, Mokhtar D, Paralkar D, Kaneko M, Eppler M, Gill K, Mogoulianitis V, Duddalwar V, Abreu A, Gill I, Cacciamani G. Comprehensive assessment of MRI-based artificial intelligence frameworks performance in the detection, segmentation, and classification of prostate lesions using open-source databases. Urol Clin N Am. 2024;51(1):131–61. https://doi.org/10.1016/j.ucl.2023.08.003.
Johnson L, Harmon S, Yilmaz E, Lin Y, Belue M, Merriman K, Lay N, Sanford T, Sarma K, Arnold C, Xu Z, Roth H, Yang D, Tetreault J, Xu D, Patel K, Gurram S, Wood B, Citrin D, Pinto P, Choyke P, Turkbey B. Automated prostate gland segmentation in challenging clinical cases: comparison of three artificial intelligence methods. Abdom Radiol. 2024;49(5):1545–56. https://doi.org/10.1007/s00261-024-04242-7.
Litjens G, Debats O, Barentsz J, Karssemeijer N, Huisman H. ProstateX challenge data. Cancer Imag Arch. 2017. https://doi.org/10.7937/K9TCIA.2017.MURS5CL.
Montagne S, Hamzaoui D, Allera A, Ezziane M, Luzurier A, Quint R, Kalai M, Ayache N, Delingette H, Renard-Penna R. Challenge of prostate MRI segmentation on T2-weighted images: inter-observer variability and impact of prostate morphology. Insights Imag. 2021;12(1):71. https://doi.org/10.1186/s13244-021-01010-9.
Becker A, Chaitanya K, Schawkat K, Muehlematter U, Hötker A, Konukoglu E, Donati O. Variability of manual segmentation of the prostate in axial T2-weighted MRI: a multi-reader study. Eur J Radiol. 2019. https://doi.org/10.1016/j.ejrad.2019.108716.
Turkbey B, Fotin S, Huang R, Yin Y, Daar D, Aras O, Bernardo M, Garvey B, Weaver J, Haldankar H, Muradyan N, Merino M, Pinto P, Periaswamy S, Choyke P. Fully automated prostate segmentation on MRI: comparison with manual segmentation methods and specimen volumes. Am J Roentgenol. 2013;201(5):W720–9. https://doi.org/10.2214/AJR.12.9712.
Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF, editors. Medical image computing and computer-assisted intervention-MICCAI 2015. Cham: Springer International Publishing; 2015. p. 234–41.
McNeal JE, Redwine EA, Freiha FS, Stamey TA. Zonal distribution of prostatic adenocarcinoma. Correlation with histologic pattern and direction of spread. Am J Surg Pathol. 1988;12(12):897–906. https://doi.org/10.1097/00000478-198812000-00001.
Mildenberger P, Eichelberg M, Martin E. Introduction to the DICOM standard. Eur Radiol. 2002;12(4):920–7. https://doi.org/10.1007/s003300101100.
Nolden M, Zelzer S, Seitel A, Nabers D, Müller M, Franz A, et al. The medical imaging interaction toolkit: challenges and advances. Int J Comput Assis Radiol Surg. 2013. https://doi.org/10.1007/s11548-013-0840-8.
Khan Z, Yahya N, Alsaih K, Al-Hiyali M, Meriaudeau F. Recent automatic segmentation algorithms of MRI prostate regions: a review. IEEE Access. 2021;9:97878–905. https://doi.org/10.1109/ACCESS.2021.3090825.
Isaksson L, Pepa M, Summers P, Zaffaroni M, Vincini M, Corrao G, Mazzola G, Rotondi M, Lo Presti G, Raimondi S, Gandini S, Volpe S, Haron Z, Alessi S, Pricolo P, Mistretta F, Luzzago S, Cattani F, Musi G, Cobelli O, Cremonesi M, Orecchia R, Marvaso G, Petralia G, Jereczek-Fossa B. Comparison of automated segmentation techniques for magnetic resonance images of the prostate. BMC Med Imag. 2023. https://doi.org/10.1186/s12880-023-00974-y.
Hossain MS, Paplinski AP, Betts JM. Residual semantic segmentation of the prostate from magnetic resonance images. In: Cheng L, Leung ACS, Ozawa S, editors. Neural information processing. Cham: Springer International Publishing; 2018. p. 510–21.
Yang X, Liu C, Wang Z, Yang J, Min HL, Wang L, et al. Co-trained convolutional neural networks for automated detection of prostate cancer in multi-parametric MRI. Med Image Analy. 2017;42:212–27. https://doi.org/10.1016/j.media.2017.08.006.
Wang Z, Liu C, Cheng D, Wang L, Yang X, Cheng KT. Automated detection of clinically significant prostate cancer in mp-MRI images based on an end-to-end deep neural network. IEEE Trans Med Imag. 2018;37(5):1127–39. https://doi.org/10.1109/TMI.2017.2789181.
Fütterer JJ. Multiparametric MRI in the detection of clinically significant prostate cancer. Korean J Radiol. 2017;18(4):597–606. https://doi.org/10.3348/kjr.2017.18.4.597.
Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Bach F, Blei D, editors. Proceedings of the 32nd International conference on machine learning, vol. 37. Lille: PMLR; 2015. p. 448–56.
Goodfellow I, Bengio Y, Courville A. Deep learning. Cambridge: MIT Press; 2016.
Ma J, Chen J, Ng M, Huang R, Li Y, Li C, et al. Loss odyssey in medical image segmentation. Med Image Analy. 2021;71: 102035. https://doi.org/10.1016/j.media.2021.102035.
Kingma DP, Ba J. Adam: A Method for Stochastic Optimization. CoRR. 2015. arXiv:1412.6980.
Ma Q, Yang D, Xue B, Wang C, Chen H, Dong Y, et al. Transrectal real-time tissue elastography targeted biopsy coupled with peak strain index improves the detection of clinically important prostate cancer. Oncol Lett. 2017. https://doi.org/10.3892/ol.2017.6126.
Zhang D, Yang Z, Jiang S, Zhou Z, Meng M, Wang W. Automatic segmentation and applicator reconstruction for CT-based brachytherapy of cervical cancer using 3D convolutional neural networks. J Appl Clin Med Phys. 2020;21(10):158–69. https://doi.org/10.1002/acm2.13024.
Yan L, Liu D, Xiang Q, Luo Y, Wang T, Wu D, et al. PSP net-based automatic segmentation network model for prostate magnetic resonance imaging. Comput Methods Prog Biomed. 2021;207: 106211. https://doi.org/10.1016/j.cmpb.2021.106211.
Pizer SM, Amburn EP, Austin JD, Cromartie R, Geselowitz A, Greer T, et al. Adaptive histogram equalization and its variations. Comput Vis Graph Image Process. 1987;39(3):355–68. https://doi.org/10.1016/S0734-189X(87)80186-X.
Bradski G. The openCV library. Dr Dobb’s J. 2000;25(11):120–3.
Szeliski R. Computer vision: algorithms and applications. 2nd ed. London: Springer; 2021.
Powers DM. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv preprint arXiv:2010.16061. 2020.
Desai AD, Gold GE, Hargreaves BA, Chaudhari AS. Technical considerations for semantic segmentation in MRI using convolutional neural networks. arXiv preprint arXiv:1902.01977. 2019.
Bhandary S, Kuhn D, Babaiee Z, Fechter T, Benndorf M, Zamboglou C, Grosu A, Grosu R. Investigation and benchmarking of U-Nets on prostate segmentation tasks. Comput Med Imag Graph. 2023;107(11): 102241. https://doi.org/10.1016/j.compmedimag.2023.102241.
Shanmugalingam K, Sowmya A, Moses D, Meijering E. Attention guided deep supervision model for prostate segmentation in multisite heterogeneous MRI data. Int Conf Med Imag Deep Learn. 2022;172:1085–95.
Liu Y, Zhu Y, Xin Y, Zhang Y, Yang D, Xu T. MESTrans: multi-scale embedding spatial transformer for medical image segmentation. Comput Methods Prog Biomed. 2023;233: 107493.

Acknowledgements

Not applicable

Funding

Open Access funding enabled and organized by Projekt DEAL. This research did not receive any funding.

Author information

Markus Schneider and Jan Philipp Radtke contributed equally as shared senior authors.

Authors and Affiliations

Institute for Artificial Intelligence, University of Applied Sciences Ravensburg-Weingarten (RWU), P.O. Box 30 22, Weingarten, 88216, Germany
Mark Locherer, Christopher Bonenberger, Wolfgang Ertel & Markus Schneider
Department of Urology, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany
Boris Hadaschik, Kristina Stumm & Jan Philipp Radtke
Department of Urology, Medical Faculty, Heinrich Heine-University Düsseldorf, Moorenstr. 5, Düsseldorf, 40225, Germany
Jan Philipp Radtke
Department of Radiology, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg, 69120, Germany
Jan Philipp Radtke

Authors

Mark Locherer
Christopher Bonenberger
Wolfgang Ertel
Boris Hadaschik
Kristina Stumm
Markus Schneider
Jan Philipp Radtke

Contributions

Study, software, data preparation and manuscript: ML. Supervision: CB, MS and WE. Raw UDEp data and labeling: JPR, BH, KS and ML. Article revision: ML, MS, CB, JPR, BH.

Corresponding author

Correspondence to Mark Locherer.

Ethics declarations

Ethics approval and consent to participate

The collection of the MRI data for the UDEp dataset was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the review board of the University of Duisburg-Essen (# 19-TEMP579281-BO).

Informed consent

Informed consent was obtained from all individual participants included in the study.

Consent for publication

The authors affirm that human research participants provided informed consent for publication of the UDEp dataset.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

CG generation

Special remarks on the CG generation algorithm are indicated based on Fig. 4 with its individual steps a to f.

Nai et al. [7] calculated the central gland CG by taking the set difference between the whole gland WG and the peripheral zone PZ. In step b, the first approximation \(\widetilde{\text {CG}}\) is modified using a 2D median filter with adaptive filter size. The filter kernel size is determined by the number of pixels of the \(\widetilde{\text {CG}}\) in each MRI slice: the \(\widetilde{\text {CG}}\) is imagined to be roughly a circle with area \(\text {A}_{\widetilde{\text {CG}}}\), and the kernel size is proportional to its radius \(\sqrt{\text {A}_{\widetilde{\text {CG}}}/\pi }\) scaled by \(\alpha\). The scaling factor \(\alpha\) has to be determined experimentally and depends on the artifacts shown in Fig. 4-b. Setting \(\alpha = 0.8\) generates good CG approximations for most samples in the UDEp dataset. The median filter outputs a binary one if the majority of the pixels under the kernel are ones. This results in very sharp borders and removes almost all unwanted artifacts. If the filter kernel size is chosen too large, all small CGs are removed. On the other hand, a small kernel is not able to remove the unwanted artifacts in large CGs. In some cases, step b already generates a perfect CG. However, in other cases multiple possible CGs are created, especially if the WG segmentation extends beyond the PZ, as shown in c. Furthermore, the median filter slightly blurs the \(\widetilde{\text {CG}}\) contours such that some pixels of the \(\widetilde{\text {CG}}\) interfere with the PZ contours, which can impede the separability of possible CG contours in step d. Hence, to achieve separable CG candidates, we compute \(\text {CG} = \widetilde{\text {CG}} {\setminus } \{\widetilde{\text {CG}} \cap \text {PZ}\}\).
Moreover, the contour moments \(M_{00}\), \(M_{10}\) and \(M_{01}\) [39] of the PZ and of the remaining possible CG contours are calculated to find the centroids \(\{{{\bar{x}}},\ {{\bar{y}}}\}=\left\{ {\frac{M_{10}}{M_{00}}},{\frac{M_{01}}{M_{00}}}\right\}\), as well as the centroid distances. In step d, the CG contour with the shortest centroid distance and a relative area \(\text {A}_m \ge 0.4\) of the maximum area of all contours is selected. If its area is smaller than 40% of the maximum area, the contour with maximum area is chosen as the correct CG (as shown in e).
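
The steps above can be sketched as follows. This is an illustrative reading of the description, not the published implementation, and it rests on several assumptions: the kernel size is taken as the nearest odd integer to \(\alpha\) times the radius, raster (pixel-count) moments stand in for contour moments, the centroid distance is measured between each CG candidate and the PZ, and the selection rule falls back to the largest candidate when the nearest one is too small. All function names are hypothetical.

```python
import math


def adaptive_kernel_size(area_px, alpha=0.8):
    """Odd median-filter kernel size derived from the radius sqrt(A / pi)."""
    radius = math.sqrt(area_px / math.pi)
    size = max(1, round(alpha * radius))
    return size if size % 2 == 1 else size + 1


def remove_pz_overlap(cg, pz):
    """CG without its intersection with the PZ (set difference per pixel)."""
    return [[c * (1 - p) for c, p in zip(crow, prow)] for crow, prow in zip(cg, pz)]


def moments(mask):
    """Raster moments M00, M10, M01 of a binary mask (list of rows of 0/1)."""
    m00 = m10 = m01 = 0
    for y, row in enumerate(mask):
        for x, v in enumerate(row):
            if v:
                m00 += 1
                m10 += x
                m01 += y
    return m00, m10, m01


def centroid(mask):
    """Centroid (M10 / M00, M01 / M00) of a non-empty binary mask."""
    m00, m10, m01 = moments(mask)
    return (m10 / m00, m01 / m00)


def select_cg(candidates, pz, min_rel_area=0.4):
    """Index of the chosen CG candidate: nearest to the PZ unless it is too small."""
    pz_c = centroid(pz)
    areas = [moments(c)[0] for c in candidates]
    dists = [math.dist(centroid(c), pz_c) for c in candidates]
    nearest = min(range(len(candidates)), key=dists.__getitem__)
    largest = max(range(len(candidates)), key=areas.__getitem__)
    if areas[nearest] >= min_rel_area * areas[largest]:
        return nearest
    return largest


def rect(x0, x1, y0, y1, height=6, width=6):
    """Helper for the toy example: binary mask with a filled, inclusive rectangle."""
    return [[int(x0 <= x <= x1 and y0 <= y <= y1) for x in range(width)]
            for y in range(height)]


# Toy example (hypothetical masks on a 6x6 grid).
pz = rect(1, 4, 5, 5)
small_near = rect(2, 2, 4, 4)    # 1 pixel, close to the PZ
medium_near = rect(1, 3, 4, 4)   # 3 pixels, close to the PZ
large_far = rect(1, 3, 1, 2)     # 6 pixels, far from the PZ

choice_small = select_cg([small_near, large_far], pz)
choice_medium = select_cg([medium_near, large_far], pz)
```

With these toy masks, the single-pixel candidate covers less than 40% of the largest area and is rejected in favour of the large candidate, whereas the three-pixel candidate passes the area check and is selected because it is nearest to the PZ.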

This method was tuned to the UDEp dataset (choosing \(\alpha = 0.8\) and \(\text {A}_m \ge 0.4\)) and yields good results in the majority of cases, especially in the prostate midgland region (large WG with well-formed large PZ). It should be noted that the algorithm steps 4 to 6 fix approximately 10% of the CG segmentation maps in the UDEp dataset.

Appendix B

Evaluation

In current literature, some authors [6, 7, 8] state that complete 3D volumes are used for the evaluation. For others [2, 3, 27], this is not specified, which makes the comparison of evaluation metrics problematic, in addition to the usage of different datasets and validation methods. It is important to specify the way of calculation since the results differ greatly because the classes WG, PZ, CG and PCa are unevenly distributed over the individual slices. In particular, the apex and base regions contain ROIs with very few pixels and are thus much harder to detect than the slices in the midgland that contain larger ROIs. Hence, it makes a huge difference whether the DSC is computed for a 3D volume that comprises all slices or for an individual slice. These differences can be observed in Table 2. Thus, we distinguish between the slice-based evaluation, which computes the true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) and the metrics PPV, TPR and DSC for each slice individually, and the volume-based evaluation, which sums all TP, FP, TN and FN over the complete volume and then computes the PPV, TPR and DSC [6, 7, 8]. Let \(I\) denote the image domain with N pixels and \(C\) classes. In the case of slice-based evaluation, the variables \(t_n\) and \(p_n\) represent the n-th target and prediction pixel of a slice; in the case of volume-based evaluation, they represent the n-th pixel / voxel of a 3D volume. For either type, the TP are calculated as \(\sum _{N} t_{n} \cdot p_{n}\), the FP as \(\sum _{N} (1 - t_{n}) \cdot p_{n}\), the TN as \(\sum _{N} (1 - t_{n}) \cdot (1 - p_{n})\) and the FN as \(\sum _{N} t_{n} \cdot (1 - p_{n})\).
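
The difference between the two evaluation modes can be illustrated with a short sketch. It uses the sums given above together with the standard definition DSC = 2TP / (2TP + FP + FN). Averaging the per-slice DSC values and returning 1.0 for a slice without any target or predicted pixels are assumptions made for this example; the function names and the two toy slices are hypothetical.

```python
def confusion(target, pred):
    """TP, FP, TN, FN for flattened binary target and prediction pixels."""
    tp = sum(t * p for t, p in zip(target, pred))
    fp = sum((1 - t) * p for t, p in zip(target, pred))
    tn = sum((1 - t) * (1 - p) for t, p in zip(target, pred))
    fn = sum(t * (1 - p) for t, p in zip(target, pred))
    return tp, fp, tn, fn


def dsc(tp, fp, fn):
    """Dice score; an empty target with an empty prediction counts as 1.0 (assumed)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def slice_based_dsc(targets, preds):
    """Mean of the DSC values computed for each slice individually."""
    scores = []
    for target, pred in zip(targets, preds):
        tp, fp, _, fn = confusion(target, pred)
        scores.append(dsc(tp, fp, fn))
    return sum(scores) / len(scores)


def volume_based_dsc(targets, preds):
    """DSC computed once from the counts summed over all slices of the volume."""
    tp = fp = fn = 0
    for target, pred in zip(targets, preds):
        s_tp, s_fp, _, s_fn = confusion(target, pred)
        tp, fp, fn = tp + s_tp, fp + s_fp, fn + s_fn
    return dsc(tp, fp, fn)


# Toy volume (hypothetical): a midgland slice with a large, well-predicted ROI
# and an apex slice whose single ROI pixel is missed by one pixel.
targets = [[1, 1, 1, 1, 0, 0], [1, 0, 0, 0, 0, 0]]
preds = [[1, 1, 1, 1, 0, 0], [0, 1, 0, 0, 0, 0]]

dsc_slice = slice_based_dsc(targets, preds)
dsc_volume = volume_based_dsc(targets, preds)
```

In this toy volume the missed apex slice halves the slice-based score, while the volume-based score is dominated by the large midgland ROI, which is why the two numbers are not directly comparable.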

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Locherer, M., Bonenberger, C., Ertel, W. et al. Multi-label semantic segmentation of magnetic resonance images of the prostate gland. Discov Artif Intell 4, 66 (2024). https://doi.org/10.1007/s44163-024-00162-z

Received: 03 May 2024
Accepted: 13 August 2024
Published: 02 October 2024
DOI: https://doi.org/10.1007/s44163-024-00162-z