Abstract
Purpose
Semantic segmentation plays a pivotal role in many applications related to medical image and video analysis. However, designing a neural network architecture for medical image and surgical video segmentation is challenging due to the diverse features of relevant classes, including heterogeneity, deformability, transparency, blunt boundaries, and various distortions. We propose a network architecture, DeepPyramid+, which addresses diverse challenges encountered in medical image and surgical video segmentation.
Methods
The proposed DeepPyramid+ incorporates two major modules, namely “Pyramid View Fusion” (PVF) and “Deformable Pyramid Reception” (DPR), to address the outlined challenges. PVF replicates a deduction process within the neural network, aligning with the human visual system, thereby enhancing the representation of relative information at each pixel position. Complementarily, DPR introduces shape- and scale-adaptive feature extraction techniques using dilated deformable convolutions, enhancing accuracy and robustness in handling heterogeneous classes and deformable shapes.
Results
Extensive experiments conducted on diverse datasets, including endometriosis videos, MRI images, OCT scans, and cataract and laparoscopy videos, demonstrate the effectiveness of DeepPyramid+ in handling various challenges such as shape and scale variation, reflection, and blur degradation. DeepPyramid+ demonstrates significant improvements in segmentation performance, achieving up to a 3.65% increase in Dice coefficient for intra-domain segmentation and up to a 17% increase in Dice coefficient for cross-domain segmentation.
Conclusions
DeepPyramid+ consistently outperforms state-of-the-art networks across diverse modalities considering different backbone networks, showcasing its versatility. Accordingly, DeepPyramid+ emerges as a robust and effective solution, successfully overcoming the intricate challenges associated with relevant content segmentation in medical images and surgical videos. Its consistent performance and adaptability indicate its potential to enhance precision in computerized medical image and surgical video analysis applications.
Introduction
Semantic segmentation has emerged as a critical tool in computerized medical image and surgical video analysis, empowering numerous applications across various domains. In surgical videos, semantic segmentation is a prerequisite for several applications, ranging from phase and action recognition, irregularity detection, surgical training, and objective skill assessment to relevance-based compression, surgical planning, and operating room organization [1,2,3,4]. In the case of volumetric medical images, semantic segmentation can considerably aid diagnosis, treatment planning, and monitoring [5]. Automatic segmentation of medical images and videos can also reduce subjective errors caused by time constraints and workloads while enhancing treatment and surgical efficiency.
Designing a neural network architecture for medical image and surgical video segmentation presents a challenge due to the diverse features exhibited by different relevant labels. Specifically, many object classes relevant to medical image and surgical video analysis are heterogeneous, featuring deformable or amorphous instances as well as color, texture, and scale variation. Besides, in surgical videos, motion blur degradation becomes more critical due to the camera's proximity to the surgical scene. Unlike general images, medical images and surgical videos may contain transparent relevant content (such as the intraocular lens) or exhibit blunt boundaries, further complicating the task of semantic segmentation. Accordingly, an effective network for medical image and surgical video segmentation should be able to simultaneously deal with (I) heterogeneity and deformability in relevant objects and (II) transparency, blunt edges, and distortions such as motion and defocus blur.
This paper introduces a U-Net-based CNN for semantic segmentation that effectively addresses the challenges associated with segmenting relevant content in medical images and surgical videos by adaptively capturing semantic information (Note 1). The proposed network, called DeepPyramid+, comprises two key modules: (i) the Pyramid View Fusion (PVF) module, which offers a narrow-to-wide-angle global view of the feature map centered at each pixel position, and (ii) the Deformable Pyramid Reception (DPR) module, responsible for performing shape-adaptive feature extraction on the input convolutional feature map (Note 2). We provide comprehensive experiments comparing the performance of DeepPyramid+ with state-of-the-art baselines on five intra-domain and two cross-domain datasets. Experimental results reveal the superiority of DeepPyramid+ over the baselines, and ablation studies confirm the effectiveness of each proposed module in boosting semantic segmentation performance. To support reproducibility and further investigation, we will release the PyTorch implementation of DeepPyramid+ and all dataset splits upon acceptance of this paper.
Related work
U-Net [7] was initially proposed for medical image segmentation and achieved remarkable performance, largely attributed to its skip connections. Many U-Net-based architectures have been proposed over the past years to improve segmentation accuracy and address the flaws and restrictions of previous architectures [8,9,10,11,12,13,14].
Attention modules
Attention mechanisms can be broadly described as techniques that guide the network's computational resources (i.e., the convolutional operations) toward the most determinative features in the input feature map [9, 15, 16]. Such mechanisms have proven especially beneficial for semantic segmentation. The scSE blocks [15] aim to recalibrate the feature maps based on pixel-wise and channel-wise global features. BARNet [12] adopts a bilinear-attention module to extract cross-dependencies between the different channels of a convolutional feature map. PAANET [11] uses a double-attention module to model semantic dependencies between channels and spatial positions in the convolutional feature map.
Fusion modules
Fusion modules can be characterized as modules designed to improve semantic representation by combining several feature maps. The input feature maps may range from features at different semantic levels to features produced by parallel operations. PSPNet [17] adopts a pyramid pooling module (PPM) containing parallel sub-region average pooling layers followed by upsampling to fuse multi-scale sub-region representations. Atrous spatial pyramid pooling (ASPP) [18, 19] was proposed to deal with objects' scale variance by aggregating multi-scale features extracted with parallel dilated convolutions of varying rates. CPFNet [13] uses another fusion approach for scale-aware feature extraction.
Methodology
We present a segmentation network that focuses on (I) modeling heterogeneous classes featuring deformations and shape, scale, color, and context variation, (II) dealing with content distortion due to motion blur and reflection, and (III) handling objects' transparency and blunt boundaries (Fig. 1). At its core, our network adopts the U-Net architecture, with VGG16 as the encoder. We develop two decoder modules specifically tailored to tackle the mentioned challenges: (1) Pyramid View Fusion (PVF), which replicates a deduction process within the neural network analogous to the functioning of the human visual system by enhancing the representation of relative information at each individual pixel position, and (2) Deformable Pyramid Reception (DPR), which addresses the limitations of regular convolutional layers by introducing deformable dilated convolutions and shape- and scale-adaptive feature extraction techniques. The DPR module can handle the complexities of heterogeneous classes and deformable shapes, resulting in improved accuracy and robustness in segmentation performance.
We specify the functionality of each module in the following subsections. Additional discussions regarding the effectiveness of each module and an analysis of the complexity for each module are available in the supplementary material.
Notations. Throughout this paper, we represent convolutional layers with a kernel size of \((k\times k)\), dilation of d, m output channels, and g groups as \(\circledast _{k,d}^{m,g}\). For deformable convolutions, we use the symbol \({\tilde{\circledast }}_{k,d}^{m,g}\). Average-pooling layers with a kernel size of \((k\times k)\) and a stride of s pixels, as well as global average pooling layers, are denoted by dedicated pooling symbols. The symbol \(+\!\!\!\!+\,_{D}\) denotes feature map concatenation over dimension D. Furthermore, we employ \(\Uparrow ^{(W_{out}, H_{out})}\) and \(\Downarrow ^{(W_{out}, H_{out})}\) for upsampling and downsampling operations with a scale factor of \((W_{out}, H_{out})\), respectively. We use \(\sigma (\cdot )\) to represent the Softmax operation, \(\Vert \cdot \Vert _{n}\) for layer normalization over the last n dimensions, \(\mathcal {R}(\cdot )\) for the ReLU nonlinearity function, and \(\tau (\cdot )\) for the hard tangent hyperbolic function.
Pyramid View Fusion (PVF)
To optimize computational complexity, the initial step involves creating a bottleneck by employing a convolutional layer with a kernel size of one, as illustrated in Fig. 2. Following this dimensionality reduction stage, the resulting convolutional feature map is fed into four parallel branches. The first branch features a global average pooling layer, which is subsequently followed by upsampling. The other three branches employ average pooling layers with progressively increasing filter sizes while maintaining a stride of one pixel. The use of a one-pixel stride is specifically important to achieve a pixel-wise centralized pyramid view, as opposed to the region-wise pyramid attention approach employed in PSPNet [17]. The output feature maps from all branches are then concatenated and fed into a convolutional layer with four groups, for extracting inter-channel dependencies during dimensionality reduction. Subsequently, a regular convolutional layer is applied to extract joint intra-channel and inter-channel dependencies. The resulting feature map is then passed through a layer-normalization function, which helps normalize the activations for improved stability and performance.
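To make the PVF data flow concrete, the following PyTorch sketch mirrors the description above. The bottleneck width, the pooling kernel sizes (3, 5, 7), the ReLU after the grouped convolution, and the use of a single-group GroupNorm as a layer-normalization surrogate are illustrative assumptions rather than the exact DeepPyramid+ configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidViewFusion(nn.Module):
    """Sketch of PVF: 1x1 bottleneck, four parallel pooling branches (global +
    three stride-1 local average poolings), concatenation, grouped conv,
    regular conv, and layer normalization."""

    def __init__(self, in_ch, bottleneck_ch=64, pool_sizes=(3, 5, 7)):
        super().__init__()
        self.bottleneck = nn.Conv2d(in_ch, bottleneck_ch, kernel_size=1)
        # Stride-1 local average poolings keep the spatial size ('same' padding),
        # yielding a pixel-wise centralized pyramid view.
        self.pools = nn.ModuleList(
            [nn.AvgPool2d(kernel_size=k, stride=1, padding=k // 2) for k in pool_sizes]
        )
        # Grouped conv (one group per branch) captures inter-channel dependencies
        # during dimensionality reduction.
        self.grouped = nn.Conv2d(4 * bottleneck_ch, bottleneck_ch,
                                 kernel_size=3, padding=1, groups=4)
        self.conv = nn.Conv2d(bottleneck_ch, bottleneck_ch, kernel_size=3, padding=1)
        self.norm = nn.GroupNorm(1, bottleneck_ch)  # layer-norm-like over (C, H, W)

    def forward(self, x):
        x = self.bottleneck(x)
        h, w = x.shape[-2:]
        # Branch 1: global average pooling followed by upsampling.
        g = F.adaptive_avg_pool2d(x, 1)
        g = F.interpolate(g, size=(h, w), mode="bilinear", align_corners=False)
        branches = [g] + [pool(x) for pool in self.pools]
        y = torch.cat(branches, dim=1)
        y = F.relu(self.grouped(y))
        y = self.conv(y)
        return self.norm(y)
```

Concatenating the branches in a fixed order means each of the four groups of the grouped convolution processes exactly one pyramid view before the regular convolution mixes them.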
Deformable Pyramid Reception (DPR)
The architecture of the Deformable Pyramid Reception (DPR) module, depicted in Fig. 2, can be described as follows. Initially, the upsampled coarse-grained semantic feature map from the preceding layer is concatenated with its symmetric fine-grained feature map from the encoder. Subsequently, the concatenated features are passed through three parallel branches. The first branch employs a regular convolution, while the other two branches use deformable convolutions with dilation rates of three and six. The regular (structured) convolution covers the immediate neighborhood, up to one pixel away from the central pixel. The deformable convolutions with dilation rates of three and six cover areas ranging from two to four and from five to seven pixels away from each central pixel, respectively. Accordingly, by combining these layers, the DPR module forms a learnable sparse receptive field of size \(15\times 15\) pixels. The three branches share their weights to avoid imposing a large number of additional trainable parameters.
To compute the feature-map-adaptive offset field for each deformable convolution, a regular convolution is employed. Considering the target areas of the two deformable convolutions, the offset fields are computed based on the content within four and seven pixels of each central pixel (i.e., with kernels of size \(9\times 9\) and \(15\times 15\), respectively). The computed offset values are passed through a tangent hyperbolic function, which clips them to the range \([-1, 1]\), ensuring that each sampling location of a deformable convolution with dilation d adaptively stays within \([d-1, d+1]\) pixels of the central pixel. The offset field provides two values (horizontal and vertical offsets) per element of the deformable convolutional kernel. Accordingly, the offset field of a deformable convolution with a \(3\times 3\) kernel has 18 output channels. This enables the deformable convolution to spatially adjust its receptive field based on the learned offset values, improving its ability to capture contextually relevant information.
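The following PyTorch sketch illustrates one deformable branch under the assumptions stated above: a plain convolution predicts 18 tanh-clipped offset channels, and torchvision's deform_conv2d applies a \(3\times 3\) deformable convolution with the given dilation. The channel counts, the offset-kernel sizes, and keeping the \(3\times 3\) weights local to the branch (rather than tying them across the three DPR branches) are simplifications for illustration.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d


class DeformableBranch(nn.Module):
    """Sketch of one deformable DPR branch (dilation 3 or 6)."""

    def __init__(self, in_ch, out_ch, dilation, offset_kernel):
        super().__init__()
        # Offset field: 2 values (x, y) per element of the 3x3 kernel -> 18 channels.
        # Per the text, offset_kernel would be 9 for dilation 3 and 15 for dilation 6.
        self.offset_conv = nn.Conv2d(in_ch, 2 * 3 * 3, kernel_size=offset_kernel,
                                     padding=offset_kernel // 2)
        self.dilation = dilation
        # In DPR the 3x3 weights are shared across branches; here they are local.
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, 3, 3) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_ch))

    def forward(self, x):
        # tanh clips every offset to [-1, 1], so each sampling point stays within
        # one pixel of its dilated-grid position: the branch covers [d-1, d+1].
        offsets = torch.tanh(self.offset_conv(x))
        return deform_conv2d(x, offsets, self.weight, self.bias,
                             padding=self.dilation, dilation=self.dilation)


# Hypothetical usage: branch = DeformableBranch(128, 128, dilation=3, offset_kernel=9)
```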
The output feature maps of the parallel structured and deformable convolutions are then passed through a feature fusion decision (FFD) module [4]. This module determines the significance of each input feature map based on the spatial descriptors using pixel-wise convolutions. These descriptors are concatenated and subjected to a Softmax operation, resulting in normalized descriptors. The normalized descriptors determine the pixel-wise contribution or weight of each input convolutional feature map in the final fused feature map. The output feature map of the FFD module is obtained as a weighted sum of the input feature maps, where the normalized descriptors serve as pixel-wise weights. The resulting feature map from the FFD module goes through a series of additional operations for deeper feature extraction and normalization.
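A minimal sketch of this fusion step follows. It assumes one single-channel descriptor per branch and three input branches (the regular and two deformable convolutions), which matches the description above but not necessarily every implementation detail of the FFD module [4].

```python
import torch
import torch.nn as nn


class FeatureFusionDecision(nn.Module):
    """Sketch of FFD: pixel-wise descriptors -> Softmax weights -> weighted sum."""

    def __init__(self, channels, num_branches=3):
        super().__init__()
        # One pixel-wise (1x1) convolution per branch produces a spatial descriptor.
        self.descriptors = nn.ModuleList(
            [nn.Conv2d(channels, 1, kernel_size=1) for _ in range(num_branches)]
        )

    def forward(self, branches):
        # branches: list of (N, C, H, W) feature maps from the parallel convolutions.
        scores = torch.cat([d(b) for d, b in zip(self.descriptors, branches)], dim=1)
        weights = torch.softmax(scores, dim=1)   # (N, num_branches, H, W)
        stacked = torch.stack(branches, dim=1)   # (N, num_branches, C, H, W)
        fused = (weights.unsqueeze(2) * stacked).sum(dim=1)
        return fused                             # (N, C, H, W)
```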
Experimental settings
Datasets
We evaluate the performance of our proposed network on five intra-domain datasets from three different modalities (video, MRI, and OCT) and two cross-domain datasets from two different modalities. Table 1 details the specifications of the adopted datasets, and Fig. 3 presents exemplary images together with the ground-truth segmentations from each dataset. These datasets cover a wide range of object classes with distinct characteristics. For example, endometriosis videos contain amorphous endometrial implants with color and texture variations, OCT scans involve amorphous intraretinal fluid, and prostate MRI images include deformations and variations in scale, contrast, and brightness. In addition, instrument segmentation in cataract and laparoscopy surgeries presents various challenges, such as scale variation, reflection, motion blur, and defocus blur degradation. The diversity of these datasets ensures realistic conditions for evaluating the proposed network's effectiveness in addressing challenges in medical image and surgical video segmentation (Note 3). For result reproducibility, we provide all train/test splits as CSV files in the paper's GitHub repository.
Alternative methods
We compare the effectiveness of our proposed network architecture with eleven state-of-the-art neural networks using different backbones. Table 2 lists the specifications of the baselines and the proposed network. Note that UNet+ is an improved version of UNet, in which we use VGG16 as the backbone network and double convolutional blocks (two consecutive convolutions, each followed by batch normalization and ReLU layers) as decoder modules. To enable fair comparisons with alternative methods, we report the performance of DeepPyramid+ with three different backbones (VGG16, ResNet34, and ResNet50).
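For reference, a minimal sketch of such a double convolutional decoder block is given below; the \(3\times 3\) kernel size and padding are assumptions, since only the conv–batch-norm–ReLU composition is specified above.

```python
import torch.nn as nn


def double_conv_block(in_ch, out_ch):
    """Decoder block of the UNet+ baseline: two consecutive convolutions,
    each followed by batch normalization and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```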
Training settings
All backbones are initialized with ImageNet pre-trained parameters. We use a batch size of four for all datasets, set the initial learning rate to 0.001, and decrease it during training using polynomial decay, \(lr = lr_{\text {init}}\times (1-\frac{\text {iter}}{\text {total iter}})^{0.9}\). The input size of the networks is \(512\times 512\) pixels for all datasets. During training, we apply cropping, random rotation (up to \(30^{\circ }\)), color jittering (brightness = 0.7, contrast = 0.7, saturation = 0.7), Gaussian blurring, and random sharpening as augmentations, and we use the cross-entropy log-Dice loss [6]. All experiments are conducted on NVIDIA RTX 3090 GPUs.
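As a sketch, the polynomial decay above can be applied per iteration as follows; the choice of optimizer (SGD in the usage comment) is an assumption, since the optimizer is not specified in the text.

```python
import torch


def poly_lr(optimizer, lr_init, curr_iter, total_iter, power=0.9):
    """Polynomial learning-rate decay: lr = lr_init * (1 - iter / total_iter) ** power."""
    lr = lr_init * (1 - curr_iter / total_iter) ** power
    for group in optimizer.param_groups:
        group["lr"] = lr
    return lr


# Hypothetical usage with an assumed SGD optimizer:
# optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# for it in range(total_iter):
#     poly_lr(optimizer, lr_init=0.001, curr_iter=it, total_iter=total_iter)
#     ...  # forward pass, cross-entropy log-Dice loss, backward, optimizer.step()
```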
Ablation study settings
To evaluate the effectiveness of the different modules, we use the improved version of UNet (UNet+) with the same backbone (VGG16) as our baseline. This network does not include any PVF modules, and the DPR module is replaced with a sequence of two convolutional layers, each followed by a batch normalization layer and a ReLU activation.
Experimental results
Table 3 reports the segmentation performance of the proposed and state-of-the-art networks across three different modalities. DeepPyramid+ consistently achieves the highest average performance across all datasets with the various backbones, whereas competing methods, such as CPFNet, exhibit backbone-dependent performance and trail DeepPyramid+ by 2.22% on average in the closest case. Moreover, DeepPyramid+ achieves the best results with all three backbones for endometrial implant and prostate segmentation, and the best results with the ResNet34 and ResNet50 backbones for IRF segmentation in OCT. Regarding instrument segmentation (Table 4), DeepPyramid+ with the VGG16 backbone shows a gain of more than 5.6% in Dice compared to CPFNet, its main alternative (58.93% vs. 53.29%), and more than 2.7% higher performance than all other methods across all backbones. Furthermore, the best results for both datasets correspond to DeepPyramid+ with the VGG16 backbone. Overall, DeepPyramid+ with our suggested backbone (VGG16) achieves the best segmentation performance for both instrument and organ/disease segmentation.
Table 5 compares the cross-domain segmentation performance of DeepPyramid+ and its best two alternatives for three backbones (considering single-domain results in Table 3 and Table 4). Overall, DeepPyramid+ consistently outperforms other methods across all backbones. Considering the MRI dataset, DeepPyramid+ with VGG16 backbone shows more than 4.8% gain in Dice compared to alternatives. For instrument segmentation in cataract surgery, DeepPyramid+ with the VGG16 backbone exhibits an impressive improvement of approximately 19.5% in Dice score compared to CPFNet with the same backbone (55.10% vs. 35.59%), and a 17% improvement compared to the best alternative across all backbones (55.10% vs. 38.10% achieved by UPerNet). This exceptional performance in dealing with cross-domain distribution gaps [28] can be attributed to the effectiveness of the proposed modules in incorporating multi-scale local and global features.
Table 6 provides an ablation study of DeepPyramid+ components. The results suggest that both the PVF and DPR modules contribute significantly to improvements in segmentation performance across all datasets. This impact is most prominent in the case of cataract surgery, where adding the PVF and DPR modules leads to a 4.95% and 4.72% increase in the Dice coefficient, respectively.
Conclusion
In recent years, considerable attention has been devoted to computerized medical image and surgical video analysis. A reliable relevant-instance segmentation approach is a prerequisite for the majority of these applications. In this paper, we introduce a novel network architecture for semantic segmentation that addresses the challenges encountered in medical image and surgical video segmentation. Our proposed architecture, DeepPyramid+, incorporates two innovative modules, namely “Pyramid View Fusion” and “Deformable Pyramid Reception.” Experimental results demonstrate the effectiveness of DeepPyramid+ in capturing object features in challenging scenarios, including shape and scale variation, reflection and blur degradation, blunt edges, and deformability, resulting in competitive cross-domain segmentation performance compared to state-of-the-art networks. The ablation study validates the efficacy of the proposed modules, showcasing their contribution across diverse datasets. These promising results indicate the potential of DeepPyramid+ to enhance precision in various computerized medical imaging and surgical video analysis applications.
Notes
1. This paper is an extended version of DeepPyramid [6], featuring minor enhancements in the DPR module.
2. The PyTorch implementation of DeepPyramid+ is publicly available at https://github.com/Negin-Ghamsarian/DeepPyramid_Plus.
3. This paper aims to design a dedicated network tailored to address medical image and video segmentation challenges, emphasizing various modalities but not within a multi-modal training framework. We substantiate the efficacy of our model through distinct validations across diverse medical image and video datasets.
References
Ghamsarian N, Taschwer M, Putzgruber-Adamitsch D, Sarny S, Schoeffmann K (2021) Relevance detection in cataract surgery videos by spatio-temporal action localization. In: 2020 25th International conference on pattern recognition (ICPR), pp 10720–10727
Ghamsarian N (2020) Enabling relevance-based exploration of cataract videos. In: Proceedings of the 2020 international conference on multimedia retrieval, pp 378–382
Ghamsarian N, Amirpourazarian H, Timmerer C, Taschwer M, Schöffmann K (2020) Relevance-based compression of cataract surgery videos using convolutional neural networks. In: Proceedings of the 28th ACM international conference on multimedia, pp 3577–3585
Ghamsarian N, Taschwer M, Putzgruber-Adamitsch D, Sarny S, El-Shabrawi Y, Schoeffmann K (2021) LensID: a CNN-RNN-based framework towards lens irregularity detection in cataract surgery videos. In: Medical image computing and computer assisted intervention—MICCAI 2021: 24th international conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part VIII 24. Springer, pp 76–86
Huang X, Wang H, She C, Feng J, Liu X, Hu X, Chen L, Tao Y (2022) Artificial intelligence promotes the diagnosis and screening of diabetic retinopathy. Front Endocrinol 13:946915
Ghamsarian N, Taschwer M, Sznitman R, Schoeffmann K (2022) DeepPyramid: Enabling pyramid view and deformable pyramid reception for semantic segmentation in cataract surgery videos. In: Medical image computing and computer assisted intervention—MICCAI 2022: 25th international conference, Singapore, September 18–22, 2022, Proceedings, Part V. Springer, pp 276–286
Ronneberger O, Fischer P, Brox T (2015) U-net: Convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF (eds) Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015. Springer, Cham, pp 234–241
Chen X, Zhang R, Yan P (2019) Feature fusion encoder decoder network for automatic liver lesion segmentation. In: 2019 IEEE 16th international symposium on biomedical imaging (ISBI 2019), pp 430–433
Ni Z-L, Bian G-B, Zhou X-H, Hou Z-G, Xie X-L, Wang C, Zhou Y-J, Li R-Q, Li Z (2019) Raunet: Residual attention u-net for semantic segmentation of cataract surgical instruments. In: Gedeon T, Wong KW, Lee M (eds) Neural Information Processing. Springer, Cham, pp 139–149
Gu Z, Cheng J, Fu H, Zhou K, Hao H, Zhao Y, Zhang T, Gao S, Liu J (2019) CE-Net: Context encoder network for 2D medical image segmentation. IEEE Trans Med Imaging 38(10):2281–2292
Ni Z-L, Bian G-B, Wang G-A, Zhou X-H, Hou Z-G, Chen H-B, Xie X-L (2020) Pyramid attention aggregation network for semantic segmentation of surgical instruments. Proc AAAI Conf Artif Intell 34(07):11782–11790
Ni Z-L, Bian G-B, Wang G-A, Zhou X-H, Hou Z-G, Xie X-L, Li Z, Wang Y-H (2021) Barnet: bilinear attention network with adaptive receptive fields for surgical instrument segmentation. In: Proceedings of the twenty-ninth international conference on international joint conferences on artificial intelligence, pp 832–838
Feng S, Zhao H, Shi F, Cheng X, Wang M, Ma Y, Xiang D, Zhu W, Chen X (2020) CPFNet: Context pyramid fusion network for medical image segmentation. IEEE Trans Med Imaging 39(10):3008–3018
Zhou Z, Siddiquee MMR, Tajbakhsh N, Liang J (2020) Unet++: Redesigning skip connections to exploit multiscale features in image segmentation. IEEE Trans Med Imaging 39(6):1856–1867
Roy AG, Navab N, Wachinger C (2019) Recalibrating fully convolutional networks with spatial and channel “squeeze and excitation” blocks. IEEE Trans Med Imaging 38(2):540–549
Ghamsarian N, Taschwer M, Putzgruber-Adamitsch D, Sarny S, El-Shabrawi Y, Schöffmann K (2021) Recal-net: Joint region-channel-wise calibrated network for semantic segmentation in cataract surgery videos. In: Neural information processing: 28th international conference, ICONIP 2021, Sanur, Bali, Indonesia, December 8–12, 2021, Proceedings, Part III 28. Springer, pp 391–402
Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2018) Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848
Chen L-C, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European conference on computer vision (ECCV)
Ghamsarian N, El-Shabrawi Y, Nasirihaghighi S, Putzgruber-Adamitsch D, Zinkernagel M, Wolf S, Schoeffmann K, Sznitman R (2023) Cataract-1K: cataract surgery dataset for scene segmentation, phase recognition, and irregularity detection. arXiv preprint https://arxiv.org/abs/2312.06295
Bodenstedt S, Speidel S, Allan M, Stoyanov D, Maier-Hein L, Kenngott H, Wagner M (2015) Multi-instrument EndoVis challenge dataset. https://endovissub-instrument.grand-challenge.org/
Leibetseder A, Schoeffmann K, Keckstein J, Keckstein S (2022) Endometriosis detection and localization in laparoscopic gynecology. Multimed Tools Appl 81(5):6191–6215
Liu Q, Dou Q, Yu L, Heng PA (2020) MS-Net: multi-site network for improving prostate segmentation with heterogeneous MRI data. IEEE Trans Med Imaging
Bogunovic H, Venhuizen F, Klimscha S, Apostolopoulos S, Bab-Hadiashar A, Bagci U, Beg MF, Bekalo L, Chen Q, Ciller C, Gopinath K, Gostar AK, Jeon K, Ji Z, Kang SH, Koozekanani DD, Lu D, Morley D, Parhi KK, Park HS, Rashno A, Sarunic M, Shaikh S, Sivaswamy J, Tennakoon R, Yadav S, De Zanet S, Waldstein SM, Gerendas BS, Klaver C, Sánchez CI, Schmidt-Erfurth U (2019) Retouch: the retinal oct fluid detection and segmentation benchmark and challenge. IEEE Trans Med Imaging 38(8):1858–1874
Grammatikopoulou M, Flouty E, Kadkhodamohammadi A, Quellec G, Chow A, Nehme J, Luengo I, Stoyanov D (2021) CaDIS: Cataract dataset for surgical RGB-image segmentation. Med Image Anal 71:102053
Chen L-C, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European conference on computer vision (ECCV), pp 801–818
Xiao T, Liu Y, Zhou B, Jiang Y, Sun J (2018) Unified perceptual parsing for scene understanding. In: Proceedings of the European conference on computer vision (ECCV), pp 418–434
Ghamsarian N, Gamazo Tejero J, Márquez-Neila P, Wolf S, Zinkernagel M, Schoeffmann K, Sznitman R (2023) Domain adaptation for medical image segmentation using transformation-invariant self-training. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 331–341
Funding
Open access funding provided by University of Bern
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
For this type of study, formal consent is not required.
Informed consent
This article uses patient data from publicly available datasets.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work was funded by Haag-Streit Foundation, Switzerland.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.