Abstract
Accurate segmentation of polyps is crucial for efficient colorectal cancer detection during colonoscopy screening. State Space Models, exemplified by Mamba, have recently emerged as a promising approach, excelling at long-range interaction modeling with linear computational complexity. However, previous methods consider neither the cross-scale dependencies of different pixels nor the consistency in feature representations and semantic embedding, both of which are crucial for polyp segmentation. We therefore introduce Polyp-Mamba, a novel unified framework that addresses these limitations by integrating multi-scale feature learning with semantic structure analysis. Specifically, our framework includes a Scale-Aware Semantic module that embeds multi-scale features from the encoder to model semantic information both within and across scales, rather than at the single scale used in prior studies. Furthermore, a Global Semantic Injection module injects these scale-aware semantics into the corresponding decoder features, fusing global and local information and enhancing the pyramid feature representation. Experimental results across five challenging datasets and six metrics demonstrate that the proposed method surpasses state-of-the-art methods, underscoring the Polyp-Mamba framework’s proficiency in polyp segmentation tasks.
Z. Xu and F. Tang contributed equally to this work.
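The abstract's claim that State Space Models handle long-range interactions with linear computational complexity can be illustrated with a minimal sketch of a generic discrete linear state space recurrence. This is not the Polyp-Mamba implementation and not Mamba's selective scan, where the parameters depend on the input. The function name `ssm_scan` and the fixed scalar parameters `a`, `b`, and `c` are assumptions made only for illustration.

```python
from typing import List


def ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Run a scalar discrete linear state space model over the sequence x.

    Recurrence (illustrative, fixed parameters):
        h_t = a * h_{t-1} + b * x_t
        y_t = c * h_t
    """
    h = 0.0  # hidden state carried from one step to the next
    y = []
    for x_t in x:
        # One constant-cost update per element, so the scan is O(len(x)).
        h = a * h + b * x_t
        y.append(c * h)
    return y


if __name__ == "__main__":
    # An impulse at the first position still influences later outputs
    # through the hidden state, decaying by the factor a at each step.
    print(ssm_scan([1.0, 0.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0))
```

Each output depends on all earlier inputs through the single carried state, so no pairwise comparison between positions is needed; this is where the linear cost comes from, in contrast to the quadratic cost of full self-attention.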
Ethics declarations
Disclosure of Interests
The authors declare that they have no competing interests.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Xu, Z. et al. (2024). Polyp-Mamba: Polyp Segmentation with Visual Mamba. In: Linguraru, M.G., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15008. Springer, Cham. https://doi.org/10.1007/978-3-031-72111-3_48
DOI: https://doi.org/10.1007/978-3-031-72111-3_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72110-6
Online ISBN: 978-3-031-72111-3
eBook Packages: Computer Science, Computer Science (R0)