ABP: Asymmetric Bilateral Prompting for Text-Guided Medical Image Segmentation

Xinyi Zeng¹⁴,
Pinxian Zeng¹⁴,
Jiaqi Cui¹⁴,
Aibing Li¹⁴,
Bo Liu¹⁵,
Chengdi Wang¹⁶ &
Yan Wang¹⁴

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15009)

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention


Abstract

Deep learning-based segmentation models have made remarkable progress in aiding pulmonary disease diagnosis by segmenting lung lesion areas in large numbers of annotated X-ray images. Recently, to reduce the demand for medical image data and further improve segmentation performance, various studies have extended mono-modal models to incorporate additional modalities, such as diagnostic textual notes. Despite the prevalent use of cross-attention mechanisms or their variants to model interactions between visual and textual features, current text-guided medical image segmentation approaches still face two limitations: text tokens are not adaptively adjusted to accommodate variations in image context, and text-prior information is insufficiently explored and utilized. To mitigate these limitations, we propose Asymmetric Bilateral Prompting (ABP), a novel method tailored for text-guided medical image segmentation. Specifically, we introduce an ABP block before each up-sampling stage of the image decoder. This block first applies a symmetric bilateral cross-attention module to both the textual and visual branches to model preliminary multi-modal interactions. Then, guided by the opposite modality, two asymmetric operations perform further modality-specific refinement. In the textual branch, we use attention scores from the image branch as attentiveness rankings to prune redundant text tokens, so that image features progressively interact with more attentive text tokens during up-sampling. Asymmetrically, in the visual branch, we integrate attention scores from the text branch as text-prior information to enhance visual representations and target predictions. Experimental results on the QaTa-COV19 dataset demonstrate the superiority of the proposed method.
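The abstract describes the ABP block only at a high level. The following is a minimal, pure-Python sketch of its data flow under stated assumptions, not the authors' implementation: single-head scaled dot-product attention with no learned projections, residual updates after the bilateral cross-attention, ranking text tokens by the mean attention they receive from image queries, a fixed `keep` count of surviving tokens, and a hypothetical `(1 + prior)` reweighting of pixel features. All function names (`abp_block_sketch`, `prune_text_tokens`, `text_prior_map`) and these design choices are illustrative; the paper's actual formulation may differ.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # rows are text tokens or flattened image pixels


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_scores(queries: Matrix, keys: Matrix) -> Matrix:
    """Scaled dot-product attention weights: one softmax row per query."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    return [softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
            for q in queries]


def attend(weights: Matrix, values: Matrix) -> Matrix:
    """Weighted sum of value rows for each row of attention weights."""
    dim = len(values[0])
    return [[sum(w * v[j] for w, v in zip(row, values)) for j in range(dim)]
            for row in weights]


def prune_text_tokens(img_to_txt: Matrix, tokens: Matrix,
                      keep: int) -> Tuple[Matrix, List[int]]:
    """Keep the `keep` text tokens that image queries attend to most (hypothetical mean ranking)."""
    n_txt = len(tokens)
    received = [sum(row[t] for row in img_to_txt) / len(img_to_txt) for t in range(n_txt)]
    top = sorted(range(n_txt), key=lambda t: received[t], reverse=True)[:keep]
    kept = sorted(top)  # preserve the original token order
    return [tokens[t] for t in kept], kept


def text_prior_map(txt_to_img: Matrix) -> List[float]:
    """Mean attention each pixel receives from all text queries; sums to 1."""
    n_pix = len(txt_to_img[0])
    return [sum(row[p] for row in txt_to_img) / len(txt_to_img) for p in range(n_pix)]


def abp_block_sketch(image: Matrix, text: Matrix, keep: int):
    # 1) Symmetric bilateral cross-attention (residual updates are an assumption).
    img_to_txt = attention_scores(image, text)  # N_pix x N_txt (image branch)
    txt_to_img = attention_scores(text, image)  # N_txt x N_pix (text branch)
    img_upd = [[x + y for x, y in zip(f, a)]
               for f, a in zip(image, attend(img_to_txt, text))]
    txt_upd = [[x + y for x, y in zip(f, a)]
               for f, a in zip(text, attend(txt_to_img, image))]
    # 2a) Text side: prune tokens using image-branch attention as rankings.
    pruned, kept = prune_text_tokens(img_to_txt, txt_upd, keep)
    # 2b) Image side: reweight pixels with the text-branch prior (hypothetical fusion).
    prior = text_prior_map(txt_to_img)
    enhanced = [[v * (1.0 + p) for v in f] for f, p in zip(img_upd, prior)]
    return enhanced, pruned, kept, prior


if __name__ == "__main__":
    image = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.2, 0.1]]  # 4 pixels, dim 2
    text = [[1.0, 0.5], [0.0, 0.0], [0.3, 1.0]]                # 3 tokens, dim 2
    enhanced, pruned, kept, prior = abp_block_sketch(image, text, keep=2)
    print("kept text tokens:", kept)
    print("text prior per pixel:", [round(p, 3) for p in prior])
```

Calling the block once per decoder stage with a shrinking `keep` would mimic the progressive pruning described above; a practical version would use batched tensors and learned query/key/value projections instead of raw features.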



Acknowledgments

This work is supported by the National Natural Science Foundation of China (NSFC 62371325, 62071314), the Sichuan Science and Technology Program (2023YFG0025, 2023YFG0101), and the 2023 Science and Technology Project of the Sichuan Health Commission (23LCYJ002).

Author information

Authors and Affiliations

School of Computer Science, Sichuan University, Chengdu, China
Xinyi Zeng, Pinxian Zeng, Jiaqi Cui, Aibing Li & Yan Wang
Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
Bo Liu
Department of Respiratory and Critical Care Medicine, West China Hospital, Chengdu, China
Chengdi Wang


Corresponding author

Correspondence to Yan Wang.

Editor information

Editors and Affiliations

Children’s National Hospital/George Washington University, Washington, DC, USA
Marius George Linguraru
The Chinese University of Hong Kong, Hong Kong, China
Qi Dou
Technical University of Denmark, Kgs Lyngby, Denmark
Aasa Feragen
Imperial College London, London, UK
Stamatia Giannarou
Imperial College London, London, UK
Ben Glocker
Universitat de Barcelona, Barcelona, Spain
Karim Lekadir
Helmholtz Munich, Technical University of Munich and King’s College London, Munich, Germany
Julia A. Schnabel

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.


About this paper

Cite this paper

Zeng, X. et al. (2024). ABP: Asymmetric Bilateral Prompting for Text-Guided Medical Image Segmentation. In: Linguraru, M.G., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15009. Springer, Cham. https://doi.org/10.1007/978-3-031-72114-4_6


DOI: https://doi.org/10.1007/978-3-031-72114-4_6
Published: 03 October 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72113-7
Online ISBN: 978-3-031-72114-4
eBook Packages: Computer Science, Computer Science (R0)
