Abstract
Medical vision-language models (Med-VLMs) trained on large datasets of medical image-text pairs and subsequently fine-tuned for specific tasks have emerged as a mainstream paradigm in medical image analysis. However, recent studies have highlighted the susceptibility of these Med-VLMs to adversarial attacks, raising concerns about their safety and robustness. Randomized smoothing is a well-known technique for turning any classifier into a model that is certifiably robust to adversarial perturbations. Unfortunately, this approach requires retraining the Med-VLM-based classifier so that it classifies well under Gaussian noise, which is often infeasible in practice. In this paper, we propose a novel framework called PromptSmooth to achieve efficient certified robustness of Med-VLMs by leveraging the concept of prompt learning. Given any pre-trained Med-VLM, PromptSmooth adapts it to handle Gaussian noise by learning textual prompts in a zero-shot or few-shot manner, striking a balance between accuracy and robustness while minimizing the computational overhead. Moreover, PromptSmooth requires only a single model to handle multiple noise levels, which substantially reduces the computational cost compared to traditional methods that train a separate model for each noise level. Comprehensive experiments with three Med-VLMs across six downstream datasets of various imaging modalities demonstrate the efficacy of PromptSmooth. Our code and models are available at https://github.com/nhussein/PromptSmooth.
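To make the mechanics concrete, below is a minimal PyTorch sketch of the two ingredients the abstract combines: a frozen CLIP-style Med-VLM whose only trainable parameters are a few prompt-context embeddings, tuned on a handful of Gaussian-noised shots, and the Monte-Carlo majority vote at the core of randomized smoothing. The encoder interfaces, prompt length, and training loop here are illustrative assumptions rather than the authors' implementation; the actual code is in the repository linked above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PromptedMedVLM(nn.Module):
    """Frozen CLIP-style Med-VLM classifier; the only trainable parameters
    are a few context embeddings prepended to each class-name embedding
    (CoOp-style). `image_encoder` and `text_encoder` are hypothetical
    stand-ins for the pre-trained Med-VLM encoders."""

    def __init__(self, image_encoder, text_encoder, class_embs, n_ctx=4, dim=512):
        super().__init__()
        self.image_encoder = image_encoder
        self.text_encoder = text_encoder
        for p in self.parameters():          # freeze both encoders
            p.requires_grad_(False)
        self.ctx = nn.Parameter(0.02 * torch.randn(n_ctx, dim))  # learnable prompt
        self.class_embs = class_embs         # list of (L_c, dim) token embeddings

    def forward(self, images):
        img = F.normalize(self.image_encoder(images), dim=-1)            # (B, dim)
        txt = torch.stack([self.text_encoder(torch.cat([self.ctx, c]))   # (C, dim)
                           for c in self.class_embs])
        txt = F.normalize(txt, dim=-1)
        return 100.0 * img @ txt.t()                                     # (B, C) logits


def tune_prompt_few_shot(model, loader, sigma, steps=100, lr=1e-3):
    """Few-shot adaptation under Gaussian noise: only `model.ctx` is updated,
    so the Med-VLM itself is never retrained."""
    opt = torch.optim.AdamW([model.ctx], lr=lr)
    data = iter(loader)
    for _ in range(steps):
        try:
            x, y = next(data)
        except StopIteration:
            data = iter(loader)
            x, y = next(data)
        noisy = x + sigma * torch.randn_like(x)   # match the noise seen at certification
        loss = F.cross_entropy(model(noisy), y)
        opt.zero_grad()
        loss.backward()
        opt.step()


@torch.no_grad()
def smoothed_predict(model, x, sigma, n=1000, batch=100):
    """Monte-Carlo majority vote from Cohen et al.'s randomized smoothing;
    the binomial confidence bound that yields the certified radius is omitted."""
    counts = None
    for _ in range(0, n, batch):
        noise = sigma * torch.randn(batch, *x.shape, device=x.device)
        preds = model(x.unsqueeze(0) + noise).argmax(dim=-1)
        votes = torch.bincount(preds, minlength=len(model.class_embs))
        counts = votes if counts is None else counts + votes
    return counts.argmax().item()
```

Because the encoders stay frozen, the same model can be certified at several noise levels by re-running `tune_prompt_few_shot` with a different `sigma`, which is the efficiency argument the abstract makes against training one smoothed classifier per noise level.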
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hussein, N., Shamshad, F., Naseer, M., Nandakumar, K. (2024). PromptSmooth: Certifying Robustness of Medical Vision-Language Models via Prompt Learning. In: Linguraru, M.G., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15012. Springer, Cham. https://doi.org/10.1007/978-3-031-72390-2_65
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72389-6
Online ISBN: 978-3-031-72390-2