research-article

Prompting Industrial Anomaly Segment with Large Vision-Language Models

Authors:

Feiniu Yuan

MMAsia '24: Proceedings of the 6th ACM International Conference on Multimedia in Asia

Article No.: 32, Page 1

https://doi.org/10.1145/3696409.3700192

Published: 28 December 2024

Abstract

Industrial Anomaly Detection (IAD) aims to identify and localize defects in images, such as defects in industrial parts, and is usually performed with visual inspection methods. Most existing works design and train a dedicated model for each anomaly category, which requires abundant normal images and multiple training procedures. These limitations hinder the practical use of such methods. To tackle the IAD task with a single model and without extra normal images, we propose a zero-shot IAD framework named Prompting Anomaly Segment Model (PASM). PASM extracts multimodal information from the images and uses a novel pipeline to guide general Large Vision-Language Models (LVLMs) to tackle the IAD task, which lies outside the original targets of LVLMs. The proposed PASM generates anomaly maps with high quality and high confidence. Experimental results on two widely used benchmark datasets demonstrate that PASM outperforms models specially trained for anomaly segmentation in both quantitative and qualitative evaluations. Compared with the second-best methods, PASM achieves improvements of 5.3%, 1.4%, and 9.0% in image-level AUC, pixel-level AUC, and max-F1-pixel, respectively, on the VisA dataset in the zero-shot setting. In addition, PASM can be applied in a few-shot setting to obtain further improvement. The code and model will be released later.
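
The three metrics named in the abstract (image-level AUC, pixel-level AUC, and max-F1-pixel) can be illustrated with a small sketch. This is not the authors' evaluation code. It assumes that each image has an anomaly map with per-pixel scores and a binary ground-truth mask, and that the image-level score is the maximum of the anomaly map, which is a common convention but is not stated in the abstract. The helper names `auroc`, `max_f1`, and `evaluate` are hypothetical.

```python
import numpy as np


def auroc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5)."""
    labels = np.asarray(labels).ravel().astype(bool)
    scores = np.asarray(scores, dtype=float).ravel()
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need both positive and negative samples")
    # Pairwise comparison of every positive with every negative score.
    # Memory grows as len(pos) * len(neg), so this suits small inputs only.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


def max_f1(labels, scores):
    """Best F1 over all thresholds taken from the observed scores."""
    labels = np.asarray(labels).ravel().astype(bool)
    scores = np.asarray(scores, dtype=float).ravel()
    best = 0.0
    for threshold in np.unique(scores):
        pred = scores >= threshold
        tp = np.sum(pred & labels)
        fp = np.sum(pred & ~labels)
        fn = np.sum(~pred & labels)
        denom = 2 * tp + fp + fn
        if denom > 0:
            best = max(best, 2 * tp / denom)
    return float(best)


def evaluate(anomaly_maps, gt_masks):
    """Compute the three metrics for maps and masks of shape (N, H, W)."""
    anomaly_maps = np.asarray(anomaly_maps, dtype=float)
    gt_masks = np.asarray(gt_masks).astype(bool)
    n = len(gt_masks)
    # An image counts as anomalous if any pixel in its mask is anomalous.
    image_labels = gt_masks.reshape(n, -1).any(axis=1)
    # Assumption: the image-level score is the maximum pixel score.
    image_scores = anomaly_maps.reshape(n, -1).max(axis=1)
    return {
        "image_auc": auroc(image_labels, image_scores),
        "pixel_auc": auroc(gt_masks, anomaly_maps),
        "max_f1_pixel": max_f1(gt_masks, anomaly_maps),
    }


if __name__ == "__main__":
    # Toy data: one normal image and one image with two anomalous pixels.
    maps = [[[0.1, 0.2], [0.1, 0.3]],
            [[0.9, 0.2], [0.1, 0.8]]]
    masks = [[[0, 0], [0, 0]],
             [[1, 0], [0, 1]]]
    print(evaluate(maps, masks))
```

For full-resolution benchmark images, a rank-based or library implementation of AUROC would be preferable, since the pairwise comparison here is chosen for readability rather than scale.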

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Image segmentation
2. Information systems
  1. Information systems applications
    1. Multimedia information systems

Published In

MMAsia '24: Proceedings of the 6th ACM International Conference on Multimedia in Asia

December 2024

939 pages

ISBN: 9798400712739

DOI: 10.1145/3696409

Copyright © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MMAsia '24

Sponsor:

SIGMM

MMAsia '24: ACM Multimedia Asia

December 3 - 6, 2024

Auckland, New Zealand

Acceptance Rates

Overall Acceptance Rate 59 of 204 submissions, 29%
