DOI: 10.1145/3696409.3700192
research-article

Prompting Industrial Anomaly Segment with Large Vision-Language Models

Published: 28 December 2024

Abstract

Industrial Anomaly Detection (IAD) aims to identify and localize defects in images, such as defects on industrial parts, and is usually achieved by visual inspection methods. Most existing works design and train a dedicated model for each anomaly category, which requires abundant normal images and multiple training procedures. These limitations hinder the practical use of such methods. To tackle the IAD task with a single model and without extra normal images, we propose a zero-shot IAD framework named Prompting Anomaly Segment Model (PASM). PASM extracts multimodal information from the images and uses a novel pipeline to guide general Large Vision-Language Models (LVLMs) to perform the IAD task, which lies outside the original targets of LVLMs. The proposed PASM generates high-quality, high-confidence anomaly maps. Experimental results on two widely used benchmark datasets demonstrate that PASM outperforms models specially trained for anomaly segmentation in both quantitative and qualitative evaluations. Compared with the second-best methods, PASM achieves improvements of 5.3%, 1.4%, and 9.0% in image-level AUC, pixel-level AUC, and max-F1-pixel, respectively, on the VisA dataset in the zero-shot setting. In addition, PASM can be applied in a few-shot setting to obtain further improvement. The code and model will be released later.
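
For readers unfamiliar with the three metrics named in the abstract, the following minimal sketch shows one common way to compute image-level AUC, pixel-level AUC, and max-F1-pixel from anomaly maps. It is not the authors' evaluation code. The toy masks and maps, the helper names `auroc` and `max_f1`, and the choice to score an image by the maximum of its anomaly map are all assumptions made for illustration.

```python
import numpy as np

def auroc(labels, scores):
    """AUROC as the probability that a positive outscores a negative.

    Ties count as half a win. Uses a pairwise comparison, so it is only
    meant for small illustrative arrays.
    """
    labels = np.asarray(labels).ravel().astype(bool)
    scores = np.asarray(scores, dtype=float).ravel()
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

def max_f1(labels, scores):
    """Best F1 over all thresholds taken from the observed scores."""
    labels = np.asarray(labels).ravel().astype(bool)
    scores = np.asarray(scores, dtype=float).ravel()
    best = 0.0
    for t in np.unique(scores):
        pred = scores >= t
        tp = np.sum(pred & labels)
        if tp == 0:
            continue
        precision = tp / pred.sum()
        recall = tp / labels.sum()
        best = max(best, 2 * precision * recall / (precision + recall))
    return float(best)

# Hypothetical toy data: three 2x2 ground-truth masks and anomaly maps.
masks = np.array([
    [[0, 0], [0, 1]],   # one defective pixel
    [[0, 0], [0, 0]],   # normal image
    [[1, 0], [0, 0]],   # one defective pixel
])
maps = np.array([
    [[0.1, 0.2], [0.3, 0.9]],
    [[0.1, 0.6], [0.2, 0.3]],
    [[0.5, 0.2], [0.1, 0.3]],
])

# Assumption: an image is anomalous if any pixel is, and its image-level
# score is the maximum value of its anomaly map.
image_labels = masks.reshape(len(masks), -1).any(axis=1)
image_scores = maps.reshape(len(maps), -1).max(axis=1)

image_auc = auroc(image_labels, image_scores)
pixel_auc = auroc(masks, maps)
pixel_max_f1 = max_f1(masks, maps)
print(image_auc, pixel_auc, pixel_max_f1)
```

Published benchmarks may aggregate these metrics differently, for example per object category before averaging, so the numbers from a sketch like this are not directly comparable to reported results.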

Published In

MMAsia '24: Proceedings of the 6th ACM International Conference on Multimedia in Asia
December 2024
939 pages
ISBN:9798400712739
DOI:10.1145/3696409
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. IAD
  2. Image Segmentation
  3. Large Multi-modal Model

Qualifiers

  • Research-article

Conference

MMAsia '24
Sponsor:
MMAsia '24: ACM Multimedia Asia
December 3 - 6, 2024
Auckland, New Zealand

Acceptance Rates

Overall Acceptance Rate 59 of 204 submissions, 29%
