DOI: 10.1145/3581783.3611828

Towards Decision-based Sparse Attacks on Video Recognition

Published: 27 October 2023

Abstract

Recent studies indicate that sparse attacks, which modify only a small set of input pixels under an l0-norm constraint, threaten the security of deep learning models. While existing research has focused primarily on sparse attacks against image models, the robustness of video recognition models remains largely unevaluated. To bridge this gap, we present the first study of sparse video attacks and propose an attack framework named V-DSA for the most challenging decision-based setting, in which the target model returns only the predicted hard label. Specifically, V-DSA comprises two modules: a Cross-Modal Generator (CMG) for query-free transfer attacks on each frame, and an Optical-flow Grouping Evolution algorithm (OGE) for query-efficient spatial-temporal attacks. Exploiting the feature similarity between image classification and video recognition models, CMG perturbs each frame to generate a transfer video that serves as the starting point of the attack. OGE first initializes its populations from the transfer video and then leverages optical flow to link the perturbed pixels across frames, which reduces the parameter space and specifically disrupts the temporal relationship between frames. OGE complements this optical-flow modeling with grouping evolution, which realizes a coarse-to-fine attack and avoids falling into local optima. In addition, OGE makes the perturbation temporally coherent while balancing the number of perturbed pixels per frame, further increasing the imperceptibility of the attack. Extensive experiments demonstrate that V-DSA achieves state-of-the-art performance in terms of both attack effectiveness and imperceptibility. We hope V-DSA can provide valuable insights into the security of video recognition systems.
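In the decision-based setting the abstract describes, the attacker observes only the predicted hard label, so every refinement must be validated by a model query. The minimal sketch below illustrates that core loop with plain random search rather than the authors' OGE algorithm: starting from an already-adversarial video (in V-DSA, the transfer video produced by CMG), it randomly restores perturbed pixels toward the original and keeps a candidate only if the label stays flipped, shrinking the l0 perturbation query by query. All names (`predict_label`, `sparse_decision_attack`) are illustrative, not the paper's API.

```python
import numpy as np

def sparse_decision_attack(predict_label, video, true_label, start,
                           budget=100, pixels_per_step=2, rng=None):
    """Random-search sketch of a decision-based sparse attack.

    `start` plays the role of an adversarial starting point (V-DSA obtains
    one via cross-modal transfer). Each iteration proposes restoring a few
    perturbed pixels to their original values and spends one hard-label
    query to check that the prediction is still wrong.
    """
    rng = rng or np.random.default_rng(0)
    adv = start.copy()
    for _ in range(budget):                      # each iteration = one query
        diff = np.flatnonzero(adv != video)      # currently perturbed pixels
        if diff.size == 0:
            break
        restore = rng.choice(diff, size=min(pixels_per_step, diff.size),
                             replace=False)
        cand = adv.copy()
        cand.flat[restore] = video.flat[restore]  # shrink the l0 perturbation
        if predict_label(cand) != true_label:     # keep only label-flipping moves
            adv = cand
    return adv
```

V-DSA replaces this blind pixel selection with optical-flow-guided grouping evolution, so that the pixels proposed for restoration stay temporally coherent across frames instead of being sampled independently.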


Cited By

  • (2024) SVASTIN: Sparse Video Adversarial Attack via Spatio-Temporal Invertible Neural Networks. 2024 IEEE International Conference on Multimedia and Expo (ICME), 1-6. DOI: 10.1109/ICME57554.2024.10688258. Online publication date: 15-Jul-2024.
  • (2024) A qualitative AI security risk assessment of autonomous vehicles. Transportation Research Part C: Emerging Technologies, 169, 104797. DOI: 10.1016/j.trc.2024.104797. Online publication date: Dec-2024.

Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023
9913 pages
ISBN:9798400701085
DOI:10.1145/3581783

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. adversarial examples
  2. sparse attacks
  3. video action recognition

Qualifiers

  • Research-article

Conference

MM '23
MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
