DOI: 10.1145/3665451.3665533
Research Article
Open Access

MEGEX: Data-Free Model Extraction Attack Against Gradient-Based Explainable AI

Published: 23 July 2024

Abstract

Explainable AI encourages machine learning applications in the real world, while data-free model extraction attacks (DFME), in which an adversary steals a trained machine learning model by crafting input queries with a generative model instead of collecting training data, have attracted attention as a serious threat. In this paper, we propose MEGEX, a data-free model extraction attack against explainable AI that provides gradient-based explanations for inference results, and we investigate whether gradient-based explanations increase the vulnerability to data-free model extraction attacks. In MEGEX, the adversary leverages Vanilla Gradient explanations as derivative values for training a generative model. We prove that MEGEX is identical to white-box data-free knowledge distillation, whereby the adversary can train the generative model with exact gradients. Our experiments show that the adversary in MEGEX can steal highly accurate models, reaching 0.98×, 0.91×, and 0.96× the victim model's accuracy on the SVHN, Fashion-MNIST, and CIFAR-10 datasets given 1.5M, 5M, and 20M queries, respectively. In addition, we apply more sophisticated gradient-based explanations, i.e., SmoothGrad and Integrated Gradients, to MEGEX. The experimental results indicate that these explanations are potential countermeasures to MEGEX. We also find that the accuracy of the model stolen by the adversary depends on the diversity of the query inputs produced by the generative model.
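
To make the mechanism concrete, the sketch below illustrates the core idea stated in the abstract: when the victim API returns a gradient-based explanation alongside its prediction, the adversary can backpropagate an exact gradient through the query input into its generative model, rather than estimating that gradient with zeroth-order methods as in fully black-box data-free attacks. This is a minimal PyTorch sketch, not the authors' implementation; the names `victim_api`, `Generator`, `megex_generator_step`, and the L1 disagreement loss are illustrative assumptions, and the API is assumed (hypothetically) to return the victim-side gradient of that loss with respect to the query, standing in for what the adversary would reconstruct from a Vanilla Gradient explanation.

```python
# Minimal sketch of the MEGEX idea (illustrative assumptions, not the authors' code):
# the victim returns a gradient-based explanation with each prediction, and the
# adversary uses it as an exact input gradient when updating its query generator,
# instead of the zeroth-order estimation used by fully black-box data-free attacks.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Generator(nn.Module):
    """Toy generator mapping noise vectors z to flattened 32x32x3 query images."""

    def __init__(self, z_dim: int = 100, img_dim: int = 3 * 32 * 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, img_dim), nn.Tanh(),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)


def megex_generator_step(generator, student, victim_api, g_opt,
                         z_dim: int = 100, batch_size: int = 32) -> None:
    """One generator update that maximizes student-victim disagreement.

    `victim_api(x)` is a hypothetical MLaaS endpoint returning
    (victim probabilities, victim-side gradient of the disagreement loss
    w.r.t. the query); the second output stands in for what the adversary
    would reconstruct from a Vanilla Gradient explanation.
    """
    z = torch.randn(batch_size, z_dim)
    x = generator(z)                                   # synthetic queries, differentiable w.r.t. G

    victim_probs, grad_wrt_x = victim_api(x.detach())  # one batch of queries to the victim

    # Locally differentiable part of the generator loss: push the student's
    # prediction away from the victim's (victim_probs is a constant here).
    loss = -F.l1_loss(F.softmax(student(x), dim=1), victim_probs)

    g_opt.zero_grad()
    loss.backward(retain_graph=True)                   # gradient through the student path

    # Inject the exact victim-side gradient obtained from the explanation into
    # the graph at x, as white-box data-free knowledge distillation would.
    x.backward(gradient=grad_wrt_x)
    g_opt.step()
    student.zero_grad(set_to_none=True)                # student is trained in a separate step
```

In a complete attack loop, as in prior data-free extraction work, this generator step would alternate with several distillation steps that train the student on the same synthetic queries; the sketch only isolates the point made in the abstract, namely that the returned explanation replaces gradient estimation, so every query contributes an exact gradient signal to the generator.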


Cited By

  • (2024) Continual Learning From a Stream of APIs. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(12), 11432-11445. https://doi.org/10.1109/TPAMI.2024.3460871. Online publication date: December 2024.

      Information

      Published In

      SecTL '24: Proceedings of the 2nd ACM Workshop on Secure and Trustworthy Deep Learning Systems
      July 2024
      69 pages
      ISBN:9798400706912
      DOI:10.1145/3665451
      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 23 July 2024

      Author Tags

      1. Data-Free Model Extraction
      2. Explainable AI
      3. Vanilla Gradient

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      ASIA CCS '24
