A Brief Systematization of Explanation-Aware Attacks

  • Conference paper
  • In: KI 2024: Advances in Artificial Intelligence (KI 2024)

Abstract

Due to the overabundance of trained parameters, modern machine learning models are largely considered black boxes. Explanation methods aim to shed light on the inner workings of such models and thus can serve as debugging tools. However, recent research has demonstrated that carefully crafted manipulations of the input or the model can fool both the model and the explanation method. In this work, we briefly present our systematization of such explanation-aware attacks. We categorize them according to three distinct attack types, three scopes, and three capabilities an adversary can have. In our full paper [12], we further present a hierarchy of robustness notions and various defensive techniques tailored toward explanation-aware attacks.
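
To make the shape of this systematization concrete, the sketch below models the three axes as a 3x3x3 space of Python enums. The member names are illustrative assumptions loosely inspired by the attacks cited in the references (input-, model-, and training-time manipulation); the authoritative names of the attack types, scopes, and capabilities are defined in the full paper [12].

    from dataclasses import dataclass
    from enum import Enum, auto

    # NOTE: All member names are illustrative assumptions; the authoritative
    # categories are defined in the full SoK paper [12].

    class AttackType(Enum):        # the three attack types
        TYPE_A = auto()            # placeholder names; see [12]
        TYPE_B = auto()
        TYPE_C = auto()

    class Scope(Enum):             # which output the attack aims to fool
        EXPLANATION_ONLY = auto()  # prediction intact, explanation misleads
        PREDICTION_ONLY = auto()
        BOTH = auto()

    class Capability(Enum):        # what the adversary may manipulate
        INPUT = auto()             # perturb inputs at test time, cf. [5]
        MODEL = auto()             # alter model weights, cf. [3, 7]
        TRAINING = auto()          # poison or backdoor training, cf. [6, 10]

    @dataclass(frozen=True)
    class ExplanationAwareAttack:
        """A point in the 3x3x3 space spanned by the systematization."""
        attack_type: AttackType
        scope: Scope
        capability: Capability

    # Example: an explanation-aware backdoor that flips the prediction while
    # hiding the trigger from the explanation, cf. [10].
    attack = ExplanationAwareAttack(AttackType.TYPE_A, Scope.BOTH,
                                    Capability.TRAINING)
    print(attack)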


References

  1. Aïvodji, U., Arai, H., Fortineau, O., Gambs, S., Hara, S., Tapp, A.: Fairwashing: the risk of rationalization. In: Proceedings of the International Conference on Machine Learning (ICML), Proceedings of Machine Learning Research, vol. 97 (2019)

  2. Aïvodji, U., Arai, H., Gambs, S., Hara, S.: Characterizing the risk of fairwashing. In: Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS) (2021)

  3. Anders, C.J., Pasliev, P., Dombrowski, A.K., Müller, K.R., Kessel, P.: Fairwashing explanations with off-manifold detergent. In: Proceedings of the International Conference on Machine Learning (ICML), Proceedings of Machine Learning Research, vol. 119 (2020)

  4. Baniecki, H., Biecek, P.: Adversarial attacks and defenses in explainable artificial intelligence: a survey. In: Proceedings of the IJCAI Workshop on Explainable AI (XAI) (2023)

  5. Dombrowski, A.K., Alber, M., Anders, C., Ackermann, M., Müller, K.R., Kessel, P.: Explanations can be manipulated and geometry is to blame. In: Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS) (2019)

  6. Fang, S., Choromanska, A.: Backdoor attacks on the DNN interpretation system. In: Proceedings of the National Conference on Artificial Intelligence (AAAI) (2022)

  7. Heo, J., Joo, S., Moon, T.: Fooling neural network interpretations via adversarial model manipulation. In: Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS) (2019)

  8. Ivankay, A., Girardi, I., Frossard, P., Marchiori, C.: Fooling explanations in text classifiers. In: Proceedings of the International Conference on Learning Representations (ICLR) (2022)

  9. Lakkaraju, H., Bastani, O.: “How do I fool you?”: manipulating user trust via misleading black box explanations. In: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES) (2020)

  10. Noppel, M., Peter, L., Wressnegger, C.: Disguising attacks with explanation-aware backdoors. In: Proceedings of the IEEE Symposium on Security and Privacy (S&P) (2023)

  11. Noppel, M., Wressnegger, C.: Explanation-aware backdoors in a nutshell. In: Proceedings of the German Conference on Artificial Intelligence (KI) (2023)

  12. Noppel, M., Wressnegger, C.: SoK: explainable machine learning in adversarial environments. In: Proceedings of the IEEE Symposium on Security and Privacy (S&P) (2024)

  13. Slack, D., Hilgard, S., Jia, E., Singh, S., Lakkaraju, H.: Fooling LIME and SHAP: adversarial attacks on post hoc explanation methods. In: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES) (2020)

  14. Zhang, X., Wang, N., Shen, H., Ji, S., Luo, X., Wang, T.: Interpretable deep learning under fire. In: Proceedings of the USENIX Security Symposium (2020)


Acknowledgement

The authors gratefully acknowledge funding from the German Federal Ministry of Education and Research (BMBF) under the project DataChainSec (FKZ 16KIS1700) and from the Helmholtz Association (HGF) within topic “46.23 Engineering Secure Systems.”

Author information

Corresponding author

Correspondence to Maximilian Noppel.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Noppel, M., Wressnegger, C. (2024). A Brief Systematization of Explanation-Aware Attacks. In: Hotho, A., Rudolph, S. (eds) KI 2024: Advances in Artificial Intelligence. KI 2024. Lecture Notes in Computer Science, vol. 14992. Springer, Cham. https://doi.org/10.1007/978-3-031-70893-0_30

  • DOI: https://doi.org/10.1007/978-3-031-70893-0_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70892-3

  • Online ISBN: 978-3-031-70893-0

  • eBook Packages: Computer Science, Computer Science (R0)
