Abstract
The quality of explanations in human-agent interactions is fundamental to the development of trustworthy AI systems. In this paper, we study the problem of generating robust contrastive explanations for human-neural multi-agent systems and introduce two novel verification-based algorithms to (i) identify non-robust explanations generated by existing methods and (ii) generate contrastive explanations equipped with formal robustness certificates. We present an implementation and evaluate the effectiveness of the approach on two case studies involving neural agents trained on credit scoring and traffic sign recognition tasks.
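To make the notion of explanation robustness concrete, the sketch below shows a minimal, sampling-based check of whether a contrastive explanation keeps its target class under small input perturbations. This is only an empirical proxy for the verification-based certificates studied in the paper; the classifier interface (model.predict), the perturbation radius delta and the helper name counterfactual_is_robust are illustrative assumptions, not part of the original work.

import numpy as np

def counterfactual_is_robust(model, x_cf, target_class,
                             delta=0.05, n_samples=1000, seed=0):
    """Sampling-based robustness check for a contrastive explanation x_cf.

    Draws points from the L-infinity ball of radius delta around x_cf and
    checks that the classifier assigns all of them the target class.
    This is an empirical proxy: a formal certificate would require an
    exhaustive analysis of the ball, e.g. with a neural-network verifier.
    """
    rng = np.random.default_rng(seed)
    # Sample perturbations uniformly from [-delta, delta] in every feature.
    perturbations = rng.uniform(-delta, delta, size=(n_samples, x_cf.shape[0]))
    neighbours = x_cf + perturbations
    preds = model.predict(neighbours)  # assumed scikit-learn-style classifier API
    # Empirically robust iff every perturbed point keeps the target class.
    return bool(np.all(preds == target_class))

A counterfactual that fails this cheap check is certainly non-robust; one that passes it may still be fragile, which is why the paper relies on formal verification rather than sampling to obtain certificates.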
Work partially supported by the DARPA Assured Autonomy programme (FA8750-18-C-0095), the UK Royal Academy of Engineering (CiET17/18-26) and an Imperial College Research Fellowship awarded to Leofante. This paper is an extended version of [26] presented at AAMAS 2023.
References
Akintunde, M., Botoeva, E., Kouvaros, P., Lomuscio, A.: Formal verification of neural agents in non-deterministic environments. Auton. Agents Multi-Agent Syst. 36(1) (2022)
Barrett, S., Rosenfeld, A., Kraus, S., Stone, P.: Making friends on the fly: cooperating with new teammates. Artif. Intell. 242, 132–171 (2017)
Björkegren, D., Blumenstock, J., Knight, S.: Manipulation-proof machine learning. arXiv preprint arXiv:2004.03865 (2020)
Black, E., Wang, Z., Fredrikson, M.: Consistent counterfactuals for deep models. In: Proceedings of the International Conference on Learning Representations (ICLR22). OpenReview.net (2022)
Botoeva, E., Kouvaros, P., Kronqvist, J., Lomuscio, A., Misener, R.: Efficient verification of neural networks via dependency analysis. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI20), pp. 3291–3299. AAAI Press (2020)
Brix, C., Müller, M.N., Bak, S., Johnson, T.T., Liu, C.: First three years of the international verification of neural networks competition (VNN-COMP). arXiv preprint arXiv:2301.05815 (2023)
Byrne, R.: Counterfactuals in explainable artificial intelligence (XAI): evidence from human reasoning. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI19), pp. 6276–6282 (2019)
Dhurandhar, A., et al.: Explanations based on the missing: towards contrastive explanations with pertinent negatives. In: Advances in Neural Information Processing Systems (NeurIPS18), pp. 590–601 (2018)
Dutta, S., Long, J., Mishra, S., Tilli, C., Magazzeni, D.: Robust counterfactual explanations for tree-based ensembles. In: Proceedings of the International Conference on Machine Learning (ICML22). Proceedings of Machine Learning Research, vol. 162, pp. 5742–5756. PMLR (2022)
FICO Community: Explainable Machine Learning Challenge (2019). https://community.fico.com/s/explainable-machine-learning-challenge
Guidotti, D., Leofante, F., Pulina, L., Tacchella, A.: Verification of neural networks: enhancing scalability through pruning. In: Proceedings of the 24th European Conference on Artificial Intelligence (ECAI20), pp. 2505–2512. IOS Press (2020)
Guidotti, D., Pulina, L., Tacchella, A.: pyNeVer: a framework for learning and verification of neural networks. In: Hou, Z., Ganesh, V. (eds.) ATVA 2021. LNCS, vol. 12971, pp. 357–363. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-88885-5_23
Hancox-Li, L.: Robustness in machine learning explanations: does it matter? In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (FAT*20), pp. 640–647. ACM (2020)
Henriksen, P., Hammernik, K., Rueckert, D., Lomuscio, A.: Bias field robustness verification of large neural image classifiers. In: Proceedings of the 32nd British Machine Vision Conference (BMVC21). BMVA Press (2021)
Henriksen, P., Lomuscio, A.: Efficient neural network verification via adaptive refinement and adversarial search. In: Proceedings of the 24th European Conference on Artificial Intelligence (ECAI20), pp. 2513–2520. IOS Press (2020)
Henriksen, P., Lomuscio, A.: DEEPSPLIT: an efficient splitting method for neural network verification via indirect effect analysis. In: Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI21), pp. 2549–2555. ijcai.org (2021)
Jennings, N.R., et al.: Human-agent collectives. Commun. ACM 57(12), 80–88 (2014)
Jiang, J., Leofante, F., Rago, A., Toni, F.: Formalising the robustness of counterfactual explanations for neural networks. In: Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI23), pp. 14901–14909. AAAI Press (2023)
Johnson, T., et al.: ARCH-COMP20 category report: artificial intelligence and neural network control systems (AINNCS) for continuous and hybrid systems plants. In: Proceedings of the 7th International Workshop on Applied Verification of Continuous and Hybrid Systems (ARCH20), pp. 107–139. EasyChair (2020)
Karimi, A., Barthe, G., Balle, B., Valera, I.: Model-agnostic counterfactual explanations for consequential decisions. In: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS20), pp. 895–905. PMLR (2020)
Katz, G., Barrett, C., Dill, D.L., Julian, K., Kochenderfer, M.J.: Reluplex: an efficient SMT solver for verifying deep neural networks. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 97–117. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_5
Kenny, E., Keane, M.: On generating plausible counterfactual and semi-factual explanations for deep learning. In: Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI21), pp. 11575–11585. AAAI Press (2021)
Kouvaros, P., et al.: Formal analysis of neural network-based systems in the aircraft domain. In: Huisman, M., Păsăreanu, C., Zhan, N. (eds.) FM 2021. LNCS, vol. 13047, pp. 730–740. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-90870-6_41
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
Leofante, F., Botoeva, E., Rajani, V.: Counterfactual explanations and model multiplicity: a relational verification view. In: Proceedings of the 20th International Conference on Principles of Knowledge Representation and Reasoning (KR23) (2023, to appear)
Leofante, F., Lomuscio, A.: Towards robust contrastive explanations for human-neural multi-agent systems. In: Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS23), pp. 2343–2345. ACM (2023)
Leofante, F., Narodytska, N., Pulina, L., Tacchella, A.: Automated verification of neural networks: advances, challenges and perspectives. CoRR abs/1805.09938 (2018)
Liu, C., Arnon, T., Lazarus, C., Strong, C.A., Barrett, C.W., Kochenderfer, M.J.: Algorithms for verifying deep neural networks. Found. Trends Optim. 4(3–4), 244–404 (2021)
Lomuscio, A., Maganti, L.: An approach to reachability analysis for feed-forward ReLU neural networks. arXiv preprint arXiv:1706.07351 (2017)
Van Looveren, A., Klaise, J.: Interpretable counterfactual explanations guided by prototypes. In: Oliver, N., Pérez-Cruz, F., Kramer, S., Read, J., Lozano, J.A. (eds.) ECML PKDD 2021. LNCS (LNAI), vol. 12976, pp. 650–665. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86520-7_40
McCloy, R., Byrne, R.: Semifactual "even if" thinking. Thinking Reason. 8(1), 41–67 (2002)
Miller, T.: Explanation in artificial intelligence: insights from the social sciences. Artif. Intell. 267, 1–38 (2019)
Mohammadi, K., Karimi, A., Barthe, G., Valera, I.: Scaling guarantees for nearest counterfactual explanations. In: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES21), pp. 177–187. ACM (2021)
Mohapatra, J., Weng, T., Chen, P., Liu, S., Daniel, L.: Towards verifying robustness of neural networks against a family of semantic perturbations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR20), pp. 241–249. IEEE (2020)
Mothilal, R.K., Sharma, A., Tan, C.: Explaining machine learning classifiers through diverse counterfactual explanations. In: Proceedings of the International Conference on Fairness, Accountability, and Transparency (FAT*20), pp. 607–617. ACM (2020)
Pawelczyk, M., Agarwal, C., Joshi, S., Upadhyay, S., Lakkaraju, H.: Exploring counterfactual explanations through the lens of adversarial examples: a theoretical and empirical analysis. In: Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS22). Proceedings of Machine Learning Research, vol. 151, pp. 4574–4594. PMLR (2022)
Pawelczyk, M., Broelemann, K., Kasneci, G.: On counterfactual explanations under predictive multiplicity. In: Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI20). Proceedings of Machine Learning Research, vol. 124, pp. 809–818. AUAI Press (2020)
Pawelczyk, M., Datta, T., van den Heuvel, J., Kasneci, G., Lakkaraju, H.: Probabilistically robust recourse: navigating the trade-offs between costs and robustness in algorithmic recourse. In: Proceedings of the 11th International Conference on Learning Representations (ICLR23). OpenReview.net (2023)
ProPublica: How We Analyzed the COMPAS Recidivism Algorithm (2016). https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm
Pulina, L., Tacchella, A.: An abstraction-refinement approach to verification of artificial neural networks. In: Touili, T., Cook, B., Jackson, P. (eds.) CAV 2010. LNCS, vol. 6174, pp. 243–257. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14295-6_24
Rosenfeld, A., Richardson, A.: Explainability in human-agent systems. Auton. Agents Multi-Agent Syst. 33(6), 673–705 (2019)
Russell, C.: Efficient search for diverse coherent explanations. In: Proceedings of the International Conference on Fairness, Accountability, and Transparency (FAT*19), pp. 20–28. ACM (2019)
Sharma, S., Henderson, J., Ghosh, J.: CERTIFAI: a common framework to provide explanations and analyse the fairness and robustness of black-box models. In: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES20), pp. 166–172. ACM (2020)
Slack, D., Hilgard, A., Lakkaraju, H., Singh, S.: Counterfactual explanations can be manipulated. In: Advances in Neural Information Processing Systems 34 (NeurIPS21), pp. 62–75 (2021)
Stallkamp, J., Schlipsing, M., Salmen, J., Igel, C.: The German traffic sign recognition benchmark: a multi-class classification competition. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN11), pp. 1453–1460. IEEE (2011)
Upadhyay, S., Joshi, S., Lakkaraju, H.: Towards robust and reliable algorithmic recourse. In: Advances in Neural Information Processing Systems 34 (NeurIPS21), pp. 16926–16937 (2021)
Ustun, B., Spangher, A., Liu, Y.: Actionable recourse in linear classification. In: Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT*19), pp. 10–19. ACM (2019)
Wachter, S., Mittelstadt, B., Russell, C.: Counterfactual explanations without opening the black box: automated decisions and the GDPR. Harv. J. L. & Tech. 31, 841 (2017)
Wang, S., Pei, K., Whitehouse, J., Yang, J., Jana, S.: Efficient formal safety analysis of neural networks. In: Advances in Neural Information Processing Systems (NeurIPS18), pp. 6367–6377. Curran Associates, Inc. (2018)
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Leofante, F., Lomuscio, A. (2023). Robust Explanations for Human-Neural Multi-agent Systems with Formal Verification. In: Malvone, V., Murano, A. (eds.) Multi-Agent Systems. EUMAS 2023. Lecture Notes in Computer Science, vol. 14282. Springer, Cham. https://doi.org/10.1007/978-3-031-43264-4_16