DOI: 10.5555/3545946.3599126

Counterfactual Explanations for Reinforcement Learning Agents

Published: 30 May 2023

Abstract

Reinforcement learning (RL) algorithms often use neural networks to represent an agent's policy, making the agent's behaviour difficult to interpret. Counterfactual explanations are human-friendly explanations that offer users actionable advice on how to change their features in order to obtain a desired output from a black-box model. However, existing methods for generating counterfactuals in RL ignore the stochastic and sequential nature of RL tasks and can produce counterfactuals that are difficult to obtain, which affects user effort and trust. My dissertation focuses on developing methods that take into account the complexities of the RL framework and provide counterfactual explanations that are easy to reach and confidently produce the desired output.
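
To make the setting concrete, the following is a minimal, illustrative sketch (not the method developed in this dissertation) of the kind of single-state counterfactual search that existing approaches adapt from supervised learning, in the spirit of Wachter et al.: perturb the current state until the policy's greedy action flips to a desired target action, and prefer the closest such state. The policy interface, perturbation radius, and distance measure below are assumptions made purely for illustration.

    import torch

    def counterfactual_state(policy_net, state, target_action,
                             n_samples=5000, radius=0.5, seed=0):
        # Naive random-search baseline: sample perturbed states within an
        # L-infinity ball around the original state and keep the closest one
        # (L1 distance) whose greedy action under the policy becomes
        # `target_action`. `policy_net` is assumed to map a batch of states
        # to action logits or Q-values.
        torch.manual_seed(seed)
        state = torch.as_tensor(state, dtype=torch.float32)
        best, best_dist = None, float("inf")
        for _ in range(n_samples):
            candidate = state + torch.empty_like(state).uniform_(-radius, radius)
            with torch.no_grad():
                action = policy_net(candidate.unsqueeze(0)).argmax(dim=-1).item()
            if action == target_action:
                dist = (candidate - state).abs().sum().item()
                if dist < best_dist:
                    best, best_dist = candidate, dist
        return best, best_dist  # best is None if no counterfactual was found

A search of this kind treats each state in isolation: it says nothing about whether the perturbed state is reachable under the environment dynamics, how much effort reaching it would cost the user, or how reliably it produces the desired action under stochastic transitions. Addressing exactly these gaps is the focus of the proposed work.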

Index Terms

  1. Counterfactual Explanations for Reinforcement Learning Agents

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information

      Published In

      AAMAS '23: Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems
      May 2023
      3131 pages
      ISBN: 9781450394321
      • General Chairs: Noa Agmon, Bo An
      • Program Chairs: Alessandro Ricci, William Yeoh

      Publisher

      International Foundation for Autonomous Agents and Multiagent Systems

      Richland, SC

      Author Tags

      1. causality
      2. contrastive explanations
      3. counterfactual explanations
      4. explainability
      5. reinforcement learning

      Qualifiers

      • Research-article

      Funding Sources

      • Science Foundation Ireland

      Conference

      AAMAS '23

      Acceptance Rates

      Overall Acceptance Rate: 1,155 of 5,036 submissions, 23%

