Abstract
This chapter surveys recent accountability tools for the evolving landscape of machine learning models applied to complex reasoning tasks. As language models are increasingly deployed in real-world settings, concerns about misuse, bias, adversarial manipulation, and unintended behavior have grown. The chapter then turns to recent advances in explainability techniques for complex reasoning, highlighting three key areas: interactive explanations, logical reasoning, and textual explanations produced through chain-of-thought reasoning. Finally, it examines diagnostic explainability methods, with an emphasis on recent work assessing the quality of natural language explanations and chain-of-thought reasoning. The faithfulness of explanations is probed through dedicated tests, and information-theoretic measures are proposed to evaluate text explanation methods, offering insight into how information flows through explanation-generation architectures. The chapter concludes by acknowledging open challenges and stressing the need for continued research and development in these areas to foster responsible and accountable use of language models in real-world applications.
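To make the idea of faithfulness testing concrete, the sketch below illustrates one common style of test discussed in the chain-of-thought faithfulness literature: truncating the generated reasoning and checking whether the final answer changes. This is a minimal, illustrative sketch only; the `generate` callable, the prompt format, and the `cot_truncation_test` helper are hypothetical placeholders, not the chapter's own method or any specific library API.

```python
# Minimal sketch of a chain-of-thought truncation probe, assuming a generic
# text-generation callable. All names here are illustrative placeholders.

from typing import Callable, List


def cot_truncation_test(
    generate: Callable[[str], str],   # hypothetical: prompt -> model output
    question: str,
    cot_steps: List[str],             # reasoning steps the model already produced
    final_answer: str,
) -> float:
    """Return the fraction of truncation points at which the answer is unchanged."""
    unchanged = 0
    for k in range(len(cot_steps)):
        truncated = " ".join(cot_steps[:k])  # keep only the first k reasoning steps
        prompt = (
            f"Question: {question}\n"
            f"Reasoning so far: {truncated}\n"
            f"Answer:"
        )
        answer = generate(prompt).strip()
        if answer == final_answer.strip():
            unchanged += 1
    return unchanged / max(len(cot_steps), 1)
```

If the final answer remains largely unchanged even when most of the reasoning is removed, the stated chain of thought may play little causal role in producing it, which is the kind of signal such faithfulness tests are designed to surface.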
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Atanasova, P. (2024). Recent Developments on Accountability and Explainability for Complex Reasoning Tasks. In: Accountable and Explainable Methods for Complex Reasoning over Text. Springer, Cham. https://doi.org/10.1007/978-3-031-51518-7_9
DOI: https://doi.org/10.1007/978-3-031-51518-7_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-51517-0
Online ISBN: 978-3-031-51518-7
eBook Packages: Computer Science, Computer Science (R0)