Why Attentions May Not Be Interpretable?

Published: 14 August 2021

Abstract

Attention-based methods play an important role in model interpretation, where the calculated attention weights are expected to highlight the critical parts of inputs (e.g., keywords in sentences). However, recent research has found that attention-as-importance interpretations often do not work as expected. For example, learned attention weights sometimes highlight less meaningful tokens such as "[SEP]", ",", and ".", and are frequently uncorrelated with other feature-importance indicators such as gradient-based measures. A recent debate over whether attention is an explanation has drawn considerable interest. In this paper, we demonstrate that one root cause of this phenomenon is combinatorial shortcuts: in addition to the parts they highlight, the attention weights themselves may carry extra information that downstream models after the attention layers can exploit. As a result, the attention weights are no longer pure importance indicators. We theoretically analyze combinatorial shortcuts, design an intuitive experiment to demonstrate their existence, and propose two methods to mitigate the issue. Empirical studies on attention-based interpretation models show that the proposed methods can effectively improve the interpretability of attention mechanisms.
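To make the combinatorial-shortcut idea concrete, here is a minimal sketch (our own illustration, not the authors' released code) of a standard attention-pooling classifier in PyTorch; the class name AttentionPooling and all variable names are hypothetical. The comments mark the point where the pattern of the attention weights, rather than the content of the highlighted tokens, can leak label information to the downstream layer.

```python
# Minimal sketch of attention pooling, assuming PyTorch; not the paper's code.
import torch
import torch.nn as nn


class AttentionPooling(nn.Module):
    """Additive attention pooling over a sequence of token vectors."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # one scalar score per token

    def forward(self, tokens: torch.Tensor):
        # tokens: (batch, seq_len, dim)
        weights = torch.softmax(self.score(tokens).squeeze(-1), dim=-1)  # (batch, seq_len)
        # The pooled vector is a weighted sum of the token vectors. Because
        # `weights` is computed from the whole input, its pattern (e.g., all mass
        # placed on an uninformative "[SEP]" token versus mass spread evenly) can
        # itself encode label information that the downstream classifier may
        # exploit -- the combinatorial shortcut described in the abstract.
        pooled = torch.einsum("bs,bsd->bd", weights, tokens)
        return pooled, weights


# Toy usage: the downstream classifier only ever sees `pooled`, which entangles
# which tokens were weighted with what those tokens contain.
pool = AttentionPooling(dim=64)
classifier = nn.Linear(64, 2)
tokens = torch.randn(8, 20, 64)   # batch of 8 sequences, 20 tokens, 64-dim embeddings
pooled, weights = pool(tokens)
logits = classifier(pooled)
```

Under this reading, the attention weights act more like a non-random treatment assignment than a pure importance score, which is consistent with the paper's framing of the problem in terms of causal effect estimation (see the author tags below).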

Supplementary Material

MP4 File (why_attentions_may_not_be-bing_bai-jian_liang-38957881-yYSn.mp4)
Presentation video for paper "Why Attentions May Not Be Interpretable?"


Published In

KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
August 2021
4259 pages
ISBN: 9781450383325
DOI: 10.1145/3447548
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 August 2021

Author Tags

  1. attention mechanism
  2. causal effect estimation
  3. model interpretation

Qualifiers

  • Research-article

Conference

KDD '21

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (Last 12 months): 117
  • Downloads (Last 6 weeks): 16

Reflects downloads up to 21 Nov 2024

Cited By

  • (2024) Local Interpretations for Explainable Natural Language Processing: A Survey. ACM Computing Surveys, 56(9), 1-36. DOI: 10.1145/3649450. Online publication date: 25-Apr-2024.
  • (2024) Explainability for Large Language Models: A Survey. ACM Transactions on Intelligent Systems and Technology, 15(2), 1-38. DOI: 10.1145/3639372. Online publication date: 22-Feb-2024.
  • (2024) Applying interpretable machine learning in computational biology—pitfalls, recommendations and opportunities for new developments. Nature Methods, 21(8), 1454-1461. DOI: 10.1038/s41592-024-02359-7. Online publication date: 9-Aug-2024.
  • (2024) Conditional and Marginal Strengths of Affect Transitions During Computer-Based Learning. International Journal of Artificial Intelligence in Education. DOI: 10.1007/s40593-024-00430-0. Online publication date: 26-Oct-2024.
  • (2024) Rethinking the role of attention mechanism: a causality perspective. Applied Intelligence, 54(2), 1862-1878. DOI: 10.1007/s10489-024-05279-3. Online publication date: 26-Jan-2024.
  • (2024) A Novel Convolutional Neural Network Architecture with a Continuous Symmetry. Artificial Intelligence, 310-321. DOI: 10.1007/978-981-99-9119-8_28. Online publication date: 3-Feb-2024.
  • (2024) Identifying Critical Tokens for Accurate Predictions in Transformer-Based Medical Imaging Models. Machine Learning in Medical Imaging, 169-179. DOI: 10.1007/978-3-031-73290-4_17. Online publication date: 23-Oct-2024.
  • (2023) Boosting graph contrastive learning via graph contrastive saliency. Proceedings of the 40th International Conference on Machine Learning, 36839-36855. DOI: 10.5555/3618408.3619940. Online publication date: 23-Jul-2023.
  • (2023) A Factor Marginal Effect Analysis Approach and Its Application in E-Commerce Search System. International Journal of Intelligent Systems, 2023. DOI: 10.1155/2023/6968854. Online publication date: 11-Oct-2023.
  • (2023) Effects of AI and Logic-Style Explanations on Users' Decisions Under Different Levels of Uncertainty. ACM Transactions on Interactive Intelligent Systems, 13(4), 1-42. DOI: 10.1145/3588320. Online publication date: 8-Dec-2023.
