Why Attentions May Not Be Interpretable?

Published: 14 August 2021

Abstract

Attention-based methods play an important role in model interpretation, where the calculated attention weights are expected to highlight the critical parts of inputs (e.g., keywords in sentences). However, recent research has found that attention-as-importance interpretations often do not work as expected. For example, learned attention weights sometimes highlight less meaningful tokens such as "[SEP]", ",", and ".", and are frequently uncorrelated with other feature-importance indicators such as gradient-based measures. A recent debate over whether attention is an explanation has drawn considerable interest. In this paper, we demonstrate that one root cause of this phenomenon is combinatorial shortcuts: in addition to the parts they highlight, the attention weights themselves may carry extra information that downstream models after the attention layers can exploit. As a result, the attention weights are no longer pure importance indicators. We theoretically analyze combinatorial shortcuts, design an intuitive experiment to demonstrate their existence, and propose two methods to mitigate the issue. Empirical studies on attention-based interpretation models show that the proposed methods can effectively improve the interpretability of attention mechanisms.
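To make the combinatorial-shortcut idea concrete, here is a minimal sketch (our own illustration, not the authors' released code) of a standard attention-pooling classifier in PyTorch; the class name AttentionPooling and all variable names are hypothetical. The comments mark the point where the pattern of the attention weights, rather than the content of the highlighted tokens, can leak label information to the downstream layer.

```python
# Minimal sketch of attention pooling, assuming PyTorch; not the paper's code.
import torch
import torch.nn as nn


class AttentionPooling(nn.Module):
    """Additive attention pooling over a sequence of token vectors."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # one scalar score per token

    def forward(self, tokens: torch.Tensor):
        # tokens: (batch, seq_len, dim)
        weights = torch.softmax(self.score(tokens).squeeze(-1), dim=-1)  # (batch, seq_len)
        # The pooled vector is a weighted sum of the token vectors. Because
        # `weights` is computed from the whole input, its pattern (e.g., all mass
        # placed on an uninformative "[SEP]" token versus mass spread evenly) can
        # itself encode label information that the downstream classifier may
        # exploit -- the combinatorial shortcut described in the abstract.
        pooled = torch.einsum("bs,bsd->bd", weights, tokens)
        return pooled, weights


# Toy usage: the downstream classifier only ever sees `pooled`, which entangles
# which tokens were weighted with what those tokens contain.
pool = AttentionPooling(dim=64)
classifier = nn.Linear(64, 2)
tokens = torch.randn(8, 20, 64)   # batch of 8 sequences, 20 tokens, 64-dim embeddings
pooled, weights = pool(tokens)
logits = classifier(pooled)
```

Under this reading, the attention weights act more like a non-random treatment assignment than a pure importance score, which is consistent with the paper's framing of the problem in terms of causal effect estimation (see the author tags below).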

Supplementary Material

MP4 File (why_attentions_may_not_be-bing_bai-jian_liang-38957881-yYSn.mp4)
Presentation video for paper "Why Attentions May Not Be Interpretable?"


Published In

KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
August 2021
4259 pages
ISBN: 9781450383325
DOI: 10.1145/3447548
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 August 2021

Author Tags

  1. attention mechanism
  2. causal effect estimation
  3. model interpretation

Qualifiers

  • Research-article

Conference

KDD '21

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (Last 12 months): 117
  • Downloads (Last 6 weeks): 16

Reflects downloads up to 21 Nov 2024

Cited By

  • (2024) Local Interpretations for Explainable Natural Language Processing: A Survey. ACM Computing Surveys, 56(9), 1-36. DOI: 10.1145/3649450. Online publication date: 25-Apr-2024.
  • (2024) Explainability for Large Language Models: A Survey. ACM Transactions on Intelligent Systems and Technology, 15(2), 1-38. DOI: 10.1145/3639372. Online publication date: 22-Feb-2024.
  • (2024) Applying interpretable machine learning in computational biology—pitfalls, recommendations and opportunities for new developments. Nature Methods, 21(8), 1454-1461. DOI: 10.1038/s41592-024-02359-7. Online publication date: 9-Aug-2024.
  • (2024) Conditional and Marginal Strengths of Affect Transitions During Computer-Based Learning. International Journal of Artificial Intelligence in Education. DOI: 10.1007/s40593-024-00430-0. Online publication date: 26-Oct-2024.
  • (2024) Rethinking the role of attention mechanism: a causality perspective. Applied Intelligence, 54(2), 1862-1878. DOI: 10.1007/s10489-024-05279-3. Online publication date: 26-Jan-2024.
  • (2024) A Novel Convolutional Neural Network Architecture with a Continuous Symmetry. Artificial Intelligence, 310-321. DOI: 10.1007/978-981-99-9119-8_28. Online publication date: 3-Feb-2024.
  • (2024) Identifying Critical Tokens for Accurate Predictions in Transformer-Based Medical Imaging Models. Machine Learning in Medical Imaging, 169-179. DOI: 10.1007/978-3-031-73290-4_17. Online publication date: 23-Oct-2024.
  • (2023) Boosting graph contrastive learning via graph contrastive saliency. Proceedings of the 40th International Conference on Machine Learning, 36839-36855. DOI: 10.5555/3618408.3619940. Online publication date: 23-Jul-2023.
  • (2023) A Factor Marginal Effect Analysis Approach and Its Application in E-Commerce Search System. International Journal of Intelligent Systems, 2023. DOI: 10.1155/2023/6968854. Online publication date: 11-Oct-2023.
  • (2023) Effects of AI and Logic-Style Explanations on Users' Decisions Under Different Levels of Uncertainty. ACM Transactions on Interactive Intelligent Systems, 13(4), 1-42. DOI: 10.1145/3588320. Online publication date: 8-Dec-2023.
