DOI: 10.1145/3394486.3403071

Adversarial Infidelity Learning for Model Interpretation

Published: 20 August 2020

Abstract

Model interpretation is essential in data mining and knowledge discovery. It helps us understand a model's intrinsic working mechanism and check whether the model has undesired characteristics. A popular way to perform model interpretation is Instance-wise Feature Selection (IFS), which provides an importance score for each feature of a data sample to explain how the model generates its specific output. In this paper, we propose a Model-agnostic Effective Efficient Direct (MEED) IFS framework for model interpretation that mitigates concerns about sanity, combinatorial shortcuts, model identifiability, and information transmission. We focus on the following setting: using the selected features to directly predict the output of the given model, which serves as a primary evaluation metric for model-interpretation methods. Apart from the features, we feed the output of the given model to the explainer as an additional input, so that the explainer learns from more accurate information. To train the explainer, besides a fidelity objective, we propose an Adversarial Infidelity Learning (AIL) mechanism that boosts explanation learning by screening out relatively unimportant features. Through theoretical and experimental analysis, we show that the AIL mechanism helps learn the desired conditional distribution between selected features and targets. Moreover, we extend the framework by integrating efficient interpretation methods as priors to provide a warm start. Comprehensive empirical evaluations, using both quantitative metrics and human judgments, demonstrate the effectiveness and superiority of the proposed method. Our code is publicly available at https://github.com/langlrsw/MEED.
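
To make the setup described above concrete, below is a minimal, hypothetical sketch of one AIL-style training step. It is not the authors' released implementation (see the GitHub link above): the network sizes, the Gumbel-Softmax relaxation of feature selection, the KL-divergence losses, and the names `explainer`, `approximator`, and `adversary` are all illustrative assumptions. The fidelity term trains an approximator to reproduce the black-box model's output from the selected features, while the adversarial infidelity term penalizes the explainer whenever the unselected features can still do the same.

```python
# Hypothetical PyTorch sketch of one Adversarial Infidelity Learning (AIL) step.
# All dimensions, architectures, and the Gumbel-Softmax top-k relaxation are
# illustrative assumptions; the authors' released code may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

d, k, num_classes = 20, 5, 10             # feature dim, selection budget, #classes

explainer = nn.Sequential(                 # scores each feature from (x, f(x))
    nn.Linear(d + num_classes, 64), nn.ReLU(), nn.Linear(64, d))
approximator = nn.Sequential(              # predicts f(x) from the selected features
    nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, num_classes))
adversary = nn.Sequential(                 # predicts f(x) from the unselected features
    nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, num_classes))

opt_main = torch.optim.Adam(
    list(explainer.parameters()) + list(approximator.parameters()), lr=1e-3)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)


def ail_step(x, black_box):
    """One training step: x is a batch of inputs, black_box is the frozen model f."""
    with torch.no_grad():
        y_model = F.softmax(black_box(x), dim=-1)      # output to be explained

    # The explainer sees the model's output as an additional input.
    scores = explainer(torch.cat([x, y_model], dim=-1))
    mask = torch.clamp(k * F.gumbel_softmax(scores, tau=0.5), max=1.0)  # soft top-k

    # Fidelity: the selected features alone should reproduce the model's output.
    fidelity = F.kl_div(F.log_softmax(approximator(x * mask), dim=-1),
                        y_model, reduction="batchmean")
    # Infidelity: the unselected features should NOT reproduce it.
    infidelity = F.kl_div(F.log_softmax(adversary(x * (1.0 - mask)), dim=-1),
                          y_model, reduction="batchmean")

    # Explainer + approximator: minimize fidelity loss, maximize the adversary's loss.
    opt_main.zero_grad()
    (fidelity - infidelity).backward()
    opt_main.step()

    # Adversary: separately minimize its own loss on the complementary features.
    opt_adv.zero_grad()
    adv_loss = F.kl_div(
        F.log_softmax(adversary((x * (1.0 - mask)).detach()), dim=-1),
        y_model, reduction="batchmean")
    adv_loss.backward()
    opt_adv.step()
    return mask.detach()                                # per-feature importance scores
```

In practice, one would loop `ail_step` over mini-batches and read the learned mask (or the explainer's scores) as instance-wise feature importance; the min-max structure mirrors the abstract's description, with the explainer both keeping features that preserve fidelity and screening out features whose complement could still explain the prediction.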

Supplementary Material

MP4 File (3394486.3403071.mp4)
This video introduces our proposed Model-agnostic Effective Efficient Direct (MEED) IFS framework for model interpretation, mitigating concerns about sanity, combinatorial shortcuts, model identifiability, and information transmission.





Published In

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
August 2020
3664 pages
ISBN: 9781450379984
DOI: 10.1145/3394486
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. adversarial learning
  2. black-box explanations
  3. infidelity
  4. model interpretation

Qualifiers

  • Research-article

Funding Sources

  • The TuringShield team of Tencent
  • AWS Machine Learning for Research Award
  • Google Faculty Research Award

Conference

KDD '20

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (Last 12 months): 75
  • Downloads (Last 6 weeks): 5
Reflects downloads up to 21 Nov 2024

Cited By

  • (2023) An Empirical Survey on Explainable AI Technologies: Recent Trends, Use-Cases, and Categories from Technical and Application Perspectives. Electronics 12(5), 1092. DOI: 10.3390/electronics12051092. Online publication date: 22-Feb-2023
  • (2023) From Anecdotal Evidence to Quantitative Evaluation Methods: A Systematic Review on Evaluating Explainable AI. ACM Computing Surveys 55(13s), 1-42. DOI: 10.1145/3583558. Online publication date: 13-Jul-2023
  • (2023) Reinforced Causal Explainer for Graph Neural Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(2), 2297-2309. DOI: 10.1109/TPAMI.2022.3170302. Online publication date: 1-Feb-2023
  • (2023) Local-to-Global Causal Reasoning for Cross-Document Relation Extraction. IEEE/CAA Journal of Automatica Sinica 10(7), 1608-1621. DOI: 10.1109/JAS.2023.123540. Online publication date: Jul-2023
  • (2023) Robin: A Novel Method to Produce Robust Interpreters for Deep Learning-Based Code Classifiers. 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), 27-39. DOI: 10.1109/ASE56229.2023.00164. Online publication date: 11-Sep-2023
  • (2022) Self-explaining deep models with logic rule reasoning. Proceedings of the 36th International Conference on Neural Information Processing Systems, 3203-3216. DOI: 10.5555/3600270.3600502. Online publication date: 28-Nov-2022
  • (2022) Adversarial Filtering Modeling on Long-term User Behavior Sequences for Click-Through Rate Prediction. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1969-1973. DOI: 10.1145/3477495.3531788. Online publication date: 6-Jul-2022
  • (2022) On Glocal Explainability of Graph Neural Networks. Database Systems for Advanced Applications, 648-664. DOI: 10.1007/978-3-031-00123-9_52. Online publication date: 11-Apr-2022
  • (2021) Why Attentions May Not Be Interpretable? Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 25-34. DOI: 10.1145/3447548.3467307. Online publication date: 14-Aug-2021
