DOI: 10.1145/3544548.3581373 | CHI Conference Proceedings | Research article

ESCAPE: Countering Systematic Errors from Machine’s Blind Spots via Interactive Visual Analysis

Published: 19 April 2023

Abstract

Classification models learn to generalize associations between data samples and their target classes. However, researchers have increasingly observed that machine learning practice easily leads to systematic errors in AI applications, a phenomenon referred to as “AI blindspots.” Such blindspots arise when a model is trained on samples (e.g., for cat/dog classification) in which important patterns (e.g., black cats) are missing, or peripheral/undesirable patterns (e.g., dogs with grass backgrounds) mislead the model toward a certain class. Even sophisticated techniques cannot be guaranteed to capture, reason about, and prevent these spurious associations. In this work, we propose ESCAPE, a visual analytics system that promotes a human-in-the-loop workflow for countering systematic errors. By allowing users to easily inspect spurious associations, the system helps them spontaneously recognize concepts associated with misclassifications and evaluate mitigation strategies that can reduce biased associations. We also propose two statistical approaches: relative concept association, which better quantifies the association between a concept and instances, and a debiasing method that mitigates spurious associations. We demonstrate the utility of ESCAPE and our statistical measures through extensive evaluation, including quantitative experiments, usage scenarios, expert interviews, and controlled user experiments.
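The idea of quantifying how strongly a concept (e.g., “grass background”) co-occurs with a model’s misclassifications can be illustrated with a simple lift-style statistic. This is a generic sketch under stated assumptions, not the paper’s actual relative concept association measure (whose definition is not given in this excerpt); the function name and data layout are hypothetical.

```python
def relative_concept_association(instances, concept):
    """Lift-style score: how over-represented `concept` is among
    misclassified instances relative to its overall frequency.

    `instances` is a list of dicts, each with a set of concept labels
    under "concepts" and a boolean "misclassified" flag (an assumed
    layout for illustration).
    """
    total = len(instances)
    errors = [x for x in instances if x["misclassified"]]
    if not errors or total == 0:
        return 0.0
    # Concept frequency over all instances vs. over the error set.
    p_overall = sum(concept in x["concepts"] for x in instances) / total
    p_errors = sum(concept in x["concepts"] for x in errors) / len(errors)
    if p_overall == 0:
        return 0.0
    # A ratio well above 1 flags the concept as a blindspot candidate.
    return p_errors / p_overall
```

A score near 1 means the concept is no more common among errors than in the data overall; a much larger score flags it as a candidate spurious association worth inspecting.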

Supplementary Material

MP4 File (3544548.3581373-talk-video.mp4)
Pre-recorded Video Presentation


Cited By

  • (2024) Understanding Human-AI Workflows for Generating Personas. Proceedings of the 2024 ACM Designing Interactive Systems Conference, 757–781. https://doi.org/10.1145/3643834.3660729. Online publication date: 1-Jul-2024
  • (2023) Effective human-AI teams via learned natural language rules and onboarding. Proceedings of the 37th International Conference on Neural Information Processing Systems, 30466–30498. https://doi.org/10.5555/3666122.3667450. Online publication date: 10-Dec-2023


    Published In

    CHI '23: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems
    April 2023, 14911 pages
    ISBN: 9781450394215
    DOI: 10.1145/3544548

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. blind spot
    2. concept interpretability
    3. human-AI interaction
    4. systematic error
    5. unknown-unknowns
    6. visual analytics
    7. visualization

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    CHI '23

    Acceptance Rates

    Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%


