DOI: 10.1145/3447535.3462487
Research Article · Open Access

You’d Better Stop! Understanding Human Reliance on Machine Learning Models under Covariate Shift

Published: 22 June 2021

Abstract

Decision-making aids powered by machine learning models are becoming increasingly prevalent on the web today. However, when applied to data drawn from a distribution different from the training data (i.e., when covariate shift occurs), machine learning models often suffer performance degradation and may provide misleading recommendations to human decision-makers. In this paper, we conduct a randomized experiment to investigate how people rely on machine learning models to make decisions under covariate shift. Surprisingly, we find that people rely on machine learning models more when making decisions on out-of-distribution data than on in-distribution data. Moreover, while increasing people’s awareness of the model’s possible performance disparity across different data helps decrease their over-reliance on the model under covariate shift, enabling people to visualize the data distributions and the model’s performance does not seem to help. We conclude by discussing the implications of our results.
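To make the failure mode concrete: under covariate shift the input distribution p(x) changes between training and deployment while the labeling rule p(y | x) stays fixed, so a model can degrade sharply even though nothing about the model or the task definition has changed. The sketch below is a minimal, hypothetical illustration of this (our own synthetic data and scikit-learn setup, not the task, data, or model used in the paper’s experiment):

```python
# Minimal illustration of covariate shift (hypothetical setup, not the
# paper's experiment): p(x) changes between training and deployment,
# while the ground-truth labeling rule p(y | x) stays fixed.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def label(x):
    # Fixed ground-truth rule shared by both distributions. It is nearly
    # linear near the origin but not in the shifted region.
    return (np.sin(x[:, 0]) + x[:, 1] > 0).astype(int)

# Train where the data lives: features drawn near the origin.
x_train = rng.normal(0.0, 1.0, size=(2000, 2))
model = LogisticRegression().fit(x_train, label(x_train))

# In-distribution test set vs. a covariate-shifted one: the first
# feature's mean moves from 0 to 3, but the labeling rule is unchanged.
x_iid = rng.normal(0.0, 1.0, size=(1000, 2))
x_ood = rng.normal([3.0, 0.0], 1.0, size=(1000, 2))

print(f"in-distribution accuracy:     {model.score(x_iid, label(x_iid)):.2f}")
print(f"out-of-distribution accuracy: {model.score(x_ood, label(x_ood)):.2f}")
```

In runs of this sketch, in-distribution accuracy should stay high while accuracy in the shifted region falls toward chance, and nothing in the model’s point predictions signals the drop. That silent gap is what makes calibrated human reliance difficult in the out-of-distribution regime the paper studies.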

Supplementary Material

MP4 File (PS3.3_Chun-WeiChiang_YoudBetterStop-UnderstandingHumanReliance_onMachineLearningModels_underCovariateShift.mp4)
You’d Better Stop! Understanding Human Reliance on Machine Learning Models under Covariate Shift




Published In

WebSci '21: Proceedings of the 13th ACM Web Science Conference 2021
June 2021
328 pages
ISBN: 9781450383301
DOI: 10.1145/3447535
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Machine Learning
  2. appropriate reliance
  3. covariate shift
  4. human-AI interaction

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WebSci '21: 13th ACM Web Science Conference 2021
June 21–25, 2021
Virtual Event, United Kingdom

Acceptance Rates

Overall acceptance rate: 245 of 933 submissions (26%)


Cited By

  • (2024) You Can Only Verify When You Know the Answer: Feature-Based Explanations Reduce Overreliance on AI for Easy Decisions, but Not for Hard Ones. Proceedings of Mensch und Computer 2024, 156–170. https://doi.org/10.1145/3670653.3670660 (online: 1 Sep 2024)
  • (2024) Does More Advice Help? The Effects of Second Opinions in AI-Assisted Decision Making. Proceedings of the ACM on Human-Computer Interaction 8, CSCW1, 1–31. https://doi.org/10.1145/3653708 (online: 26 Apr 2024)
  • (2024) To Err Is AI! Debugging as an Intervention to Facilitate Appropriate Reliance on AI Systems. Proceedings of the 35th ACM Conference on Hypertext and Social Media, 98–105. https://doi.org/10.1145/3648188.3675130 (online: 10 Sep 2024)
  • (2024) Enhancing AI-Assisted Group Decision Making through LLM-Powered Devil's Advocate. Proceedings of the 29th International Conference on Intelligent User Interfaces, 103–119. https://doi.org/10.1145/3640543.3645199 (online: 18 Mar 2024)
  • (2024) "This is not a data problem": Algorithms and Power in Public Higher Education in Canada. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1–14. https://doi.org/10.1145/3613904.3642451 (online: 11 May 2024)
  • (2024) AI Pilot in the Cockpit: An Investigation of Public Acceptance. International Journal of Human–Computer Interaction, 1–14. https://doi.org/10.1080/10447318.2024.2301856 (online: 12 Jan 2024)
  • (2023) How Time Pressure in Different Phases of Decision-Making Influences Human-AI Collaboration. Proceedings of the ACM on Human-Computer Interaction 7, CSCW2, 1–26. https://doi.org/10.1145/3610068 (online: 4 Oct 2023)
  • (2023) How Stated Accuracy of an AI System and Analogies to Explain Accuracy Affect Human Reliance on the System. Proceedings of the ACM on Human-Computer Interaction 7, CSCW2, 1–29. https://doi.org/10.1145/3610067 (online: 4 Oct 2023)
  • (2023) Appropriate Reliance on AI Advice: Conceptualization and the Effect of Explanations. Proceedings of the 28th International Conference on Intelligent User Interfaces, 410–422. https://doi.org/10.1145/3581641.3584066 (online: 27 Mar 2023)
  • (2023) Toward Supporting Perceptual Complementarity in Human-AI Collaboration via Reflection on Unobservables. Proceedings of the ACM on Human-Computer Interaction 7, CSCW1, 1–20. https://doi.org/10.1145/3579628 (online: 16 Apr 2023)
