Research article | Open access
DOI: 10.1145/3582269.3615595

Training Towards Critical Use: Learning to Situate AI Predictions Relative to Human Knowledge

Published: 05 November 2023

Editorial Notes

A corrigendum was issued for this paper on February 13, 2024. You can download the corrigendum from the Supplemental Material section of this citation page.

Abstract

A growing body of research has explored how to support humans in making better use of AI-based decision support, including via training and onboarding. Existing research has focused on decision-making tasks where it is possible to evaluate “appropriate reliance” by comparing each decision against a ground truth label that cleanly maps to both the AI’s predictive target and the human decision-maker’s goals. However, this assumption does not hold in many real-world settings where AI tools are deployed today (e.g., social work, criminal justice, and healthcare). In this paper, we introduce a process-oriented notion of appropriate reliance called critical use that centers the human’s ability to situate AI predictions against knowledge that is uniquely available to them but unavailable to the AI model. To explore how training can support critical use, we conduct a randomized online experiment in a complex social decision-making setting: child maltreatment screening. We find that, by providing participants with accelerated, low-stakes opportunities to practice AI-assisted decision-making in this setting, novices came to exhibit patterns of disagreement with AI that resemble those of experienced workers. A qualitative examination of participants’ explanations for their AI-assisted decisions revealed that they drew upon qualitative case narratives, to which the AI model did not have access, to learn when (not) to rely on AI predictions. Our findings open new questions for the study and design of training for real-world AI-assisted decision-making.

Supplemental Material

PDF File
Corrigendum to "Training Towards Critical Use: Learning to Situate AI Predictions Relative to Human Knowledge" by Kawakami et al., Proceedings of The ACM Collective Intelligence Conference (CI '23).


Cited By

View all
  • (2024)Are We Asking the Right Questions?: Designing for Community Stakeholders’ Interactions with AI in PolicingProceedings of the 2024 CHI Conference on Human Factors in Computing Systems10.1145/3613904.3642738(1-20)Online publication date: 11-May-2024
  • (2024)DISCERN: Designing Decision Support Interfaces to Investigate the Complexities of Workplace Social Decision-Making With Line ManagersProceedings of the 2024 CHI Conference on Human Factors in Computing Systems10.1145/3613904.3642685(1-18)Online publication date: 11-May-2024
  • (2023)Effective human-AI teams via learned natural language rules and onboardingProceedings of the 37th International Conference on Neural Information Processing Systems10.5555/3666122.3667450(30466-30498)Online publication date: 10-Dec-2023

        Information

        Published In

        CI '23: Proceedings of The ACM Collective Intelligence Conference
        November 2023
        97 pages
        ISBN:9798400701139
        DOI:10.1145/3582269
        This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. AI onboarding and training
        2. algorithm-assisted decision-making
        3. augmented intelligence
        4. human-AI complementarity

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Funding Sources

        • National Science Foundation
        • UL Research Institutes through the Center for Advancing Safety of Machine Intelligence (CASMI) at Northwestern University
        • Carnegie Mellon University Block Center for Technology and Society
        • Toyota Research Institute
        • National Science Foundation Graduate Research Fellowship

        Conference

        CI '23: Collective Intelligence Conference
        November 6–9, 2023
        Delft, Netherlands


