DOI: 10.1145/3603555.3603565

From ChatGPT to FactGPT: A Participatory Design Study to Mitigate the Effects of Large Language Model Hallucinations on Users

Published: 03 September 2023

Abstract

Large language models (LLMs) such as ChatGPT have recently attracted interest across all walks of life for the human-like quality of their textual responses. Despite their success in research, healthcare, and education, LLMs frequently include incorrect information, so-called hallucinations, in their responses. These hallucinations can lead users to trust fake news or change their general beliefs. We therefore investigate mitigation strategies that users desire to help them identify LLM hallucinations. To this end, we conduct a participatory design study in which everyday users design interface features, which machine learning (ML) experts then assess for feasibility. We find that many of the desired features are well received by ML experts but are also considered difficult to implement. Finally, we provide a list of desired features that can serve as a basis for mitigating the effects of LLM hallucinations on users.



Published In

MuC '23: Proceedings of Mensch und Computer 2023
September 2023, 593 pages
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Artificial Hallucinations
  2. ChatGPT
  3. Disney Method
  4. Large Language Models
  5. Participatory Design

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

MuC '23
MuC '23: Mensch und Computer 2023
September 3–6, 2023
Rapperswil, Switzerland

Article Metrics

  • Downloads (last 12 months): 670
  • Downloads (last 6 weeks): 54

Reflects downloads up to 18 Nov 2024

Cited By

  • (2024) Integrating Youth Perspectives into the Design of AI-Supported Collaborative Learning Environments. Education Sciences 14(11), 1197. https://doi.org/10.3390/educsci14111197. Online publication date: 31-Oct-2024.
  • (2024) AI in CS Education: Opportunities, Challenges, and Pitfalls to Avoid. ACM Inroads 15(3), 52–57. https://doi.org/10.1145/3679205. Online publication date: 21-Aug-2024.
  • (2024) One vs. Many: Comprehending Accurate Information from Multiple Erroneous and Inconsistent AI Generations. Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 2518–2531. https://doi.org/10.1145/3630106.3662681. Online publication date: 3-Jun-2024.
  • (2024) Trust by Interface: How Different User Interfaces Shape Human Trust in Health Information from Large Language Models. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 1–7. https://doi.org/10.1145/3613905.3650837. Online publication date: 11-May-2024.
  • (2024) HILL: A Hallucination Identifier for Large Language Models. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1–13. https://doi.org/10.1145/3613904.3642428. Online publication date: 11-May-2024.
  • (2024) What Should a Robot Do? Comparing Human and Large Language Model Recommendations for Robot Deception. Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction, 906–910. https://doi.org/10.1145/3610978.3640752. Online publication date: 11-Mar-2024.
  • (2024) Promises and challenges of generative artificial intelligence for human learning. Nature Human Behaviour 8(10), 1839–1850. https://doi.org/10.1038/s41562-024-02004-5. Online publication date: 22-Oct-2024.
  • (2024) Curio: Enhancing STEM Online Video Learning Experience Through Integrated, Just-in-Time Help-Seeking. Technology Enhanced Learning for Inclusive and Equitable Quality Education, 437–451. https://doi.org/10.1007/978-3-031-72315-5_30. Online publication date: 16-Sep-2024.
  • (2024) An Architecture for Formative Assessment Analytics of Multimodal Artefacts in ePortfolios Supported by Artificial Intelligence. Assessment Analytics in Education, 293–312. https://doi.org/10.1007/978-3-031-56365-2_15. Online publication date: 8-May-2024.
  • (2023) Hallucination-minimized Data-to-answer Framework for Financial Decision-makers. 2023 IEEE International Conference on Big Data (BigData), 4693–4702. https://doi.org/10.1109/BigData59044.2023.10386232. Online publication date: 15-Dec-2023.
