DOI: 10.1145/3581754.3584136 · IUI Conference Proceedings · Poster

Supporting Qualitative Analysis with Large Language Models: Combining Codebook with GPT-3 for Deductive Coding

Published: 27 March 2023

Abstract

Qualitative analysis of textual content unpacks rich and valuable information by assigning labels to the data. However, this process is often labor-intensive, particularly when working with large datasets. While recent AI-based tools have demonstrated utility, researchers may not have AI resources and expertise readily available, and task-specific models offer limited generalizability. In this study, we explored the use of large language models (LLMs) to support deductive coding, a major category of qualitative analysis in which researchers apply a pre-determined codebook to label data with a fixed set of codes. Instead of training task-specific models, a pre-trained LLM can be applied directly to various tasks through prompt learning, without fine-tuning. Using a curiosity-driven question coding task as a case study, we found that combining GPT-3 with expert-drafted codebooks achieved fair to substantial agreement with expert-coded results. We lay out challenges and opportunities in using LLMs to support qualitative coding and beyond.
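The workflow the abstract describes can be sketched in two steps: embed the codebook's codes and definitions in a prompt alongside the item to be labeled, then compare the model's labels against expert labels with a chance-corrected agreement statistic such as Cohen's kappa. A minimal sketch follows; the example codebook, the prompt template, and the helper names are illustrative assumptions, not the paper's actual materials or prompts.

```python
# Sketch: codebook-guided deductive coding with an LLM prompt, plus
# Cohen's kappa for agreement with expert labels. The codebook entries
# below are hypothetical placeholders, not the paper's codebook.

CODEBOOK = {
    "factual": "Question asks for a verifiable fact.",
    "conceptual": "Question probes relationships between ideas.",
    "procedural": "Question asks how to do something.",
}

def build_prompt(question: str) -> str:
    """Place the codebook definitions and the item to label in one prompt."""
    lines = ["Assign exactly one code to the question below.", "Codes:"]
    for code, definition in CODEBOOK.items():
        lines.append(f"- {code}: {definition}")
    lines.append(f"Question: {question}")
    lines.append("Code:")
    return "\n".join(lines)

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two coders over the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    codes = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in codes
    )
    return (observed - expected) / (1 - expected)
```

The prompt string would be sent to the LLM (the API call is omitted here), and kappa is then computed between the model's codes and the experts' codes; the abstract's "fair to substantial" phrasing follows conventional kappa interpretation bands (cf. McHugh [9]).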

References

[1]
Rania Abdelghani, Pierre-Yves Oudeyer, Edith Law, Catherine de Vulpillieres, and Hélène Sauzéon. 2022. Conversational agents for fostering curiosity-driven learning in children. arXiv preprint arXiv:2204.03546 (2022).
[2]
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.). Vol. 33. Curran Associates, Inc., 1877–1901. https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf
[3]
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al. 2022. PaLM: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311 (2022).
[4]
Tianyu Gao, Adam Fisch, and Danqi Chen. 2021. Making Pre-trained Language Models Better Few-shot Learners. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, Online, 3816–3830. https://doi.org/10.18653/v1/2021.acl-long.295
[5]
Hsiu-Fang Hsieh and Sarah E Shannon. 2005. Three approaches to qualitative content analysis. Qualitative health research 15, 9 (2005), 1277–1288.
[6]
Diane M Korngiebel and Sean D Mooney. 2021. Considering the possibilities and pitfalls of Generative Pre-trained Transformer 3 (GPT-3) in healthcare delivery. NPJ Digital Medicine 4, 1 (2021), 1–3.
[7]
Jasy Suet Yan Liew, Nancy McCracken, Shichun Zhou, and Kevin Crowston. 2014. Optimizing features in active machine learning for complex qualitative content analysis. In Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science. 44–48.
[8]
Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. 2021. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. arXiv preprint arXiv:2107.13586 (2021).
[9]
Mary L McHugh. 2012. Interrater reliability: the kappa statistic. Biochemia medica 22, 3 (2012), 276–282.
[10]
Michael Muller, Shion Guha, Eric PS Baumer, David Mimno, and N Sadat Shami. 2016. Machine learning and grounded theory method: convergence, divergence, and combination. In Proceedings of the 19th international conference on supporting group work. 3–8.
[11]
Pablo Paredes, Ana Rufino Ferreira, Cory Schillaci, Gene Yoo, Pierre Karashchuk, Dennis Xing, Coye Cheshire, and John Canny. 2017. Inquire: Large-scale early insight discovery for qualitative research. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. 1562–1575.
[12]
Tim Rietz and Alexander Maedche. 2021. Cody: An AI-Based System to Semi-Automate Coding for Qualitative Research. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama, Japan) (CHI ’21). Association for Computing Machinery, New York, NY, USA, Article 394, 14 pages. https://doi.org/10.1145/3411764.3445591
[13]
William W Wilen. 1991. Questioning skills, for teachers. What research says to the teacher. (1991).
[14]
Ann Yuan, Andy Coenen, Emily Reif, and Daphne Ippolito. 2022. Wordcraft: Story Writing With Large Language Models. In 27th International Conference on Intelligent User Interfaces. 841–852.
[15]
Xingdi Yuan, Tong Wang, Yen-Hsiang Wang, Emery Fine, Rania Abdelghani, Pauline Lucas, Hélène Sauzéon, and Pierre-Yves Oudeyer. 2022. Selecting Better Samples from Pre-trained LLMs: A Case Study on Question Generation. arXiv preprint arXiv:2209.11000 (2022).
[16]
Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, et al. 2022. OPT: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068 (2022).




    Published In

    IUI '23 Companion: Companion Proceedings of the 28th International Conference on Intelligent User Interfaces
    March 2023
    266 pages
    ISBN:9798400701078
    DOI:10.1145/3581754
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 March 2023


    Author Tags

    1. Deductive Coding
    2. GPT-3
    3. Large Language Model
    4. Qualitative Analysis

    Qualifiers

    • Poster
    • Research
    • Refereed limited

    Conference

    IUI '23

    Acceptance Rates

    Overall Acceptance Rate 746 of 2,811 submissions, 27%



    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)1,260
    • Downloads (Last 6 weeks)187
    Reflects downloads up to 15 Jan 2025


    Citations

    Cited By

    • (2025)Differences in User Perception of Artificial Intelligence-Driven Chatbots and Traditional Tools in Qualitative Data AnalysisApplied Sciences10.3390/app1502063115:2(631)Online publication date: 10-Jan-2025
    • (2025)Towards an understanding of large language models in software engineering tasksEmpirical Software Engineering10.1007/s10664-024-10602-030:2Online publication date: 1-Mar-2025
    • (2024)Classifying Unstructured Text in Electronic Health Records for Mental Health Prediction Models using a Large Language Model (Preprint)JMIR Medical Informatics10.2196/65454Online publication date: 15-Aug-2024
    • (2024)Evaluating the Influence of Role-Playing Prompts on ChatGPT’s Misinformation Detection Accuracy: Quantitative StudyJMIR Infodemiology10.2196/606784(e60678)Online publication date: 26-Sep-2024
    • (2024)Harnessing ChatGPT for Thematic Analysis: Are We Ready?Journal of Medical Internet Research10.2196/5497426(e54974)Online publication date: 31-May-2024
    • (2024)Comparing the Efficacy and Efficiency of Human and Generative AI: Qualitative Thematic AnalysesJMIR AI10.2196/544823(e54482)Online publication date: 2-Aug-2024
    • (2024)GPT-4 as an X data annotator: Unraveling its performance on a stance classification taskPLOS ONE10.1371/journal.pone.030774119:8(e0307741)Online publication date: 15-Aug-2024
    • (2024)Multimedia design for learner interest and achievement: a visual guide to pharmacologyBMC Medical Education10.1186/s12909-024-05077-y24:1Online publication date: 5-Feb-2024
    • (2024)Using Generative Text Models to Create Qualitative Codebooks for Student Evaluations of TeachingInternational Journal of Qualitative Methods10.1177/1609406924129328323Online publication date: 14-Nov-2024
    • (2024)An Examination of the Use of Large Language Models to Aid Analysis of Textual DataInternational Journal of Qualitative Methods10.1177/1609406924123116823Online publication date: 13-Feb-2024
