DOI: 10.1145/3674805.3690744
research-article
Open access

Can ChatGPT emulate humans in software engineering surveys?

Published: 24 October 2024 Publication History

Abstract

Context: There is a growing belief in the literature that large language models (LLMs), such as ChatGPT, can mimic human behavior in surveys. Gap: While the literature has shown promising results in social sciences and market research, there is scant evidence of their effectiveness in technical fields like software engineering. Objective: Inspired by previous work, this paper explores ChatGPT’s ability to replicate findings from prior software engineering research. Given the frequent use of surveys in this field, if LLMs can accurately emulate human responses, this technique could address common methodological challenges like recruitment difficulties, representational shortcomings, and respondent fatigue. Method: We prompted ChatGPT to reflect the behavior of a ‘mega-persona’ representing the demographic distribution of interest. We replicated surveys published from 2019 to 2023 at leading SE conferences, examining ChatGPT’s proficiency in mimicking responses from diverse demographics. Results: Our findings reveal that ChatGPT can successfully replicate the outcomes of some studies, but in others, the results were not significantly better than a random baseline. Conclusions: This paper reports our results so far and discusses the challenges and potential research opportunities in leveraging LLMs for representing humans in software engineering surveys.
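
As an illustration only, the following sketch shows one way a mega-persona prompt could be assembled and how a simulated answer distribution could be compared against a random baseline. It is not the authors' procedure: the prompt wording, the five-point Likert scale, the function names, and the use of total variation distance are all assumptions made for this example, and no call to ChatGPT is made here.

```python
import random
from collections import Counter

# Hypothetical answer scale; the replicated surveys may use other scales.
LIKERT = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]


def build_mega_persona_prompt(demographics, question, n_respondents):
    """Compose a single prompt describing a whole respondent population.

    `demographics` maps a group description to its share (0..1).
    The wording is an assumption, not the prompt used in the paper.
    """
    mix = "\n".join(f"- {share:.0%} {group}" for group, share in demographics.items())
    return (
        f"Simulate {n_respondents} survey respondents with this demographic mix:\n"
        f"{mix}\n"
        f"Question: {question}\n"
        "Report how many respondents choose each option: " + ", ".join(LIKERT)
    )


def to_distribution(counts):
    """Turn per-option counts into proportions over the full scale."""
    total = sum(counts.values())
    return {opt: counts.get(opt, 0) / total for opt in LIKERT}


def total_variation(p, q):
    """Total variation distance between two answer distributions (0 = identical)."""
    return 0.5 * sum(abs(p[opt] - q[opt]) for opt in LIKERT)


def random_baseline_distance(human, n_respondents, trials=1000, seed=0):
    """Mean distance to the human distribution when answers are picked uniformly at random."""
    rng = random.Random(seed)
    distances = []
    for _ in range(trials):
        sample = Counter(rng.choice(LIKERT) for _ in range(n_respondents))
        distances.append(total_variation(to_distribution(sample), human))
    return sum(distances) / trials
```

Under these assumptions, a simulated distribution would count as informative for a question only if its distance to the human distribution is clearly smaller than the value returned by `random_baseline_distance`; a proper significance test would still be needed to support a claim like the one in the abstract.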

    Published In

    ESEM '24: Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement
    October 2024
    633 pages
    ISBN:9798400710476
    DOI:10.1145/3674805
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Generative AI
    2. Mega-Personas
    3. Replication Study
    4. Survey

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ESEM '24
    Acceptance Rates

    Overall Acceptance Rate 130 of 594 submissions, 22%
