Can ChatGPT emulate humans in software engineering surveys?

Authors:

Igor Steinmacher,

Jacob Mcauley Penney,

Katia Romero Felizardo,

Alessandro F. Garcia,

Marco A. Gerosa

ESEM '24: Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement

Pages 414 - 419

https://doi.org/10.1145/3674805.3690744

Published: 24 October 2024

Abstract

Context: There is a growing belief in the literature that large language models (LLMs), such as ChatGPT, can mimic human behavior in surveys.

Gap: While the literature has shown promising results in social sciences and market research, there is scant evidence of their effectiveness in technical fields like software engineering.

Objective: Inspired by previous work, this paper explores ChatGPT’s ability to replicate findings from prior software engineering research. Given the frequent use of surveys in this field, if LLMs can accurately emulate human responses, this technique could address common methodological challenges like recruitment difficulties, representational shortcomings, and respondent fatigue.

Method: We prompted ChatGPT to reflect the behavior of a ‘mega-persona’ representing the demographic distribution of interest. We replicated surveys published from 2019 to 2023 at leading SE conferences, examining ChatGPT’s proficiency in mimicking responses from diverse demographics.

Results: Our findings reveal that ChatGPT can successfully replicate the outcomes of some studies, but in others, the results were not significantly better than a random baseline.

Conclusions: This paper reports our results so far and discusses the challenges and potential research opportunities in leveraging LLMs for representing humans in software engineering surveys.

Index Terms

Can ChatGPT emulate humans in software engineering surveys?
1. General and reference
  1. Cross-computing tools and techniques
    1. Empirical studies

Information & Contributors

Information

Published In

ESEM '24: Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement

October 2024

633 pages

ISBN:9798400710476

DOI:10.1145/3674805

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Science Foundation
CNPq

Conference

ESEM '24

Sponsor:

SIGSOFT

ESEM '24: ACM / IEEE International Symposium on Empirical Software Engineering and Measurement

October 24 - 25, 2024

Barcelona, Spain

Acceptance Rates

Overall Acceptance Rate 130 of 594 submissions, 22%
