Abstract
Research within sociotechnical domains, such as software engineering, fundamentally requires the human perspective. However, traditional qualitative data collection methods are labor-intensive and face difficulties with participant recruitment and scaling. This vision paper proposes a novel approach to qualitative data collection in software engineering research that harnesses the capabilities of artificial intelligence (AI), especially large language models (LLMs) such as ChatGPT, as well as multimodal foundation models. We explore the potential of AI-generated synthetic text as an alternative source of qualitative data, discussing how LLMs can replicate human responses and behaviors in research settings. We examine AI applications for emulating humans in interviews, focus groups, surveys, observational studies, and user evaluations, and we outline open problems and research opportunities for implementing this vision. In the future, an integrated approach in which AI- and human-generated data coexist will likely yield the most effective outcomes.
Acknowledgements
This work was partially supported by NSF Grants 2236198, 2235601, 2247929, 2303043, and 2303042. ChatGPT v4 was used to copy-edit this article.
Author information
Contributions
MG wrote the main manuscript text. IS and AS ideated about the paper and helped with copy editing. BT participated in writing the paper and creating the reference list.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gerosa, M., Trinkenreich, B., Steinmacher, I. et al. Can AI serve as a substitute for human subjects in software engineering research? Autom Softw Eng 31, 13 (2024). https://doi.org/10.1007/s10515-023-00409-6
DOI: https://doi.org/10.1007/s10515-023-00409-6