Can AI serve as a substitute for human subjects in software engineering research?

Published in: Automated Software Engineering

Abstract

Research within sociotechnical domains, such as software engineering, fundamentally requires the human perspective. Nevertheless, traditional qualitative data collection methods are hampered by difficulties in participant recruitment, limited scalability, and labor intensity. This vision paper proposes a novel approach to qualitative data collection in software engineering research by harnessing the capabilities of artificial intelligence (AI), especially large language models (LLMs) such as ChatGPT and multimodal foundation models. We explore the potential of AI-generated synthetic text as an alternative source of qualitative data, discussing how LLMs can replicate human responses and behaviors in research settings, and we examine AI applications in emulating humans in interviews, focus groups, surveys, observational studies, and user evaluations. We then outline open problems and research opportunities for realizing this vision. In the future, an integrated approach in which AI-generated and human-generated data coexist will likely yield the most effective outcomes.
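The persona-based emulation of survey respondents described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the `build_persona_prompt` helper, the persona fields, and the `ask_llm` placeholder are all hypothetical names introduced here; any chat-completion API could play the role of `ask_llm`.

```python
# Sketch: conditioning an LLM on a persona so it can stand in for a
# survey respondent. Persona fields and the question are illustrative.

def build_persona_prompt(persona: dict, question: str) -> str:
    """Compose a prompt asking the model to answer as the given persona."""
    traits = ", ".join(f"{k}: {v}" for k, v in persona.items())
    return (
        f"You are a survey respondent with this profile: {traits}. "
        f"Answer in first person, staying in character.\n\n"
        f"Question: {question}"
    )

# A small, hypothetical pool of developer personas.
personas = [
    {"role": "open source maintainer", "experience_years": 12, "region": "Brazil"},
    {"role": "junior backend developer", "experience_years": 2, "region": "Germany"},
]

question = "What motivates you to contribute to open source projects?"
prompts = [build_persona_prompt(p, question) for p in personas]

# In an actual study, each prompt would be sent to an LLM, e.g.:
#   responses = [ask_llm(prompt) for prompt in prompts]  # ask_llm is a placeholder
for prompt in prompts:
    print(prompt)
```

In practice, the sampled persona attributes would be chosen to mirror the demographics of the target developer population, which is what lets the synthetic sample approximate a human one.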



Acknowledgements

This work was partially supported by NSF Grants 2236198, 2235601, 2247929, 2303043, and 2303042. ChatGPT v4 was used to copy-edit this article.

Author information


Contributions

MG wrote the main manuscript text. IS and AS ideated about the paper and helped with copy editing. BT participated in writing the paper and creating the reference list.

Corresponding author

Correspondence to Marco Gerosa.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Gerosa, M., Trinkenreich, B., Steinmacher, I. et al. Can AI serve as a substitute for human subjects in software engineering research?. Autom Softw Eng 31, 13 (2024). https://doi.org/10.1007/s10515-023-00409-6
