DOI: 10.1145/3568812.3603476

Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors

Published: 13 September 2023

Abstract

Generative AI and large language models hold great promise in enhancing computing education by powering next-generation educational technologies. State-of-the-art models like OpenAI’s ChatGPT [8] and GPT-4 [9] could enhance programming education in various roles, e.g., by acting as a personalized digital tutor for a student, a digital assistant for an educator, and a digital peer for collaborative learning [1, 2, 7]. In our work, we seek to comprehensively evaluate and benchmark state-of-the-art large language models for various scenarios in programming education.
Recent works have evaluated several large language models in the context of programming education [4, 6, 10, 11, 12]. However, these works are limited in scope: they typically evaluate a specific model for a specific education scenario (e.g., generating explanations), or they consider models that are already outdated (e.g., OpenAI’s Codex [3] has not been publicly available since March 2023). Consequently, no systematic study yet benchmarks state-of-the-art models across a comprehensive set of programming education scenarios.
In our work, we systematically evaluate two models, ChatGPT (based on GPT-3.5) and GPT-4, and compare their performance with human tutors for a variety of scenarios in programming education. These scenarios are designed to capture distinct roles these models could play, namely digital tutors, assistants, and peers, as discussed above. More concretely, we consider the following six scenarios: (1) program repair, i.e., fixing a student’s buggy program; (2) hint generation, i.e., providing a natural language hint to the student to help resolve current issues; (3) grading feedback, i.e., grading a student’s program w.r.t. a given rubric; (4) peer programming, i.e., completing a partially written program or generating a sketch for the solution program; (5) task creation, i.e., generating new tasks that exercise specific types of concepts or bugs; (6) contextualized explanation, i.e., explaining specific concepts or functions in the context of a given program.
Our study uses a mix of quantitative and qualitative evaluation to compare the performance of these models with that of human tutors. We base our evaluation on 5 introductory Python programming problems with a diverse set of input/output specifications. For each of these problems, we consider 5 buggy programs (25 in total) based on publicly accessible submissions from geeksforgeeks.org [5] (see Figure 1); these buggy programs are picked to capture different types of bugs for each problem. We will provide a detailed analysis of the data and results in a longer version of this poster. Our preliminary results show that GPT-4 drastically outperforms ChatGPT (based on GPT-3.5) and comes close to human tutors’ performance for several scenarios.
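To make the size of the evaluation setup concrete, the following is a minimal Python sketch that enumerates the combinations described above: three methods, six scenarios, and 5 problems with 5 buggy programs each. All names (`Scenario`, `EvaluationCell`, `build_grid`) are hypothetical and not taken from the study, and the assumption that every scenario is applied to every buggy program is ours; the abstract does not state how scenarios map to programs.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Scenario(Enum):
    """The six scenarios listed in the abstract."""
    PROGRAM_REPAIR = "program repair"
    HINT_GENERATION = "hint generation"
    GRADING_FEEDBACK = "grading feedback"
    PEER_PROGRAMMING = "peer programming"
    TASK_CREATION = "task creation"
    CONTEXTUALIZED_EXPLANATION = "contextualized explanation"


# The three methods compared in the abstract.
METHODS = ("ChatGPT (GPT-3.5)", "GPT-4", "human tutor")
NUM_PROBLEMS = 5
BUGGY_PROGRAMS_PER_PROBLEM = 5


@dataclass(frozen=True)
class EvaluationCell:
    """One (method, scenario, problem, buggy program) combination."""
    method: str
    scenario: Scenario
    problem_id: int
    buggy_program_id: int


def build_grid():
    """Enumerate every combination (assumption: a full cross product)."""
    return [
        EvaluationCell(method, scenario, problem_id, buggy_id)
        for method, scenario, problem_id, buggy_id in product(
            METHODS,
            Scenario,
            range(1, NUM_PROBLEMS + 1),
            range(1, BUGGY_PROGRAMS_PER_PROBLEM + 1),
        )
    ]


grid = build_grid()
# Distinct (problem, buggy program) pairs, independent of method and scenario.
instances = {(cell.problem_id, cell.buggy_program_id) for cell in grid}
print(len(instances), "buggy-program instances")
print(len(grid), "cells under the full cross-product assumption")
```

Under these assumptions, each cell would hold one generated output (for example a repaired program or a hint) together with its quantitative or qualitative rating.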

References

[1] David Baidoo-Anu and Leticia Owusu Ansah. 2023. Education in the Era of Generative Artificial Intelligence (AI): Understanding the Potential Benefits of ChatGPT in Promoting Teaching and Learning. Available at SSRN 4337484 (2023).
[2] Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott M. Lundberg, Harsha Nori, Hamid Palangi, Marco Túlio Ribeiro, and Yi Zhang. 2023. Sparks of Artificial General Intelligence: Early Experiments with GPT-4. CoRR (2023).
[3] Mark Chen et al. 2021. Evaluating Large Language Models Trained on Code. CoRR abs/2107.03374 (2021).
[4] James Finnie-Ansley, Paul Denny, Brett A. Becker, Andrew Luxton-Reilly, and James Prather. 2022. The Robots Are Coming: Exploring the Implications of OpenAI Codex on Introductory Programming. In ACE.
[5] geeksforgeeks.org. 2009. GeeksforGeeks: A Computer Science Portal for Geeks. https://www.geeksforgeeks.org/.
[6] Juho Leinonen, Arto Hellas, Sami Sarsa, Brent N. Reeves, Paul Denny, James Prather, and Brett A. Becker. 2023. Using Large Language Models to Enhance Programming Error Messages. In SIGCSE.
[7] Weng Marc Lim, Asanka Gunasekara, Jessica Leigh Pallant, Jason Ian Pallant, and Ekaterina Pechenkina. 2023. Generative AI and the Future of Education: Ragnarök or Reformation? A Paradoxical Perspective from Management Educators. The International Journal of Management Education 21, 2 (2023), 100790.
[8] OpenAI. 2023. ChatGPT. https://openai.com/blog/chatgpt.
[9] OpenAI. 2023. GPT-4 Technical Report. CoRR abs/2303.08774 (2023).
[10] Tung Phung, José Cambronero, Sumit Gulwani, Tobias Kohn, Rupak Majumdar, Adish Singla, and Gustavo Soares. 2023. Generating High-Precision Feedback for Programming Syntax Errors using Large Language Models. In EDM.
[11] Sami Sarsa, Paul Denny, Arto Hellas, and Juho Leinonen. 2022. Automatic Generation of Programming Exercises and Code Explanations Using Large Language Models. In ICER.
[12] Jialu Zhang, José Cambronero, Sumit Gulwani, Vu Le, Ruzica Piskac, Gustavo Soares, and Gust Verbruggen. 2022. Repairing Bugs in Python Assignments Using Large Language Models. CoRR abs/2209.14876 (2022).

Published In

ICER '23: Proceedings of the 2023 ACM Conference on International Computing Education Research - Volume 2
August 2023
140 pages
ISBN:9781450399753
DOI:10.1145/3568812
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. ChatGPT
  2. generative AI
  3. introductory programming education
  4. large language models

Qualifiers

  • Abstract
  • Research
  • Refereed limited

Conference

ICER 2023
Acceptance Rates

Overall Acceptance Rate 189 of 803 submissions, 24%
