Abstract
Generative AI and large language models hold great promise in enhancing computing education by powering next-generation educational technologies. State-of-the-art models like OpenAI’s ChatGPT [8] and GPT-4 [9] could enhance programming education in various roles, e.g., by acting as a personalized digital tutor for a student, a digital assistant for an educator, and a digital peer for collaborative learning [1, 2, 7]. In our work, we seek to comprehensively evaluate and benchmark state-of-the-art large language models for various scenarios in programming education.
Recent works have evaluated several large language models in the context of programming education [4, 6, 10, 11, 12]. However, these works are limited in several ways: they typically focus on evaluating a specific model for a specific educational scenario (e.g., generating explanations), or they consider models that are already outdated (e.g., OpenAI’s Codex [3] has not been publicly available since March 2023). Consequently, there is a lack of systematic studies that benchmark state-of-the-art models across a comprehensive set of programming education scenarios.
In our work, we systematically evaluate two models, ChatGPT (based on GPT-3.5) and GPT-4, and compare their performance with that of human tutors across a variety of scenarios in programming education. These scenarios are designed to capture the distinct roles these models could play, namely digital tutors, assistants, and peers, as discussed above. More concretely, we consider the following six scenarios: (1) program repair, i.e., fixing a student’s buggy program (a minimal illustration is sketched below); (2) hint generation, i.e., providing a natural language hint that helps the student resolve current issues; (3) grading feedback, i.e., grading a student’s program w.r.t. a given rubric; (4) pair programming, i.e., completing a partially written program or generating a sketch of the solution program; (5) task creation, i.e., generating new tasks that exercise specific types of concepts or bugs; (6) contextualized explanation, i.e., explaining specific concepts or functions in the context of a given program.
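For illustration, the following sketch shows the kind of input and output involved in the program repair scenario; the buggy program and its repair are hypothetical examples constructed for this abstract and are not drawn from our dataset.

def is_prime_buggy(n):
    # Buggy student attempt: trial division starts at 3, so even numbers
    # greater than 2 (e.g., 4) are misclassified as prime; n = 1 is also
    # misclassified.
    for i in range(3, n):
        if n % i == 0:
            return False
    return True

def is_prime_repaired(n):
    # A repaired version, as a tutor or model might produce it: handle
    # n < 2 explicitly and start trial division at 2.
    if n < 2:
        return False
    for i in range(2, n):
        if n % i == 0:
            return False
    return True

assert is_prime_buggy(4) is True       # incorrect behavior of the buggy program
assert is_prime_repaired(4) is False   # the repaired program classifies 4 correctly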
Our study uses a mix of quantitative and qualitative evaluation to compare the performance of these models with that of human tutors. We conduct our evaluation on 5 introductory Python programming problems with a diverse set of input/output specifications. For each problem, we consider 5 buggy programs based on publicly accessible submissions from geeksforgeeks.org [5] (see Figure 1); these buggy programs are chosen to capture different types of bugs for each problem. We will provide a detailed analysis of the data and results in a longer version of this poster. Our preliminary results show that GPT-4 drastically outperforms ChatGPT (based on GPT-3.5) and comes close to the performance of human tutors for several scenarios.