research-article

Detecting ChatGPT-Generated Code Submissions in a CS1 Course Using Machine Learning Models

Authors:

Damilola Babalola,

Bita Akram

SIGCSE 2024: Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 1

Pages 526 - 532

https://doi.org/10.1145/3626252.3630826

Published: 07 March 2024

Abstract

The emergence of publicly accessible large language models (LLMs) such as ChatGPT creates new risks of plagiarism and cheating, in which students use LLMs to solve exercises for them. Detecting this behavior will be a necessary component in introductory computer science (CS1) courses, and educators should be well-equipped with detection tools when the need arises. However, ChatGPT generates code non-deterministically, and thus traditional similarity detectors might not suffice to detect AI-created code. In this work, we explore the affordances of Machine Learning (ML) models for the detection task. We used an openly available dataset of student programs for CS1 assignments, had ChatGPT generate code for the same assignments, and then evaluated the performance of both traditional machine learning models and Abstract Syntax Tree-based (AST-based) deep learning models in distinguishing ChatGPT code from student code submissions. Our results suggest that both traditional machine learning models and AST-based deep learning models are effective in identifying ChatGPT-generated code, with accuracy above 90%. Since the deployment of such models requires ML knowledge and resources that are not always accessible to instructors, we also explore the patterns detected by deep learning models that indicate possible ChatGPT code signatures, which instructors could use to detect LLM-based cheating manually. We also explore whether explicitly asking ChatGPT to impersonate a novice programmer affects the code produced. We further discuss the potential applications of our proposed models for enhancing introductory computer science instruction.
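
As a rough illustration of what AST-derived features can look like, the sketch below counts AST node types in a program and projects the counts onto a fixed vocabulary, giving a vector that a traditional classifier could take as input. It is not the authors' pipeline: the use of Python's standard `ast` module, the function names `ast_node_counts` and `to_vector`, and the two sample submissions are all assumptions made for illustration, and the abstract does not state which programming language the dataset uses.

```python
import ast
from collections import Counter


def ast_node_counts(source: str) -> Counter:
    """Count how often each AST node type appears in Python source (hypothetical helper)."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def to_vector(counts: Counter, vocabulary: list[str]) -> list[int]:
    """Project node counts onto a fixed vocabulary so all programs share one vector length."""
    return [counts.get(name, 0) for name in vocabulary]


# Two made-up submissions for the same exercise (illustrative only, not from the dataset).
submission_a = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
submission_b = "def total(xs):\n    return sum(xs)\n"

counts_a = ast_node_counts(submission_a)
counts_b = ast_node_counts(submission_b)

# The vocabulary is the sorted union of node types seen across the programs.
vocabulary = sorted(set(counts_a) | set(counts_b))
vector_a = to_vector(counts_a, vocabulary)
vector_b = to_vector(counts_b, vocabulary)
```

Vectors like `vector_a` and `vector_b`, paired with labels for student-written versus generated code, are the kind of input a conventional classifier would be trained on. The AST-based deep learning models mentioned in the abstract operate on richer tree structure than these flat counts.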

Index Terms

Detecting ChatGPT-Generated Code Submissions in a CS1 Course Using Machine Learning Models
1. Applied computing
  1. Education

Published In

SIGCSE 2024: Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 1

March 2024

1583 pages

ISBN: 9798400704239

DOI: 10.1145/3626252

General Chairs: Ben Stephenson (University of Calgary, Canada), Jeffrey A. Stone (Penn State University)

Program Chairs: Lina Battestilli (North Carolina State University, USA), Samuel A. Rebelsky (Grinnell College), Libby Shoop (Macalester College)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGCSE: ACM Special Interest Group on Computer Science Education

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGCSE 2024: The 55th ACM Technical Symposium on Computer Science Education

Sponsor: SIGCSE

March 20 - 23, 2024

Portland, OR, USA

Acceptance Rates

Overall Acceptance Rate 1,595 of 4,542 submissions, 35%
