DOI: 10.1145/3626252.3630826
research-article

Detecting ChatGPT-Generated Code Submissions in a CS1 Course Using Machine Learning Models

Published: 07 March 2024

Abstract

The emergence of publicly accessible large language models (LLMs) such as ChatGPT poses unprecedented risks of new types of plagiarism and cheating, in which students use LLMs to solve exercises for them. Detecting this behavior will be a necessary component of introductory computer science (CS1) courses, and educators should be well equipped with detection tools when the need arises. However, ChatGPT generates code non-deterministically, so traditional similarity detectors might not suffice to detect AI-created code. In this work, we explore the affordances of machine learning (ML) models for the detection task. We used an openly available dataset of student programs for CS1 assignments and had ChatGPT generate code for the same assignments. We then evaluated the performance of both traditional ML models and Abstract Syntax Tree-based (AST-based) deep learning models in distinguishing ChatGPT code from student code submissions. Our results suggest that both kinds of model are effective in identifying ChatGPT-generated code, with accuracy above 90%. Since deploying such models requires ML knowledge and resources that are not always accessible to instructors, we also explore the patterns detected by the deep learning models that indicate possible ChatGPT code signatures, which instructors could use to detect LLM-based cheating manually. We also explore whether explicitly asking ChatGPT to impersonate a novice programmer affects the code produced. We further discuss the potential applications of our proposed models for enhancing introductory computer science instruction.
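
To make the idea of classifying programs by their syntax-tree structure concrete, the following is a minimal sketch and not the models evaluated in the paper. It assumes Python source and Python's standard `ast` module purely for illustration (the abstract does not state the programming language or the features used), and it swaps in a hand-written nearest-centroid classifier in place of the traditional ML and deep learning models. The names `ast_node_counts`, `to_vector`, and `NearestCentroidDetector`, along with the two toy training programs and their labels, are hypothetical.

```python
import ast
from collections import Counter


def ast_node_counts(source: str) -> Counter:
    """Count how often each AST node type occurs in a program."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def to_vector(counts: Counter, vocabulary: list) -> list:
    """Turn node counts into relative frequencies over a fixed vocabulary."""
    total = sum(counts.values()) or 1
    return [counts.get(name, 0) / total for name in vocabulary]


def squared_distance(a: list, b: list) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroidDetector:
    """Illustrative stand-in classifier: one mean feature vector per label."""

    def fit(self, sources: list, labels: list) -> "NearestCentroidDetector":
        counts = [ast_node_counts(s) for s in sources]
        # The vocabulary is every node type seen in the training programs.
        self.vocabulary = sorted({name for c in counts for name in c})
        vectors = [to_vector(c, self.vocabulary) for c in counts]
        self.centroids = {}
        for label in set(labels):
            members = [v for v, l in zip(vectors, labels) if l == label]
            # Column-wise mean of the vectors that carry this label.
            self.centroids[label] = [sum(col) / len(members) for col in zip(*members)]
        return self

    def predict(self, source: str) -> str:
        vec = to_vector(ast_node_counts(source), self.vocabulary)
        return min(self.centroids,
                   key=lambda lab: squared_distance(vec, self.centroids[lab]))


if __name__ == "__main__":
    # Invented placeholder programs and labels, not data from the paper.
    train_sources = [
        "total = 0\nfor i in range(10):\n    total = total + i\n",
        "def sum_to(n):\n    return sum(range(n))\n",
    ]
    train_labels = ["student", "chatgpt"]
    detector = NearestCentroidDetector().fit(train_sources, train_labels)
    print(detector.predict("def square(x):\n    return x * x\n"))
```

A detector of any practical use would need a large labeled corpus, held-out evaluation, and a far richer representation than node-type frequencies; the sketch only shows how a program can be mapped to structural features and compared against labeled examples.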

    Published In

    SIGCSE 2024: Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 1
    March 2024
    1583 pages
    ISBN: 9798400704239
    DOI: 10.1145/3626252
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. artificial intelligence
    2. chatgpt
    3. cheat detection
    4. cs1
    5. introductory programming course
    6. large language model
    7. plagiarism detection

    Qualifiers

    • Research-article

    Conference

    SIGCSE 2024
    Acceptance Rates

    Overall Acceptance Rate 1,595 of 4,542 submissions, 35%
