Research article · Open access
DOI: 10.1145/3641554.3701974

BugSpotter: Automated Generation of Code Debugging Exercises

Published: 18 February 2025

Abstract

Debugging is an essential skill when learning to program, yet its instruction and emphasis often vary widely across introductory courses. In the era of code-generating large language models (LLMs), students' ability to reason about code and identify errors is increasingly important. However, students frequently resort to trial and error to resolve bugs without fully understanding the underlying issues. Developing the ability to identify and hypothesize the cause of bugs is crucial but can be time-consuming to teach effectively through traditional means. This paper introduces BugSpotter, a tool that leverages an LLM to generate buggy code from a problem description and verify the synthesized bugs via a test suite. Students interact with BugSpotter by designing failing test cases, where the buggy code's output differs from the expected result as defined by the problem specification. This gives students opportunities not only to enhance their debugging skills, but also to practice reading and understanding problem specifications. We deployed BugSpotter in a large classroom setting and compared the debugging exercises it generated to exercises hand-crafted by an instructor for the same problems. We found that the LLM-generated exercises produced by BugSpotter varied in difficulty and were well matched to the problem specifications. Importantly, student performance on the LLM-generated exercises was comparable to performance on those manually created by instructors, suggesting that BugSpotter could be an effective and efficient aid for learning debugging.
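To make the described workflow concrete, below is a minimal sketch of the core check the abstract implies: a student's candidate test case "spots the bug" only when the buggy code's output differs from the expected output defined by the problem specification. The problem (count_vowels), the reference solution, and the planted bug are hypothetical illustrations, not details taken from the paper.

```python
# Minimal sketch of validating a student's "failing test case" against a
# planted bug. All names here (count_vowels, the missing-lowercase bug)
# are hypothetical examples, not taken from the paper.

def reference_solution(s: str) -> int:
    """Correct behavior per the (hypothetical) problem specification."""
    return sum(1 for ch in s.lower() if ch in "aeiou")

def buggy_solution(s: str) -> int:
    """LLM-synthesized buggy variant: forgets to lowercase the input."""
    return sum(1 for ch in s if ch in "aeiou")

def is_valid_failing_test(test_input: str) -> bool:
    """A student's test case exposes the bug only if the buggy code's
    output differs from the specification's expected output."""
    return buggy_solution(test_input) != reference_solution(test_input)

# "hello" exercises only lowercase vowels, so it does not expose the bug;
# "HELLO" contains uppercase vowels and therefore does.
assert not is_valid_failing_test("hello")
assert is_valid_failing_test("HELLO")
```

In BugSpotter itself, per the abstract, the buggy variant is synthesized by an LLM from the problem description and verified against a test suite before being shown to students; the sketch hard-codes one buggy variant for brevity.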




Published In

SIGCSE TS 2025: Proceedings of the 56th ACM Technical Symposium on Computer Science Education V. 1
February 2025
1405 pages
ISBN: 9798400705311
DOI: 10.1145/3641554
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher: Association for Computing Machinery, New York, NY, United States


Author Tags

  1. bugspotter
  2. debugging
  3. exercise generation
  4. generative ai
  5. llms
  6. programming education
  7. test cases


Conference: SIGCSE TS 2025

Acceptance Rates

SIGCSE TS 2025 Paper Acceptance Rate: 192 of 604 submissions, 32%
Overall Acceptance Rate: 1,787 of 5,146 submissions, 35%
