Research article · Open access
DOI: 10.1145/3641554.3701974

BugSpotter: Automated Generation of Code Debugging Exercises

Published: 18 February 2025

Abstract

Debugging is an essential skill when learning to program, yet its instruction and emphasis often vary widely across introductory courses. In the era of code-generating large language models (LLMs), students' ability to reason about code and identify errors is increasingly important. However, students frequently resort to trial and error to resolve bugs without fully understanding the underlying issues. Developing the ability to identify and hypothesize the cause of bugs is crucial but can be time-consuming to teach effectively through traditional means. This paper introduces BugSpotter, a tool that leverages an LLM to generate buggy code from a problem description and verify the synthesized bugs via a test suite. Students interact with BugSpotter by designing failing test cases, where the buggy code's output differs from the expected result as defined by the problem specification. This gives students opportunities not only to enhance their debugging skills, but also to practice reading and understanding problem specifications. We deployed BugSpotter in a large classroom setting and compared the debugging exercises it generated to exercises hand-crafted by an instructor for the same problems. We found that the LLM-generated exercises produced by BugSpotter varied in difficulty and were well matched to the problem specifications. Importantly, student performance on the LLM-generated exercises was comparable to performance on those manually created by instructors, suggesting that BugSpotter could be an effective and efficient aid for learning debugging.
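To make the described workflow concrete, below is a minimal sketch of the core check the abstract implies: a student's candidate test case "spots the bug" only when the buggy code's output differs from the expected output defined by the problem specification. The problem (count_vowels), the reference solution, and the planted bug are hypothetical illustrations, not details taken from the paper.

```python
# Minimal sketch of validating a student's "failing test case" against a
# planted bug. All names here (count_vowels, the missing-lowercase bug)
# are hypothetical examples, not taken from the paper.

def reference_solution(s: str) -> int:
    """Correct behavior per the (hypothetical) problem specification."""
    return sum(1 for ch in s.lower() if ch in "aeiou")

def buggy_solution(s: str) -> int:
    """LLM-synthesized buggy variant: forgets to lowercase the input."""
    return sum(1 for ch in s if ch in "aeiou")

def is_valid_failing_test(test_input: str) -> bool:
    """A student's test case exposes the bug only if the buggy code's
    output differs from the specification's expected output."""
    return buggy_solution(test_input) != reference_solution(test_input)

# "hello" exercises only lowercase vowels, so it does not expose the bug;
# "HELLO" contains uppercase vowels and therefore does.
assert not is_valid_failing_test("hello")
assert is_valid_failing_test("HELLO")
```

In BugSpotter itself, per the abstract, the buggy variant is synthesized by an LLM from the problem description and verified against a test suite before being shown to students; the sketch hard-codes one buggy variant for brevity.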




Published In

SIGCSE TS 2025: Proceedings of the 56th ACM Technical Symposium on Computer Science Education V. 1
February 2025
1405 pages
ISBN: 9798400705311
DOI: 10.1145/3641554
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher: Association for Computing Machinery, New York, NY, United States


Author Tags

  1. bugspotter
  2. debugging
  3. exercise generation
  4. generative ai
  5. llms
  6. programming education
  7. test cases


Conference: SIGCSE TS 2025

Acceptance Rates

SIGCSE TS 2025 Paper Acceptance Rate: 192 of 604 submissions, 32%
Overall Acceptance Rate: 1,787 of 5,146 submissions, 35%
