
DOI: 10.1145/3587102.3588785
Research article · Open access

Comparing Code Explanations Created by Students and Large Language Models

Published: 30 June 2023

Abstract

Reasoning about code and explaining its purpose are fundamental skills for computer scientists. There has been extensive research in the field of computing education on the relationship between a student's ability to explain code and other skills such as writing and tracing code. In particular, the ability to describe at a high-level of abstraction how code will behave over all possible inputs correlates strongly with code writing skills. However, developing the expertise to comprehend and explain code accurately and succinctly is a challenge for many students. Existing pedagogical approaches that scaffold the ability to explain code, such as producing exemplar code explanations on demand, do not currently scale well to large classrooms. The recent emergence of powerful large language models (LLMs) may offer a solution. In this paper, we explore the potential of LLMs in generating explanations that can serve as examples to scaffold students' ability to understand and explain code. To evaluate LLM-created explanations, we compare them with explanations created by students in a large course (n ≈ 1000) with respect to accuracy, understandability and length. We find that LLM-created explanations, which can be produced automatically on demand, are rated as being significantly easier to understand and more accurate summaries of code than student-created explanations. We discuss the significance of this finding, and suggest how such models can be incorporated into introductory programming education.
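The on-demand generation the abstract describes can be sketched as a prompt-construction step followed by a single model call. The sketch below is illustrative only: the prompt wording, the `build_explanation_prompt` helper, and the example snippet are assumptions, not the authors' materials.

```python
# Sketch: producing an exemplar code explanation on demand with an LLM.
# The prompt wording and example snippet are illustrative assumptions,
# not the prompt used in the paper.

def build_explanation_prompt(code: str) -> str:
    """Request a high-level summary of behaviour over all possible
    inputs, the abstraction level the paper's ratings target."""
    return (
        "In one or two sentences and at a high level of abstraction, "
        "explain what the following function does for every possible "
        "input:\n\n" + code
    )

snippet = (
    "def count_evens(xs):\n"
    "    return sum(1 for x in xs if x % 2 == 0)"
)

prompt = build_explanation_prompt(snippet)
print(prompt)

# A real deployment would send `prompt` to a chat-style LLM endpoint
# and show the returned text to students, e.g. (hypothetical client):
# reply = client.chat.completions.create(
#     model="gpt-4", messages=[{"role": "user", "content": prompt}]
# )
```

Because the prompt is generated per exercise, explanations scale to any number of code snippets without instructor effort, which is the scaling advantage the abstract contrasts with manually produced exemplars.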




Published In

ITiCSE 2023: Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1
June 2023
694 pages
ISBN:9798400701382
DOI:10.1145/3587102
This work is licensed under a Creative Commons Attribution International 4.0 License.


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. CS1
  2. ChatGPT
  3. GPT-3
  4. GPT-4
  5. code comprehension
  6. code explanations
  7. foundation models
  8. large language models
  9. natural language generation
  10. resource generation


Funding Sources

  • Ulla Tuominen Foundation

Conference

ITiCSE 2023

Acceptance Rates

Overall Acceptance Rate 552 of 1,613 submissions, 34%


Article Metrics

  • Downloads (last 12 months): 1,111
  • Downloads (last 6 weeks): 139
Reflects downloads up to 01 Oct 2024


Cited By

  • (2024)Cognitive Apprenticeship and Artificial Intelligence Coding AssistantsNavigating Computer Science Education in the 21st Century10.4018/979-8-3693-1066-3.ch013(261-281)Online publication date: 26-Feb-2024
  • (2024)A Comparative Analysis of Large Language Models for Code Documentation GenerationProceedings of the 1st ACM International Conference on AI-Powered Software10.1145/3664646.3664765(65-73)Online publication date: 10-Jul-2024
  • (2024)Comparing Feedback from Large Language Models and Instructors: Teaching Computer Science at ScaleProceedings of the Eleventh ACM Conference on Learning @ Scale10.1145/3657604.3664660(335-339)Online publication date: 9-Jul-2024
  • (2024)Combining LLM-Generated and Test-Based Feedback in a MOOC for ProgrammingProceedings of the Eleventh ACM Conference on Learning @ Scale10.1145/3657604.3662040(177-187)Online publication date: 9-Jul-2024
  • (2024)CFlow: Supporting Semantic Flow Analysis of Students' Code in Programming Problems at ScaleProceedings of the Eleventh ACM Conference on Learning @ Scale10.1145/3657604.3662025(188-199)Online publication date: 9-Jul-2024
  • (2024)How Instructors Incorporate Generative AI into Teaching ComputingProceedings of the 2024 on Innovation and Technology in Computer Science Education V. 210.1145/3649405.3659534(771-772)Online publication date: 8-Jul-2024
  • (2024)Guidelines for the Evolving Role of Generative AI in Introductory Programming Based on Emerging PracticeProceedings of the 2024 on Innovation and Technology in Computer Science Education V. 110.1145/3649217.3653602(10-16)Online publication date: 3-Jul-2024
  • (2024)Feedback Literacy: Holistic Analysis of Secondary Educators' Views of LLM Explanations of Program Error MessagesProceedings of the 2024 on Innovation and Technology in Computer Science Education V. 110.1145/3649217.3653595(192-198)Online publication date: 3-Jul-2024
  • (2024)Feedback-Generation for Programming Exercises With GPT-4Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 110.1145/3649217.3653594(31-37)Online publication date: 3-Jul-2024
  • (2024)Desirable Characteristics for AI Teaching Assistants in Programming EducationProceedings of the 2024 on Innovation and Technology in Computer Science Education V. 110.1145/3649217.3653574(408-414)Online publication date: 3-Jul-2024
