
DOI: 10.1145/3230977.3230999
Public Access

Who Tests the Testers?

Published: 08 August 2018

Abstract

Instructors routinely use automated assessment methods to evaluate the semantic qualities of student implementations and, sometimes, test suites. In this work, we distill a variety of automated assessment methods in the literature down to a pair of assessment models. We identify pathological assessment outcomes in each model that point to underlying methodological flaws. These theoretical flaws broadly threaten the validity of the techniques, and we actually observe them in multiple assignments of an introductory programming course. We propose adjustments that remedy these flaws and then demonstrate, on these same assignments, that our interventions improve the accuracy of assessment. We believe that with these adjustments, instructors can greatly improve the accuracy of automated assessment.
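The two assessment models are developed in the body of the paper; as a rough illustration of the territory the abstract describes, the sketch below shows, in Python, the two scoring schemes commonly found in the literature the paper surveys: grading an implementation by the instructor's test suite, and grading a student test suite by the known-buggy implementations it rejects. Every name and scoring rule here is an invented stand-in, not the authors' method; the consistency guard in grade_test_suite merely gestures at the kind of adjustment the paper argues for.

```python
# A minimal sketch (our illustration, NOT the paper's tooling) of two common
# automated-assessment models: (1) grade an implementation by the fraction of
# instructor tests it passes; (2) grade a student test suite by the fraction
# of known-buggy implementations it rejects. All names are invented.
from typing import Callable, List

Test = Callable[[Callable], bool]   # a test runs an implementation, returns pass/fail
Implementation = Callable

def grade_implementation(impl: Implementation, instructor_tests: List[Test]) -> float:
    """Model 1: fraction of the instructor's tests that the implementation passes."""
    return sum(1 for t in instructor_tests if t(impl)) / len(instructor_tests)

def grade_test_suite(suite: List[Test],
                     correct_impl: Implementation,
                     buggy_impls: List[Implementation]) -> float:
    """Model 2: fraction of buggy implementations the suite rejects.
    The guard below is one simple consistency check: a suite that fails a
    known-correct implementation scores zero, since otherwise a suite that
    rejects *every* program would look maximally effective."""
    if not all(t(correct_impl) for t in suite):
        return 0.0
    caught = sum(1 for impl in buggy_impls if any(not t(impl) for t in suite))
    return caught / len(buggy_impls)

# Toy usage on an invented `median` assignment:
def median_correct(xs): return sorted(xs)[len(xs) // 2]
def median_buggy(xs):   return xs[len(xs) // 2]   # forgets to sort first

tests = [lambda f: f([3, 1, 2]) == 2,   # catches the sorting bug
         lambda f: f([5]) == 5]

print(grade_implementation(median_buggy, tests))                 # 0.5
print(grade_test_suite(tests, median_correct, [median_buggy]))   # 1.0
```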

Published In

ICER '18: Proceedings of the 2018 ACM Conference on International Computing Education Research
August 2018
307 pages
ISBN: 9781450356282
DOI: 10.1145/3230977

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. advice to instructors
  2. assessing implementations
  3. assessing test suites
  4. automated assessment
  5. program correctness

Qualifiers

  • Research-article

Funding Sources

  • US National Science Foundation

Conference

ICER '18

Acceptance Rates

ICER '18 Paper Acceptance Rate: 28 of 125 submissions, 22%
Overall Acceptance Rate: 189 of 803 submissions, 24%

Cited By

  • (2024) Forge: A Tool and Language for Teaching Formal Methods. Proceedings of the ACM on Programming Languages 8 (OOPSLA1), 613–641. DOI: 10.1145/3649833
  • (2024) Testing and Debugging Habits of Intermediate Student Programmers. 2024 IEEE Global Engineering Education Conference (EDUCON), 1–10. DOI: 10.1109/EDUCON60312.2024.10578650
  • (2023) A Model of How Students Engineer Test Cases With Feedback. ACM Transactions on Computing Education 24 (1), 1–31. DOI: 10.1145/3628604
  • (2023) Evaluating Copilot on CS1 Code Writing Problems with Suppressed Specifications. Proceedings of the 16th Annual ACM India Compute Conference, 104–107. DOI: 10.1145/3627217.3627235
  • (2023) Proving and Disproving Equivalence of Functional Programming Assignments. Proceedings of the ACM on Programming Languages 7 (PLDI), 928–951. DOI: 10.1145/3591258
  • (2023) Investigating the Potential of GPT-3 in Providing Feedback for Programming Assessments. Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1, 292–298. DOI: 10.1145/3587102.3588852
  • (2023) Using Model-Checking and Peer-Grading to Provide Automated Feedback to Concurrency Exercises in Progvis. Proceedings of the 25th Australasian Computing Education Conference, 11–20. DOI: 10.1145/3576123.3576125
  • (2022) Making Hay from Wheats: A Classsourcing Method to Identify Misconceptions. Proceedings of the 22nd Koli Calling International Conference on Computing Education Research, 1–7. DOI: 10.1145/3564721.3564726
  • (2022) On the use of mutation analysis for evaluating student test suite quality. Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, 263–275. DOI: 10.1145/3533767.3534217
  • (2022) Automatic Generation of Programming Exercises and Code Explanations Using Large Language Models. Proceedings of the 2022 ACM Conference on International Computing Education Research - Volume 1, 27–43. DOI: 10.1145/3501385.3543957
