Automatic Grading of Student Code with Similarity Measurement

  • Conference paper
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13718)

Abstract

Online judges are now widely used to grade student code automatically. However, they grade code only by counting the number of passed test cases, which does not fairly assess the overall quality of a code snippet. Existing studies have applied machine learning techniques to code grading, but they usually require large amounts of labeled code for supervised learning and rely heavily on feature engineering. In this work, we design SimGrader, a code grading system that grades student code by measuring its similarity to “good” code, and thus saves the effort of code labeling. We extract three types of features to capture the overall quality of a code snippet, and design specific methods to enhance feature discrimination, which facilitates the similarity measurement. We conduct extensive experiments to show the superiority of SimGrader over existing methods and to justify the effect of its system components. We deploy SimGrader to grade the student code submitted in an introductory programming course.
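
The idea of grading by similarity to “good” code can be illustrated with a minimal sketch. This is not SimGrader's actual method, which the abstract does not specify. It assumes each code snippet has already been turned into a numeric feature vector, and it uses cosine similarity and a 0 to 100 scale as stand-in choices. The function names `cosine_similarity` and `grade_by_similarity` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def grade_by_similarity(student_vec, good_vecs, max_score=100.0):
    """Score a submission by its closest match among reference vectors.

    Assumption (illustrative only): the grade is the best cosine
    similarity to any "good" reference, clipped at 0 and scaled
    to max_score.
    """
    if not good_vecs:
        raise ValueError("need at least one reference vector")
    best = max(cosine_similarity(student_vec, g) for g in good_vecs)
    return max(0.0, best) * max_score


if __name__ == "__main__":
    # Hypothetical feature vectors for two reference solutions.
    references = [[1.0, 0.0, 2.0], [0.5, 1.0, 1.5]]
    print(grade_by_similarity([1.0, 0.1, 1.9], references))
```

No labeled grades are needed in this setup: only a handful of reference solutions must be supplied, which is the labeling saving the abstract describes.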

Notes

  1. https://github.com/eliben/pycparser

  2. https://github.com/cpplint

  3. https://cppcheck.sourceforge.io/

Acknowledgement

This work is supported by grants from the National Natural Science Foundation of China (Grant Nos. 62137001 and 62072185).

Author information

Corresponding author

Correspondence to Xuesong Lu.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, D., Zhang, E., Lu, X. (2023). Automatic Grading of Student Code with Similarity Measurement. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13718. Springer, Cham. https://doi.org/10.1007/978-3-031-26422-1_18

  • DOI: https://doi.org/10.1007/978-3-031-26422-1_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26421-4

  • Online ISBN: 978-3-031-26422-1

  • eBook Packages: Computer Science (R0)
