Abstract
Online judges are widely used to grade student code automatically. However, they grade code only by counting the number of passed test cases, which does not fairly assess the overall quality of a code snippet. Existing studies have applied machine learning techniques to code grading, but they usually require large amounts of labeled code for supervised learning and rely heavily on feature engineering. In this work, we design SimGrader, a code grading system that grades student code by measuring its similarity to “good” code, and thus saves the effort of code labeling. We extract three types of features to capture the overall quality of a code snippet, and design specific methods to enhance feature discrimination, which facilitates the similarity measurement. We conduct extensive experiments to show the superiority of SimGrader over existing methods and to justify the effect of its system components. We deploy SimGrader to grade the student code submitted in an introductory programming course.
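The general idea of grading by similarity to “good” code can be illustrated with a minimal sketch. This is not SimGrader's method: the abstract does not specify its three feature types or its similarity measure, so the bag-of-tokens features, the cosine similarity, and the names `token_counts`, `cosine`, and `similarity_grade` below are all assumptions chosen only for illustration.

```python
import math
import re
from collections import Counter


def token_counts(code: str) -> Counter:
    """Count identifiers, numbers, and punctuation characters.

    Assumption: a bag of tokens stands in for the paper's features,
    which the abstract does not describe.
    """
    return Counter(re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty snippet is similar to nothing
    return dot / (norm_a * norm_b)


def similarity_grade(submission: str, good_codes: list[str],
                     max_score: float = 100.0) -> float:
    """Score a submission by its best similarity to any reference solution.

    Hypothetical scoring rule: scale the highest similarity to max_score.
    """
    if not good_codes:
        raise ValueError("need at least one reference solution")
    features = token_counts(submission)
    best = max(cosine(features, token_counts(g)) for g in good_codes)
    return max_score * best


if __name__ == "__main__":
    good = ["def add(a, b):\n    return a + b"]
    student = "def add(a, b):\n    c = a + b\n    return c"
    print(round(similarity_grade(student, good), 1))
```

Because the score depends only on a set of reference solutions, no per-submission grade labels are needed, which is the labeling saving the abstract describes. A token-count comparison like this one ignores program structure and behavior, so it should be read as a toy baseline rather than a usable grader.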
Acknowledgement
This work is supported by grants from the National Natural Science Foundation of China (Grant Nos. 62137001 and 62072185).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, D., Zhang, E., Lu, X. (2023). Automatic Grading of Student Code with Similarity Measurement. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13718. Springer, Cham. https://doi.org/10.1007/978-3-031-26422-1_18
DOI: https://doi.org/10.1007/978-3-031-26422-1_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26421-4
Online ISBN: 978-3-031-26422-1
eBook Packages: Computer Science, Computer Science (R0)