Abstract
Automatic Short Answer Grading (ASAG) is a thriving domain of natural language understanding within learning analytics research. ASAG solutions are designed to alleviate the workload of teachers and instructors. While research in ASAG continues to advance through the application of deep learning, it faces limitations such as the need for extensive datasets and high computational costs. Our focus is on creating a machine-learning solution for ASAG that optimizes performance with small datasets and minimal computational demands. In this study, an ASAG framework named the Intelligent Descriptive answer E-Assessment System (IDEAS) is proposed. It adopts a model-answer-based approach that uses eight similarity metrics to compare the model answer with student answers. These similarities are derived by combining statistical and deep learning approaches. The framework differs significantly from prior work in two respects: (i) the ASAG problem is formulated as multiclass classification rather than regression or binary classification, eliminating the need for extra discriminators; and (ii) it aids evaluators in identifying inconsistencies in evaluation and provides comprehensive feedback. IDEAS is validated question-wise on several ASAG benchmark datasets, namely ASAP-SAS, SciEntsBank, STITA, and Texas (Mohler). These datasets are constrained in ways such as lacking grading criteria for mark allocation. To address this limitation, a novel dataset, IDEAS_ASAG_DATA, is collected and used to validate the framework. Results demonstrate an accuracy of 94% on a specific dataset question. The results show that IDEAS attains comparable, and in certain instances superior, performance compared to human evaluators. We argue that the proposed framework establishes a robust baseline for future advancements in the ASAG field.
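To make the abstract's pipeline concrete, the following is a minimal sketch of the general idea: similarity features between a model answer and each student answer feed a multiclass classifier that predicts a grade class. The abstract does not enumerate the eight metrics, so the two shown here (TF-IDF cosine similarity and token-level Jaccard overlap) are illustrative stand-ins, and the toy answers, labels, helper name `similarity_features`, and choice of `RandomForestClassifier` are hypothetical, not the authors' implementation.

```python
# Sketch: model-answer-based ASAG as multiclass classification.
# Assumptions: the two similarity metrics below stand in for the paper's
# eight; deep-learning similarities (e.g., embedding cosine) are omitted
# to keep the example self-contained.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

model_answer = "Photosynthesis converts light energy into chemical energy."
student_answers = [
    "Photosynthesis turns light into chemical energy.",  # good answer
    "Plants use light somehow.",                         # partial answer
    "It is about animals eating food.",                  # wrong answer
]
grades = np.array([2, 1, 0])  # multiclass grade labels, e.g., 0/1/2 marks

def similarity_features(model_ans: str, answer: str) -> list:
    """Compute a small vector of similarity metrics (illustrative subset)."""
    vec = TfidfVectorizer().fit([model_ans, answer])
    tfidf = vec.transform([model_ans, answer])
    cos = cosine_similarity(tfidf[0], tfidf[1])[0, 0]    # statistical metric
    a = set(model_ans.lower().split())
    b = set(answer.lower().split())
    jaccard = len(a & b) / len(a | b)                    # token overlap
    return [cos, jaccard]

X = np.array([similarity_features(model_answer, s) for s in student_answers])

# Grading as multiclass classification, so no extra discriminators are
# needed to separate grade levels:
clf = RandomForestClassifier(random_state=0).fit(X, grades)
print(clf.predict(X))  # predicted grade class per student answer
```

In such a design, adding further metrics only widens the feature vector; the classifier itself stays unchanged, which is what allows the approach to work with small datasets and modest compute.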
Data Availability
Data are provided only upon request.
Acknowledgements
The authors acknowledge the support from REVA University for the facilities provided to carry out the research.
Funding
No funding was received for this study.
Author information
Contributions
P. Sree Lakshmi: Conceptualization, Methodology, Visualization, Writing-original draft. Simha J. B.: Supervision, Validation. Rajeev Ranjan: Writing-review and editing.
Ethics declarations
Conflict of interest
All authors declare that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sree Lakshmi, P., Simha, J.B. & Ranjan, R. Empowering Educators: Automated Short Answer Grading with Inconsistency Check and Feedback Integration using Machine Learning. SN COMPUT. SCI. 5, 653 (2024). https://doi.org/10.1007/s42979-024-02954-7