Empowering Educators: Automated Short Answer Grading with Inconsistency Check and Feedback Integration using Machine Learning

  • Original Research
  • Published in: SN Computer Science (2024)

Abstract

Automatic Short Answer Grading (ASAG) is a thriving area of natural language understanding within learning analytics research. ASAG solutions are designed to alleviate the workload of teachers and instructors. While research in ASAG continues to advance through the application of deep learning, it faces limitations such as the need for extensive datasets and high computational costs. Our focus is on creating a machine-learning solution for ASAG that performs well with small datasets and minimal computational demands. In this study, an ASAG framework named the Intelligent Descriptive answer E-Assessment System (IDEAS) is proposed. It follows a model-answer-based approach, using eight similarity metrics to compare the model answer with student answers. These similarities are derived from a combination of statistical and deep learning approaches. The framework differs significantly from prior work in that (i) the ASAG problem is formulated as multiclass classification rather than regression or binary classification, eliminating the need for extra discriminators, and (ii) it aids evaluators in identifying inconsistencies in evaluation and provides comprehensive feedback. IDEAS is validated question-wise on several ASAG benchmark datasets, namely ASAP-SAS, SciEntsBank, STITA, and Texas (Mohler). These datasets are constrained in ways such as lacking grading criteria for mark allocation. To address this limitation, a novel dataset, IDEAS_ASAG_DATA, was collected and used to validate the framework. Results demonstrate an accuracy of 94% when the framework is evaluated on a specific question of this dataset. The results show that IDEAS attains performance comparable to, and in certain instances superior to, that of human evaluators. We argue that the proposed framework establishes a robust baseline for future advancements in the ASAG field.
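To make the approach concrete, the sketch below illustrates the general shape of a model-answer-based ASAG pipeline as described in the abstract: similarity features between the model answer and each student answer feed a multiclass classifier whose classes are the mark levels. This is a minimal sketch under stated assumptions, not the paper's implementation; the three statistical features, the RandomForestClassifier, and the toy data are illustrative stand-ins, and the paper's eight metrics (including its deep-learning similarities) are not reproduced here.

```python
# Minimal sketch of a model-answer-based ASAG pipeline (illustrative only).
# Feature set, classifier choice, and toy data are assumptions; they stand in
# for the paper's eight statistical + deep-learning similarity metrics.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def similarity_features(model_answer, student_answer):
    """Statistical similarities between the model answer and one student answer."""
    # TF-IDF cosine similarity over the answer pair.
    tfidf = TfidfVectorizer().fit([model_answer, student_answer])
    vecs = tfidf.transform([model_answer, student_answer])
    cos = float(cosine_similarity(vecs[0], vecs[1])[0, 0])
    # Token-level Jaccard overlap.
    a = set(model_answer.lower().split())
    b = set(student_answer.lower().split())
    jaccard = len(a & b) / len(a | b) if (a | b) else 0.0
    # Length ratio as a crude coverage proxy.
    length_ratio = min(len(a), len(b)) / max(len(a), len(b), 1)
    # A deep similarity (e.g., cosine over sentence embeddings) would be
    # appended here to mirror the statistical + deep combination in the paper.
    return [cos, jaccard, length_ratio]


# Toy data for one question: a model answer and graded student answers.
model_ans = "photosynthesis converts light energy into chemical energy in plants"
students = [
    "plants convert light energy into chemical energy via photosynthesis",
    "photosynthesis happens in plants",
    "animals eat food to get energy",
]
marks = [2, 1, 0]  # integer mark levels treated as classes, not regression targets

# Multiclass formulation: each mark level is its own class, so no extra
# discriminator or thresholding step is needed on top of the classifier.
X = np.array([similarity_features(model_ans, s) for s in students])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, marks)

new_answer = "in photosynthesis plants turn light into chemical energy"
print(clf.predict([similarity_features(model_ans, new_answer)]))
```

Treating each mark level as a class, rather than predicting a continuous score and thresholding it, is what the abstract means by eliminating the need for extra discriminators.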




Data Availability

Data are available only upon request.


Acknowledgements

The authors acknowledge REVA University for providing the facilities to carry out this research.

Funding

No funding was received for this study.

Author information


Contributions

P. Sree Lakshmi: Conceptualization, Methodology, Visualization, Writing - original draft. Simha J. B.: Supervision, Validation. Rajeev Ranjan: Writing - review and editing.

Corresponding authors

Correspondence to P. Sree Lakshmi or Rajeev Ranjan.

Ethics declarations

Conflict of interest

All authors declare that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sree Lakshmi, P., Simha, J.B. & Ranjan, R. Empowering Educators: Automated Short Answer Grading with Inconsistency Check and Feedback Integration using Machine Learning. SN COMPUT. SCI. 5, 653 (2024). https://doi.org/10.1007/s42979-024-02954-7

