Abstract
Receiving response-specific, individual improvement suggestions is one of the most helpful forms of feedback for students, especially for short answer questions. However, such feedback is also expensive to construct manually. For this reason, we investigate the extent to which counterfactual explanation methods can be used to generate feedback from short answer grading models automatically. Given an incorrect student response, counterfactual models suggest small modifications that would have led to the response being graded as correct. Successful modifications can then be displayed to the learner as improvement suggestions formulated in their own words. As not every response can be corrected with only minor modifications, we investigate the percentage of correctable answers in the automatic short answer grading datasets SciEntsBank, Beetle and SAF. In total, we compare three counterfactual explanation models and a paraphrasing approach. On all datasets, roughly a quarter of incorrect responses can be modified so that an automatic grading model classifies them as correct, without straying too far from the initial response. However, an expert reevaluation of the modified responses shows that nearly all of them remain incorrect, merely fooling the grading model into classifying them as correct. While one of the counterfactual generation approaches improved student responses at least partially, the results highlight the general vulnerability of neural networks to adversarial examples. Thus, we recommend further research with more reliable grading models, for example, models that incorporate external knowledge sources or are trained adversarially.
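The core loop described above, searching for a minimal edit that flips an automatic grader's verdict from incorrect to correct, can be sketched as follows. This is an illustrative toy, not the paper's actual method: the keyword-based `toy_grader_prob` stands in for a neural grading model, and the candidate vocabulary, greedy search, and single-token edits are simplifying assumptions made here for readability.

```python
def toy_grader_prob(response: str) -> float:
    """Stand-in for a neural grading model: returns the model's confidence
    that the response is correct. Here, simply the fraction of required
    key concepts the response mentions (hypothetical example question)."""
    required = {"closed", "path"}
    tokens = set(response.lower().split())
    return len(required & tokens) / len(required)

def counterfactual(response, grader, vocab, max_edits=3, threshold=1.0):
    """Greedy counterfactual search: at each step, apply the single-token
    substitution or insertion that most increases the grader's confidence,
    until the response is classified as correct or the edit budget is
    exhausted. The edit budget keeps the result close to the original
    response (the 'minimal modification' requirement)."""
    tokens = response.split()
    for _ in range(max_edits):
        current = grader(" ".join(tokens))
        if current >= threshold:
            break
        best_score, best_tokens = current, None
        for word in vocab:
            for i in range(len(tokens) + 1):
                # i < len(tokens): substitute position i; i == len: append
                trial = tokens[:i] + [word] + tokens[i + 1:]
                score = grader(" ".join(trial))
                if score > best_score:
                    best_score, best_tokens = score, trial
        if best_tokens is None:
            break  # no single edit improves the grader's confidence
        tokens = best_tokens
    final = " ".join(tokens)
    return final, grader(final) >= threshold

# Hypothetical usage: two greedy edits turn an 'incorrect' response
# into one the toy grader accepts.
vocab = ["closed", "path", "current", "flow"]
student = "electrons move through the wire"
edited, now_correct = counterfactual(student, toy_grader_prob, vocab)
```

As the paper's expert reevaluation shows, a flip of the grader's label does not guarantee the edited response is actually correct; the search only exploits whatever decision boundary the grading model happens to have.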
This research is funded by the Bundesministerium für Bildung und Forschung in the project: Software Campus 2.0 (ZN 01|S17050), Microproject: DA-VBB.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Filighera, A., Tschesche, J., Steuer, T., Tregel, T., Wernet, L. (2022). Towards Generating Counterfactual Examples as Automatic Short Answer Feedback. In: Rodrigo, M.M., Matsuda, N., Cristea, A.I., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2022. Lecture Notes in Computer Science, vol 13355. Springer, Cham. https://doi.org/10.1007/978-3-031-11644-5_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11643-8
Online ISBN: 978-3-031-11644-5