A Quantitative Study of NLP Approaches to Question Difficulty Estimation


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1831)

Included in the following conference series: AIED, International Conference on Artificial Intelligence in Education

Abstract

Question Difficulty Estimation from Text (QDET) has received increasing research interest in recent years, but most previous work has focused on single silos, without quantitative comparisons between different models or across datasets from different educational domains. To fill this gap, we quantitatively analyze several approaches proposed in previous research and compare their performance on two publicly available datasets. Specifically, we consider reading comprehension Multiple Choice Questions (MCQs) and maths questions. We find that Transformer-based models perform best in both educational domains; models based on linguistic features perform well on reading comprehension questions, whereas frequency-based features and word embeddings perform better in domain knowledge assessment.
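To make the headline finding concrete, below is a minimal sketch of the Transformer-based approach: fine-tuning DistilBERT as a single-output regressor that maps question text to a scalar difficulty (e.g. an IRT-style estimate). It assumes the Hugging Face transformers and PyTorch libraries; the checkpoint, hyperparameters, and toy data are illustrative assumptions, not the paper's exact experimental setup.

```python
# Sketch: fine-tune DistilBERT as a regressor for question difficulty.
# Assumptions: Hugging Face `transformers` + `torch`; the toy data and
# hyperparameters below are illustrative, not the paper's setup.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=1,               # single scalar output, i.e. a regression head
    problem_type="regression",  # makes the model apply MSE loss to labels
)

# Toy examples: question text paired with a scalar difficulty label.
questions = ["What is the capital of France?",
             "Prove that the square root of 2 is irrational."]
difficulties = [-1.2, 1.8]

enc = tokenizer(questions, padding=True, truncation=True, return_tensors="pt")
labels = torch.tensor(difficulties).unsqueeze(1)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few passes over the toy batch
    out = model(**enc, labels=labels)  # loss is MSE in regression mode
    optimizer.zero_grad()
    out.loss.backward()
    optimizer.step()

model.eval()
with torch.no_grad():
    predicted = model(**enc).logits.squeeze(-1)  # estimated difficulties
print(predicted.tolist())
```

The non-neural approaches mentioned above would replace the Transformer with hand-crafted inputs, e.g. readability or word-frequency features fed to a classical regressor, which is where the contrast between reading comprehension and domain knowledge assessment emerges.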

This paper reports on research supported by Cambridge University Press & Assessment. We thank Dr Andrew Caines for his feedback on the manuscript.



Author information


Correspondence to Luca Benedetto.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Benedetto, L. (2023). A Quantitative Study of NLP Approaches to Question Difficulty Estimation. In: Wang, N., Rebolledo-Mendez, G., Dimitrova, V., Matsuda, N., Santos, O.C. (eds) Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium and Blue Sky. AIED 2023. Communications in Computer and Information Science, vol 1831. Springer, Cham. https://doi.org/10.1007/978-3-031-36336-8_67


  • DOI: https://doi.org/10.1007/978-3-031-36336-8_67

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36335-1

  • Online ISBN: 978-3-031-36336-8

  • eBook Packages: Computer Science, Computer Science (R0)
