A Quantitative Study of NLP Approaches to Question Difficulty Estimation


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1831)

Included in the following conference series: AIED, International Conference on Artificial Intelligence in Education

Abstract

Question Difficulty Estimation from Text (QDET) has received increasing research interest in recent years, but most previous work has focused on single silos, without quantitative comparisons between different models or across datasets from different educational domains. To fill this gap, we quantitatively analyze several approaches proposed in previous research and compare their performance on two publicly available datasets. Specifically, we consider reading comprehension Multiple Choice Questions (MCQs) and maths questions. We find that Transformer-based models perform best in both educational domains; models based on linguistic features perform well on reading comprehension questions, whereas frequency-based features and word embeddings perform better in domain knowledge assessment.
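To make the headline finding concrete, below is a minimal sketch of the Transformer-based approach: fine-tuning DistilBERT as a single-output regressor that maps question text to a scalar difficulty (e.g. an IRT-style estimate). It assumes the Hugging Face transformers and PyTorch libraries; the checkpoint, hyperparameters, and toy data are illustrative assumptions, not the paper's exact experimental setup.

```python
# Sketch: fine-tune DistilBERT as a regressor for question difficulty.
# Assumptions: Hugging Face `transformers` + `torch`; the toy data and
# hyperparameters below are illustrative, not the paper's setup.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=1,               # single scalar output, i.e. a regression head
    problem_type="regression",  # makes the model apply MSE loss to labels
)

# Toy examples: question text paired with a scalar difficulty label.
questions = ["What is the capital of France?",
             "Prove that the square root of 2 is irrational."]
difficulties = [-1.2, 1.8]

enc = tokenizer(questions, padding=True, truncation=True, return_tensors="pt")
labels = torch.tensor(difficulties).unsqueeze(1)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few passes over the toy batch
    out = model(**enc, labels=labels)  # loss is MSE in regression mode
    optimizer.zero_grad()
    out.loss.backward()
    optimizer.step()

model.eval()
with torch.no_grad():
    predicted = model(**enc).logits.squeeze(-1)  # estimated difficulties
print(predicted.tolist())
```

The non-neural approaches mentioned above would replace the Transformer with hand-crafted inputs, e.g. readability or word-frequency features fed to a classical regressor, which is where the contrast between reading comprehension and domain knowledge assessment emerges.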

This paper reports on research supported by Cambridge University Press & Assessment. We thank Dr Andrew Caines for his feedback on the manuscript.



Author information


Correspondence to Luca Benedetto.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Benedetto, L. (2023). A Quantitative Study of NLP Approaches to Question Difficulty Estimation. In: Wang, N., Rebolledo-Mendez, G., Dimitrova, V., Matsuda, N., Santos, O.C. (eds) Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium and Blue Sky. AIED 2023. Communications in Computer and Information Science, vol 1831. Springer, Cham. https://doi.org/10.1007/978-3-031-36336-8_67


  • DOI: https://doi.org/10.1007/978-3-031-36336-8_67

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36335-1

  • Online ISBN: 978-3-031-36336-8

  • eBook Packages: Computer Science, Computer Science (R0)
