Part of the book series: Communications in Computer and Information Science ((CCIS,volume 2150))

Abstract

With the advent of generative AI models, automatic question generation plays a key role in the development of online education. This work compares large language model (LLM)-based systems for educational question generation with their small language model (sLM) counterparts. Our quantitative and qualitative experiments demonstrate that, with further pre-training and fine-tuning, sLMs can produce educational questions of comparable quality.

Notes

  1. https://chat.openai.com.

Acknowledgements

This work was funded by the European Commission projects “Humane AI” (grant No. 820437) and “X5GON” (grant No. 761758), and partially supported by the UCL ChangeMakers grant.

Author information

Correspondence to Sahan Bulathwela.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Fawzi, F., Balan, S., Cukurova, M., Yilmaz, E., Bulathwela, S. (2024). Towards Human-Like Educational Question Generation with Small Language Models. In: Olney, A.M., Chounta, IA., Liu, Z., Santos, O.C., Bittencourt, I.I. (eds) Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium and Blue Sky. AIED 2024. Communications in Computer and Information Science, vol 2150. Springer, Cham. https://doi.org/10.1007/978-3-031-64315-6_25

  • DOI: https://doi.org/10.1007/978-3-031-64315-6_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-64314-9

  • Online ISBN: 978-3-031-64315-6
