Abstract
With the advent of generative AI models, the automatic generation of educational questions plays a key role in developing online education. This work compares large language model (LLM)-based systems with their small language model (sLM) counterparts for educational question generation. Our quantitative and qualitative experiments demonstrate that, with further pre-training and fine-tuning, sLMs can produce educational questions of comparable quality.
Acknowledgements
This work was funded by the European Commission-funded projects “Humane AI” (grant 820437) and “X5GON” (grant 761758). It was also partially supported by a UCL Changemakers grant.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Fawzi, F., Balan, S., Cukurova, M., Yilmaz, E., Bulathwela, S. (2024). Towards Human-Like Educational Question Generation with Small Language Models. In: Olney, A.M., Chounta, IA., Liu, Z., Santos, O.C., Bittencourt, I.I. (eds) Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium and Blue Sky. AIED 2024. Communications in Computer and Information Science, vol 2150. Springer, Cham. https://doi.org/10.1007/978-3-031-64315-6_25
Print ISBN: 978-3-031-64314-9
Online ISBN: 978-3-031-64315-6