Abstract
With the advent of generative AI models, the automatic generation of educational questions plays a key role in developing online education. This work compares large language model (LLM)-based systems with their small language model (sLM) counterparts for educational question generation. Our quantitative and qualitative experiments demonstrate that, with further pre-training and fine-tuning, sLMs can produce educational questions of comparable quality.
Acknowledgements
This work was funded by the European Commission-funded projects “Humane AI” (grant 820437) and “X5GON” (grant 761758). It was also partially supported by a UCL Changemakers grant.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Fawzi, F., Balan, S., Cukurova, M., Yilmaz, E., Bulathwela, S. (2024). Towards Human-Like Educational Question Generation with Small Language Models. In: Olney, A.M., Chounta, IA., Liu, Z., Santos, O.C., Bittencourt, I.I. (eds) Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium and Blue Sky. AIED 2024. Communications in Computer and Information Science, vol 2150. Springer, Cham. https://doi.org/10.1007/978-3-031-64315-6_25
Print ISBN: 978-3-031-64314-9
Online ISBN: 978-3-031-64315-6