AI-TA: Towards an Intelligent Question-Answer Teaching Assistant using Open-Source LLMs

Responding to the thousands of student questions on online QA platforms each semester has a considerable human cost, particularly in computing courses with rapidly growing enrollments. To address the challenges of scalable and intelligent question-answering (QA), we introduce an innovative solution that leverages open-source Large Language Models (LLMs) from the LLaMA-2 family to ensure data privacy. Our approach combines augmentation techniques such as retrieval augmented generation (RAG), supervised fine-tuning (SFT), and learning from human preferences data using Direct Preference Optimization (DPO). Through extensive experimentation on a Piazza dataset from an introductory CS course, comprising 10,000 QA pairs and 1,500 pairs of preference data, we demonstrate a significant 30% improvement in the quality of answers, with RAG being a particularly impactful addition. Our contributions include the development of a novel architecture for educational QA, extensive evaluations of LLM performance utilizing both human assessments and LLM-based metrics, and insights into the challenges and future directions of educational data processing. This work paves the way for the development of AI-TA, an intelligent QA assistant customizable for courses with an online QA platform

Publication:

arXiv e-prints

Pub Date:

November 2023

DOI:

10.48550/arXiv.2311.02775

arXiv:

arXiv:2311.02775

Bibcode:

2023arXiv231102775H

Keywords:

Computer Science - Machine Learning;
Computer Science - Artificial Intelligence;
Computer Science - Computation and Language

E-Print:

Updates for camera-ready submission

AI-TA: Towards an Intelligent Question-Answer Teaching Assistant using Open-Source LLMs

Abstract