Date of Award
Fall 11-20-2024
Degree Type
Thesis
Degree Name
Master of Science in Information Technology
Department
Department of Information Technology
Committee Chair/First Advisor
Ying Xie
Second Advisor
Shaoen Wu
Third Advisor
Linh Le
Abstract
Large Language Models (LLMs) have significantly advanced the field of natural language processing, but they remain resource-intensive and impractical for many organizations. Specialist models offer a viable alternative and are often developed through Knowledge Distillation (KD). However, traditional KD methods rely on predefined, static datasets to elicit knowledge from the teacher model and therefore cannot dynamically address the weaknesses of the student model during training. This research introduces two novel methods for adaptive knowledge elicitation: Feedback-Driven Question Generation and Agent-Based Targeted Question Generation. Both methods iteratively expand the training dataset based on the student model's performance, leveraging a teacher model to generate targeted data that addresses specific weaknesses in the student model. The study evaluates these methods through a case study on Python programming question-answering tasks, using the 2-billion-parameter Gemma model as the student and ChatGPT-4o-mini as the teacher. Results demonstrate that both methods significantly improve the student model's accuracy and response quality compared with other approaches. Notably, Method 2, which incorporates specialized agents focused on evaluation criteria such as accuracy and logical consistency, outperforms Method 1, which does not use agents. This highlights the value of agent-based elicitation for extracting targeted knowledge from a larger LLM. The findings further demonstrate the promise of adaptive KD in bridging the gap between generalist AI models and domain-specific applications. Future directions include expanding the feedback datasets used in the testing phase of these approaches, developing additional specialized agents coordinated by a coordinator agent, and scaling these methods to larger models.
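To make the adaptive-elicitation loop described above concrete, the following is a minimal Python sketch of one feedback-driven round: probe the student model, flag weak answers, ask the teacher for targeted question-answer pairs, and fine-tune on the expanded set. All function names (query_student, grade_answer, query_teacher_for_targeted_qa, fine_tune) and the scoring threshold are hypothetical placeholders for illustration, not the thesis's actual implementation.

```python
# Illustrative sketch of one feedback-driven knowledge-elicitation round.
# Every function below is a hypothetical stub standing in for calls to the
# student model (e.g., Gemma 2B) and the teacher model (e.g., ChatGPT-4o-mini).

def query_student(question: str) -> str:
    """Stub: return the student model's answer to a question."""
    return "student answer to: " + question

def grade_answer(question: str, answer: str) -> float:
    """Stub: score an answer in [0, 1], e.g., via the teacher or a rubric."""
    return 0.4 if "decorator" in question else 0.9

def query_teacher_for_targeted_qa(weak_question: str, n: int = 3) -> list[tuple[str, str]]:
    """Stub: ask the teacher to generate n new Q&A pairs targeting a weakness."""
    return [(f"variant {i} of: {weak_question}", f"teacher answer {i}") for i in range(n)]

def fine_tune(train_set: list[tuple[str, str]]) -> None:
    """Stub: fine-tune the student on the expanded training dataset."""
    pass

def adaptive_kd_round(train_set, probe_questions, threshold=0.6):
    """One round: probe the student, collect weaknesses, expand the dataset."""
    for q in probe_questions:
        score = grade_answer(q, query_student(q))
        if score < threshold:  # student is weak here; elicit targeted data
            train_set.extend(query_teacher_for_targeted_qa(q))
    fine_tune(train_set)
    return train_set

probes = ["What does a Python decorator do?", "How do list comprehensions work?"]
expanded = adaptive_kd_round([], probes)
print(f"training set grew to {len(expanded)} targeted Q&A pairs")
```

Method 2, as described in the abstract, could be seen as replacing the single grading step with multiple specialized agents (e.g., one for accuracy, one for logical consistency) whose combined feedback steers the teacher's question generation.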