Date of Award

Fall 11-20-2024

Degree Type

Thesis

Degree Name

Master of Science in Information Technology

Department

Department of Information Technology

Committee Chair/First Advisor

Ying Xie

Second Advisor

Shaoen Wu

Third Advisor

Linh Le

Abstract

Large Language Models (LLMs) have significantly advanced natural language processing but remain resource-intensive and impractical for many organizations. Specialist models offer a viable alternative and are often developed through Knowledge Distillation (KD). Traditional KD methods, however, rely on predefined static datasets to elicit knowledge from the teacher model and therefore cannot adapt to the student model's weaknesses as they emerge during training. This research introduces two novel methods for adaptive knowledge elicitation: Feedback-Driven Question Generation and Agent-Based Targeted Question Generation. Both methods iteratively expand the training dataset based on the student model's performance, leveraging a teacher model to generate targeted data that addresses the student's specific weaknesses. The study evaluates these methods through a case study on Python programming question answering, using the 2-billion-parameter Gemma model as the student and GPT-4o mini as the teacher. Results demonstrate that both methods significantly improve the student model's accuracy and response quality relative to competing approaches. Notably, Method 2, which employs specialized agents focused on evaluation criteria such as accuracy and logical consistency, outperforms Method 1, which uses no agents, highlighting the value of agent-based elicitation for extracting knowledge from a larger LLM. The findings further demonstrate the promise of adaptive KD in bridging the gap between generalist AI models and domain-specific applications. Future directions include expanding the feedback datasets used in the testing phase, developing additional specialized agents managed by a coordinator agent, and scaling these methods to larger models.
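
The feedback-driven loop described above can be summarized in a few lines of Python. The sketch below is illustrative only, not the thesis's actual implementation: the callables student_answer, teacher_generate, and grade, and the function adaptive_kd_round, are hypothetical placeholders standing in for student inference, teacher data generation, and answer grading.

from typing import Callable, List, Tuple

QAPair = Tuple[str, str]  # (question, reference answer)

def adaptive_kd_round(
    student_answer: Callable[[str], str],             # student model inference
    teacher_generate: Callable[[str], List[QAPair]],  # teacher data generation
    grade: Callable[[str, str], bool],                # True if answer is acceptable
    feedback_set: List[QAPair],
) -> List[QAPair]:
    # Step 1: evaluate the student on the feedback set and collect failures.
    failures = [(q, ref) for q, ref in feedback_set
                if not grade(student_answer(q), ref)]
    # Step 2: ask the teacher for new Q&A pairs targeting each weakness.
    new_data: List[QAPair] = []
    for question, _ in failures:
        prompt = f"Generate Python Q&A pairs probing the same concept as: {question}"
        new_data.extend(teacher_generate(prompt))
    # Step 3: the caller appends new_data to the training set, fine-tunes the
    # student, and repeats until performance stops improving.
    return new_data

Under this reading, Method 2 would replace the single grade function with a panel of specialized agents (for example, one scoring accuracy and another logical consistency), each of which can also steer the teacher's generation toward its own criterion.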
