Semester of Graduation
Summer 2025
Degree Type
Dissertation
Degree Name
Doctor of Philosophy in Data Science and Analytics
Department
School of Data Science and Analytics
Committee Chair/First Advisor
Ying Xie
Second Advisor
Sherry Ni
Third Advisor
Linh Le
Abstract
This work explores applying Multi-Agent (MA) Large Language Models (LLMs) to credit card management, a domain where their multi-step reasoning capabilities remain underexplored. Focusing on Equifax’s Optimal Path™ model [1], a personalized solution for credit score optimization, the study addresses two key challenges. The first is designing a natural language interface for financial credit models to improve accessibility and aid customer decision-making. The second is enhancing the reliability and real-world applicability of complex financial models, which are prone to generating invalid or infeasible recommendations because they lack practical interpretability and are susceptible to edge cases. To tackle these challenges, we propose and evaluate several MA designs, including sequential critique, debate-based, self-reflective, and chain-of-agents frameworks. Each architecture uses specialized LLM agents either to generate explanations or to identify and correct problematic action plans. Performance is assessed with a novel, unsupervised goal-alignment metric that evaluates generated plans against user requests and company objectives. Results show that one- and two-agent systems effectively provide natural language interfaces. Notably, two-agent systems employing self-consistency techniques significantly improve response alignment and reduce variability for challenging requests compared to single-agent systems. Moreover, combining self-consistency with prompt engineering methods (e.g., few-shot Chain of Thought and score change distribution) leads to higher, more stable alignment for moderately complex user queries. The study finds no single universally optimal architecture: while chain and debate agents excel in standard tasks, collaborator agents demonstrate unique robustness against unrealistic requests, producing grounded outputs.