Semester of Graduation
Spring 2026
Degree Type
Dissertation/Thesis
Degree Name
Master's Degree in Computer Science
Department
Computer Science
Committee Chair/First Advisor
Ramazan Aygun
Second Advisor
Mahmut Karakaya
Third Advisor
Md Abdullah Al Hafiz Khan
Abstract
This thesis presents AgentCite, a multi-agent framework for automated verification of referenced numerical data in research documents. The framework uses a hybrid design: LLM-based agents handle document understanding and evidence retrieval, while deterministic components handle structured parsing and value comparison. Compared with fully agentic approaches, this design improves consistency and reproducibility while reducing token consumption and execution time.
AgentCite consists of three autonomous agents (a Negotiator, a Main Document Agent, and a Source Documents Agent) coordinated through a fixed tool-call pipeline. The Negotiator orchestrates verification by extracting tabular data from the main document, retrieving evidence from per-source vector stores, and applying deterministic rules to label each referenced value as Verified, Not Verified, or Contradicted. Isolating the vector stores constrains retrieval to the relevant source and mitigates cross-agent data leakage.
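The deterministic labeling step can be illustrated with a minimal Python sketch. The abstract does not state the actual rules, so everything here is an assumption: the names `Verdict`, `parse_number`, and `classify`, the first-number-in-text parsing, and the relative tolerance `rel_tol` are hypothetical. Under this sketch, a value is Not Verified when no comparable evidence is retrieved, Verified when any retrieved value matches within tolerance, and Contradicted when comparable evidence exists but none of it matches.

```python
import math
import re
from enum import Enum
from typing import Optional, Sequence


class Verdict(Enum):
    VERIFIED = "Verified"
    NOT_VERIFIED = "Not Verified"
    CONTRADICTED = "Contradicted"


# Hypothetical parser: matches integers/decimals with optional thousands separators.
_NUMBER = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def parse_number(text: str) -> Optional[float]:
    """Return the first number found in text (ignoring ',' separators and '%'), or None."""
    match = _NUMBER.search(text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


def classify(claimed: str, evidence: Sequence[str], rel_tol: float = 1e-3) -> Verdict:
    """Hypothetical deterministic rule (an assumption, not the thesis's exact rule set)."""
    claim = parse_number(claimed)
    values = [v for v in (parse_number(e) for e in evidence) if v is not None]
    if claim is None or not values:
        # No comparable numeric evidence was retrieved, so the claim cannot be checked.
        return Verdict.NOT_VERIFIED
    if any(math.isclose(claim, v, rel_tol=rel_tol) for v in values):
        # At least one retrieved value agrees with the claim within tolerance.
        return Verdict.VERIFIED
    # Comparable evidence exists, but every retrieved value disagrees with the claim.
    return Verdict.CONTRADICTED
```

Under these assumptions, `classify("92.86%", ["accuracy of 92.86 percent"])` returns `Verdict.VERIFIED`, while `classify("93.55", ["reported 95.10"])` returns `Verdict.CONTRADICTED`. Keeping this comparison outside the LLM means identical inputs always receive the same label, which is the reproducibility benefit the hybrid design aims for.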
The framework is evaluated on three research papers using six backbone models from the GPT-4.1 and GPT-5.4 families. On Paper 1, both GPT-4.1 and GPT-5.4 achieved ≥0.95 recall and ≥0.93 Contradiction F1. GPT-5.4-mini achieved the highest accuracy on Paper 2 (92.86%) and the strongest overall performance on Paper 3 (≥0.90 recall per class, 93.55% accuracy). It was also the only model to detect a natural contradiction and to reliably identify injected contradictions.
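Per-class recall, Contradiction F1, and accuracy are standard classification metrics over the three labels. A minimal sketch of computing them from paired gold and predicted labels follows; the function name `evaluate`, the label strings, and the metric key format are assumptions, and the thesis's own evaluation scripts may aggregate differently (for example, with macro averaging).

```python
from collections import Counter
from typing import Dict, Sequence

LABELS = ("Verified", "Not Verified", "Contradicted")


def evaluate(gold: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """Hypothetical helper: accuracy plus per-class recall and F1 from paired labels."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    # True positives per class: items whose predicted label matches the gold label.
    tp = Counter(g for g, p in zip(gold, pred) if g == p)
    gold_count = Counter(gold)
    pred_count = Counter(pred)
    metrics = {"accuracy": sum(tp.values()) / len(gold)}
    for label in LABELS:
        recall = tp[label] / gold_count[label] if gold_count[label] else 0.0
        precision = tp[label] / pred_count[label] if pred_count[label] else 0.0
        denom = precision + recall
        metrics[f"recall[{label}]"] = recall
        # F1 is the harmonic mean of precision and recall (0 when both are 0).
        metrics[f"f1[{label}]"] = 2 * precision * recall / denom if denom else 0.0
    return metrics
```

For example, if one gold Verified value is predicted as Contradicted out of four items, accuracy is 0.75, Verified recall drops to 0.5, and Contradiction F1 falls to about 0.67 because that false positive halves Contradiction precision.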
Pipeline reliability proved sensitive to each model's ability to follow structured, multi-step tool orchestration; GPT-4.1-nano failed in all cases. Across models, the primary failure mode was contradiction detection. These findings suggest that GPT-5.4-mini is the most promising candidate for fine-tuning. Future work includes improved retrieval anchoring, automated document acquisition, and support for inline claims and image-based tables.