Semester of Graduation
Spring 2026
Degree Type
Dissertation/Thesis
Degree Name
Masters in Artificial Intelligence
Department
Department of Software Engineering and Game Development
Committee Chair/First Advisor
Dr. Jiho Noh
Second Advisor
Dr. Nasrin Dehbozorgi
Third Advisor
Dr. Dylan Gaines
Abstract
Assessing creativity at scale remains a persistent challenge in cognitive science, as human raters are costly, slow, and often inconsistent in their judgments. This thesis introduced a novel framework for automated scientific creativity assessment using forced pairwise ranking, in which fine-tuned large language models compared pairs of responses and determined which was more creative. Five empirical studies were conducted using Llama-2-7B and Llama-2-13B models adapted via LoRA fine-tuning and benchmarked against human-scored responses from the Scientific Creative Thinking Test. A regression baseline achieved Pearson r = .74 on the test set, matching the human inter-rater ceiling reported in the literature. A pairwise classification paradigm, which aggregated judgments into continuous creativity scores via Elo rating, achieved r = .69. Increasing model scale provided no meaningful performance gains under matched training conditions. Uncertainty quantification via stochastic sampling and DBSCAN clustering did not reliably predict prediction error, while tolerance accuracy analysis confirmed that both models were most reliable in the mid-range of the creativity spectrum and could approximate whether a response reflected lower, average, or elevated creativity. Inter-rater disagreement among human judges was identified as a significant contributor to the performance ceiling, with disagreement concentrated at the upper end of the creativity spectrum, where model predictions were also least reliable. These findings suggest that future progress in automated creativity assessment depends more on improving ground-truth label quality than on scaling model size or changing training paradigms.
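The Elo aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `expected_score` and `elo_scores`, the K-factor of 32, the starting rating of 1500, and the input format of (winner, loser) pairs are hypothetical choices, since the abstract does not report them. The update rule itself is the standard Elo formula.

```python
from collections import defaultdict


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expectation that item A beats item B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_scores(comparisons, k: float = 32.0, initial: float = 1500.0) -> dict:
    """Aggregate forced pairwise judgments into continuous scores.

    comparisons: iterable of (winner_id, loser_id) tuples, where the winner
    is the response judged more creative. The K-factor and initial rating
    are assumed conventional defaults, not values taken from the thesis.
    """
    ratings = defaultdict(lambda: initial)
    for winner, loser in comparisons:
        e_w = expected_score(ratings[winner], ratings[loser])
        # Winner's actual score is 1 and loser's is 0, so the updates are
        # equal and opposite, keeping the total rating mass constant.
        delta = k * (1.0 - e_w)
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


if __name__ == "__main__":
    # Hypothetical judgments over three responses: a beats b and c, b beats c.
    print(elo_scores([("a", "b"), ("a", "c"), ("b", "c")]))
```

Because Elo updates are applied sequentially, the final scores depend on the order of comparisons; averaging ratings over several shuffled passes is one common way to reduce that sensitivity, though the abstract does not say whether the thesis did so.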
Included in
Cognitive Science Commons, Computer and Systems Architecture Commons, Multivariate Analysis Commons
Comments
None