Semester of Graduation

Spring 2026

Degree Type

Dissertation/Thesis

Degree Name

Masters in Artificial Intelligence

Department

Department of Software Engineering and Game Development

Committee Chair/First Advisor

Dr. Jiho Noh

Second Advisor

Dr. Nasrin Dehbozorgi

Third Advisor

Dr. Dylan Gaines

Abstract

Assessing creativity at scale remains a persistent challenge in cognitive science, as human raters are costly, slow, and often inconsistent in their judgments. This thesis introduced a novel framework for automated scientific creativity assessment using forced pairwise ranking, in which fine-tuned large language models compared response pairs and determined which was more creative. Five empirical studies were conducted using Llama-2-7B and Llama-2-13B models adapted via LoRA fine-tuning and benchmarked against human-scored responses from the Scientific Creative Thinking Test. A regression baseline achieved Pearson r = .74 on the test set, matching the human inter-rater ceiling reported in the literature. A pairwise classification paradigm aggregated judgments into continuous creativity scores via Elo rating, achieving r = .69. Increased model scale provided no meaningful performance gains under matched training conditions. Uncertainty quantification via stochastic sampling and DBSCAN clustering did not reliably predict prediction error, while tolerance accuracy analysis confirmed that both models were most reliable in the mid-range of the creativity spectrum and could approximate whether a response reflected lower, average, or elevated creativity. Inter-rater disagreement among human judges was identified as a significant contributor to the performance ceiling, with disagreement concentrated at the upper end of the creativity spectrum, where model predictions were also least reliable. These findings suggest that future progress in automated creativity assessment depends more on improving ground truth label quality than on scaling model size or changing training paradigms.
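As an illustration of how pairwise judgments can be aggregated into continuous scores with Elo rating, the following is a minimal sketch only. It is not the thesis's implementation: the function name `elo_scores`, the K-factor of 16, the base rating of 1000, the number of passes, and the toy comparisons are all assumptions chosen for the example.

```python
import random


def elo_scores(comparisons, k=16.0, base=1000.0, epochs=20, seed=0):
    """Aggregate (winner_id, loser_id) pairs into Elo ratings.

    All defaults (k, base, epochs, seed) are illustrative assumptions,
    not values taken from the thesis.
    """
    rng = random.Random(seed)

    # Every response starts at the same base rating.
    ratings = {}
    for winner, loser in comparisons:
        ratings.setdefault(winner, base)
        ratings.setdefault(loser, base)

    order = list(comparisons)
    for _ in range(epochs):
        # Shuffle each pass so the result depends less on input order.
        rng.shuffle(order)
        for winner, loser in order:
            # Standard Elo expected score for the winner.
            gap = ratings[loser] - ratings[winner]
            expected_win = 1.0 / (1.0 + 10 ** (gap / 400.0))
            # The winner gains what the loser gives up (zero-sum update).
            delta = k * (1.0 - expected_win)
            ratings[winner] += delta
            ratings[loser] -= delta
    return ratings


# Toy judgments: "a" was judged more creative than "b" and "c",
# and "b" more creative than "c".
toy = [("a", "b"), ("b", "c"), ("a", "c")]
scores = elo_scores(toy)
ranking = sorted(scores, key=scores.get, reverse=True)
```

The resulting ratings can then be correlated with human scores, for example with Pearson r, which is the kind of comparison the abstract reports.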

Comments

None
