Semester of Graduation
Summer 2025
Degree Type
Dissertation
Degree Name
Doctor of Philosophy in Data Science and Analytics
Department
Data Science and Analytics
Committee Chair/First Advisor
Herman E. Ray
Second Advisor
Linh Le
Third Advisor
Xinyan Zhang
Abstract
Recent advances in deep learning, particularly the development of large language models, have generated substantial interest, yet there is limited evidence that these technologies consistently fulfill their anticipated potential. Uncertainty quantification has been studied extensively for classification and regression tasks, but it remains comparatively underdeveloped for generative models, and for image captioning models in particular. At present, there is little consensus on appropriate methods for quantifying uncertainty in these systems. This research examines existing uncertainty quantification approaches and evaluates their suitability for image captioning models. The findings indicate that current methods are generally inadequate for the generative setting, owing to the conditional and recursive nature of language generation. To address this gap, we conduct experiments on the generation of structured captions and develop a distributional framework that quantifies uncertainty from the predicted probabilities of the generated tokens. We find that the distributional method works only when a limited number of tokens is generated. We then extend the investigation to unstructured captions, introducing a method for constructing prediction sets around parts of speech that provide a specified level of confidence that the true value lies within the set. These prediction sets can be used to score captions, which helps identify captions that warrant further review. The approach both quantifies uncertainty in generated captions and supports the formation of sets of the words most relevant to the image.
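As a rough illustration of what a prediction set with a specified confidence level can look like, the sketch below uses split conformal prediction over the predicted probabilities of candidate words for one part-of-speech slot. This is an assumption made for illustration only: the abstract does not name the procedure used in the dissertation, and the function names, the dictionary layout, and the score (one minus the probability of the true word) are all hypothetical.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Compute a score threshold from a held-out calibration set.

    cal_probs:  list of dicts mapping candidate word -> predicted probability
    cal_labels: list of true words, aligned with cal_probs
    alpha:      target miscoverage rate (0.1 aims for roughly 90% coverage)
    All names here are hypothetical, not taken from the dissertation.
    """
    # Nonconformity score: 1 - probability assigned to the true word.
    scores = sorted(
        1.0 - probs.get(label, 0.0)
        for probs, label in zip(cal_probs, cal_labels)
    )
    n = len(scores)
    # Finite-sample corrected rank used in split conformal prediction.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: keep every candidate.
        return 1.0
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """Return every candidate word whose score is within the threshold."""
    return {word for word, p in probs.items() if 1.0 - p <= threshold}


# Toy calibration data for a hypothetical "noun" slot.
cal_probs = [
    {"dog": 0.5, "cat": 0.5},
    {"dog": 0.75, "cat": 0.25},
    {"dog": 0.25, "cat": 0.75},
]
cal_labels = ["dog", "dog", "dog"]

threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.5)
noun_set = prediction_set({"dog": 0.5, "cat": 0.375, "car": 0.125}, threshold)
```

Under this assumed construction, a larger set signals more uncertainty about the word in that slot, so set size is one possible ingredient for flagging captions that warrant review.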