Date of Award

Summer 7-19-2024

Degree Type

Dissertation/Thesis

Degree Name

Doctor of Philosophy in Data Science and Analytics

Department

School of Data Science and Analytics

Committee Chair/First Advisor

Dr. MinJae Woo

Second Advisor

Dr. Sherry Ni

Third Advisor

Dr. Xinyan Zhang

Fourth Advisor

Dr. Ramazan Aygun

Abstract

This dissertation addresses two primary research goals: first, to establish a pipeline for analyzing subgroup disparities in medical informatics predictive models, and second, to develop strategies that mitigate these disparities and thereby enhance model fairness. It begins with a novel analytical framework that identifies the data subgroups in which medical predictive models face a higher risk of misclassification. This framework underscores the need to reduce bias in such models, which could otherwise lead to significant ethical, legal, and professional repercussions. The dissertation then presents two data preparation and modeling techniques, each customized for one of two distinct dataset types, that improve both the fairness and the reliability of predictive outcomes.

The dissertation is organized into three chapters. Chapter One develops a disparity identification pipeline and applies it to mammogram abnormality classification with Convolutional Neural Networks (CNNs). A ResNet152V2 model applied to the Emory BrEast Imaging Dataset achieves an AUC of 0.975, and the pipeline pinpoints significant performance disparities for mammogram patches from patients with high tissue density and for patches with the architectural distortion image finding. Chapter Two shows how employing a ResNet50 model with adaptive boosting improves model fairness: the approach maintains an AUC of 0.976 while substantially lowering the disparity score, a metric designed to quantify the overall disparity in a model's accuracy across demographic, geographic, or other subgroups of interest. Chapter Three extends the methodology to the prediction of Postpartum Hemorrhage (PPH) using an XGBoost model optimized with Balance-SMOTE. This adjustment addresses class imbalance and significantly reduces disparity scores, illustrating that these modeling techniques adapt effectively to diverse medical contexts. Together, these chapters substantiate the robustness of the proposed methods in enhancing fairness and lay a foundation for future research on equitable machine learning applications in healthcare.
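The abstract describes the disparity score only as a metric that quantifies the overall disparity in a model's accuracy across subgroups; its exact formula is not given here. The sketch below illustrates the general idea under a stated assumption: the score is taken to be the mean absolute gap between each subgroup's accuracy and the overall accuracy. The function names, the `margin` threshold for flagging under-performing subgroups, and the toy labels (loosely modeled on a tissue-density split) are all hypothetical and are not the dissertation's method, definition, or data.

```python
from collections import defaultdict


def overall_accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)


def subgroup_accuracy(y_true, y_pred, groups):
    """Accuracy computed separately within each subgroup label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def disparity_score(acc_by_group, overall_acc):
    """HYPOTHETICAL definition: mean absolute gap between each
    subgroup's accuracy and the overall accuracy (unweighted)."""
    gaps = [abs(a - overall_acc) for a in acc_by_group.values()]
    return sum(gaps) / len(gaps)


def flag_subgroups(acc_by_group, overall_acc, margin=0.05):
    """Return subgroups whose accuracy falls more than `margin`
    (a hypothetical threshold) below the overall accuracy."""
    return sorted(g for g, a in acc_by_group.items() if a < overall_acc - margin)


# Toy, illustrative labels only (not results from the dissertation).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1]
groups = ["low", "low", "low", "low", "high", "high", "high", "high"]

acc = subgroup_accuracy(y_true, y_pred, groups)
overall = overall_accuracy(y_true, y_pred)
print(acc)                           # → {'low': 0.75, 'high': 0.5}
print(overall)                       # → 0.625
print(disparity_score(acc, overall)) # → 0.125
print(flag_subgroups(acc, overall))  # → ['high']
```

Under this assumed definition, a mitigation strategy that narrows the accuracy gap between subgroups without lowering overall accuracy would reduce the score toward zero. A weighted variant (for example, weighting each gap by subgroup size) is an equally plausible choice.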

Available for download on Thursday, July 22, 2027
