Date of Award
Summer 7-19-2024
Degree Type
Dissertation
Degree Name
Ph.D. in Data Science and Analytics (Computer Science)
Department
College of Computing and Software Engineering - School of Data Science and Analytics
Committee Chair/First Advisor
Dr. MinJae Woo
Second Advisor
Dr. Sherry Ni
Third Advisor
Dr. Xinyan Zhang
Fourth Advisor
Dr. Ramazan Aygun
Abstract
This dissertation addresses two primary research goals: first, to establish a pipeline for analyzing subgroup disparities in medical informatics predictive models, and second, to develop strategies that mitigate these disparities and thereby enhance model fairness. Beginning with an analytical framework that identifies specific data subgroups at higher risk of misclassification by predictive models in the medical field, the research underscores the need for bias reduction in such models, since unaddressed bias could lead to significant ethical, legal, and professional repercussions. The dissertation then presents two data preparation and modeling techniques, each customized for one of two distinct dataset types, that enhance both fairness and reliability in predictive outcomes.
The dissertation is organized into three chapters. The first chapter develops and applies a disparity identification pipeline tailored for mammogram abnormality classification using Convolutional Neural Networks (CNNs). It shows that a ResNet152V2 model applied to the Emory BrEast Imaging Dataset achieves an AUC of 0.975, and it pinpoints significant performance disparities for mammogram patches from patients with high tissue density and with architectural distortion as the image finding. Chapter Two describes the improvement in model fairness achieved by employing a ResNet50 model for adaptive boosting, which not only maintained the AUC at 0.976 but also substantially lowered the disparity score, a metric designed to quantify the overall disparity in a model's performance across demographic, geographic, or other subgroups of interest. The third chapter extends this methodology to the prediction of Postpartum Hemorrhage (PPH), implementing an XGBoost model optimized with Balance-SMOTE. This adjustment addresses class imbalance and significantly lowers disparity scores, illustrating the adaptability and effectiveness of these modeling techniques in diverse medical contexts. Together, these chapters substantiate the robustness of the proposed methods in enhancing fairness and lay a foundation for future research aimed at creating equitable machine learning applications in healthcare.
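The abstract does not give the formula for the disparity score, so the following is only a minimal sketch of the general idea of a subgroup disparity metric, not the dissertation's definition. It assumes, purely for illustration, that disparity is measured as the gap between the best and worst per-subgroup accuracy; the function names `subgroup_accuracies` and `disparity_score` and the toy data are hypothetical.

```python
import numpy as np


def subgroup_accuracies(y_true, y_pred, groups):
    """Return the classification accuracy within each subgroup."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    groups = np.asarray(groups)
    return {
        str(g): float(np.mean(y_true[groups == g] == y_pred[groups == g]))
        for g in np.unique(groups)
    }


def disparity_score(y_true, y_pred, groups):
    """ASSUMED definition: max minus min subgroup accuracy.

    This is an illustrative stand-in, not the metric defined in the
    dissertation.
    """
    acc = np.array(list(subgroup_accuracies(y_true, y_pred, groups).values()))
    return float(acc.max() - acc.min())


# Toy labels and predictions, grouped by a hypothetical tissue-density field.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 1, 0, 0, 0]
density = ["low", "low", "low", "low", "high", "high", "high", "high"]

print(subgroup_accuracies(y_true, y_pred, density))
print(disparity_score(y_true, y_pred, density))
```

Under this assumed definition, a score of 0 means every subgroup is classified equally well, and larger values indicate that at least one subgroup lags behind the others.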