Dissertations

Innovative Approaches for Identifying and Reducing Disparity in Machine Learning Model Performance – Bridging the Gap in Binary Classification for Health Informatics

Linglin Zhang

Semester of Graduation

Summer 2024

Degree Type

Dissertation

Degree Name

Ph.D. in Data Science and Analytics (Computer Science)

Department

College of Computing and Software Engineering - School of Data Science and Analytics

Committee Chair/First Advisor

Dr. MinJae Woo

Second Advisor

Dr. Sherry Ni

Third Advisor

Dr. Xinyan Zhang

Fourth Advisor

Dr. Ramazan Aygun

Abstract

This dissertation addresses two primary research goals: first, to establish a pipeline for analyzing subgroup disparities in predictive models for medical informatics, and second, to develop strategies that mitigate these disparities and thereby improve model fairness. It begins with an analytical framework that identifies the specific data subgroups at higher risk of misclassification by predictive models in the medical field. The research underscores the need to reduce bias in such models, since unaddressed bias could lead to significant ethical, legal, and professional repercussions. The dissertation then presents two data preparation and modeling techniques, each customized for one of two distinct dataset types, that improve both the fairness and the reliability of predictive outcomes.

The dissertation unfolds across three chapters. Chapter One develops and applies a disparity identification pipeline for mammogram abnormality classification using Convolutional Neural Networks (CNNs). A ResNet152V2 model applied to the Emory BrEast Imaging Dataset achieves an AUC of 0.975, and the pipeline pinpoints significant performance disparities for mammogram patches associated with high tissue density and with architectural distortion as the image finding. Chapter Two describes the improvement in model fairness achieved by employing a ResNet50 model for adaptive boosting, which not only maintained the AUC at 0.976 but also substantially lowered the disparity score, a metric designed to quantify the overall disparity in a model's performance accuracy across demographic, geographic, or other subgroups of interest. Chapter Three extends the methodology to the prediction of postpartum hemorrhage (PPH), implementing an XGBoost model optimized with Balance-SMOTE. This adjustment addresses class imbalance and significantly reduces disparity scores, illustrating the adaptability of these modeling techniques across medical contexts. Together, these chapters support the robustness of the proposed methods for improving fairness and provide a blueprint for future research on equitable machine learning applications in healthcare.
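
As a rough illustration of what subgroup disparity analysis involves, the sketch below computes accuracy separately for each subgroup and then summarizes the spread as the gap between the best and worst subgroup. The abstract does not define the dissertation's disparity score, so `accuracy_gap` is a hypothetical stand-in rather than the author's metric, and the labels, predictions, and `density` grouping are made-up toy values.

```python
from collections import defaultdict


def subgroup_accuracy(y_true, y_pred, groups):
    """Return a dict mapping each subgroup label to its accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group):
    """Hypothetical disparity summary: best minus worst subgroup accuracy.

    This is an assumption for illustration only, not the disparity score
    defined in the dissertation.
    """
    values = list(per_group.values())
    return max(values) - min(values)


# Toy binary labels and predictions, grouped by an assumed density attribute.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1]
density = ["low", "low", "low", "high", "low", "high", "high", "high"]

per_group = subgroup_accuracy(y_true, y_pred, density)
gap = accuracy_gap(per_group)
print(per_group, gap)
```

A gap near zero would indicate similar accuracy across subgroups, while a large gap flags a subgroup that the model serves poorly. A real analysis would also need to account for subgroup size and for metrics other than accuracy.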
