Date of Award

Spring 5-11-2022

Degree Type

Dissertation

Degree Name

Doctor of Philosophy in Analytics and Data Science

Department

Statistics and Analytical Sciences

Committee Chair/First Advisor

Dr. Jennifer Priestley

Committee Member

Dr. Herman Ray

Committee Member

Dr. Ying Xie

Abstract

Binary classification using imbalanced datasets remains a challenge. Typically, supervised learning algorithms minimize the binary cross-entropy objective function to determine the final parameter estimates. This objective function assumes an equal class distribution between the minority (i.e., events) and majority (i.e., non-events) classes, which almost never exists in real-world modeling. In the imbalanced data setting, the equal-class-distribution assumption is grossly violated, and the resulting parameter estimates are biased toward the majority class. To overcome this bias and improve model generalization, we focus on modifying the original binary cross-entropy objective function by uniquely weighting each minority class observation. We base our weighting methodology on a technique developed in a recently published manuscript, which implemented a locally weighted log-likelihood objective function within logistic regression. Building on this published method, we develop instance-level weights for each minority class observation that are learned from the data but overcome the challenges of the original method. Our method drastically reduces the number of decision variables that must be estimated, ensures the boundedness of the instance-level weights, and maintains the convexity of the objective function for efficient and reliable parameter estimation. This dissertation provides a comprehensive critique of the recently published base algorithm and derives an alternative formulation of the objective function. We implement this novel objective function in logistic regression and neural network models and show significant performance improvement using synthetic and real-world imbalanced datasets.
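The general idea of weighting only the minority-class terms of the binary cross-entropy can be sketched as follows. This is a minimal illustration of a per-instance weighted loss, not the dissertation's exact formulation; the weight values shown are hypothetical placeholders, whereas the dissertation's weights are learned from the data.

```python
import numpy as np

def weighted_bce(y_true, p_pred, minority_weights, eps=1e-12):
    """Binary cross-entropy where each minority-class (y=1) observation
    carries its own weight; majority-class (y=0) terms are unweighted.
    minority_weights must have one entry per observation (entries at
    y=0 positions are ignored)."""
    p = np.clip(p_pred, eps, 1 - eps)          # guard against log(0)
    w = np.where(y_true == 1, minority_weights, 1.0)  # weight only events
    return -np.mean(w * (y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

# Toy example: 2 events among 5 observations (imbalanced).
y = np.array([1, 0, 1, 0, 0])
p = np.array([0.8, 0.2, 0.4, 0.1, 0.3])       # hypothetical predicted probs
w = np.array([2.0, 1.0, 3.0, 1.0, 1.0])       # hypothetical instance weights
loss = weighted_bce(y, p, w)
```

Setting every weight to 1 recovers the standard binary cross-entropy; increasing a minority observation's weight increases the penalty for misclassifying that specific event, which is the mechanism the dissertation exploits to counter the majority-class bias.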
