Doctor of Philosophy in Analytics and Data Science
Statistics and Analytical Sciences
Dr. Jennifer Priestley
Dr. Herman Ray
Dr. Ying Xie
Binary classification using imbalanced datasets remains a challenge. Typically, supervised learning algorithms minimize the binary cross-entropy objective function to determine the final parameter estimates. This objective function assumes an equal class distribution between the minority (i.e., events) and majority (i.e., non-events) classes, which almost never exists in real-world modeling. In the imbalanced data setting, the equal class distribution is grossly violated, and the resulting parameter estimates are biased toward the majority class. To overcome the bias and improve model generalization, we focus on modifying the original binary cross-entropy objective function by uniquely weighting each minority class observation. We base our weighting methodology on a technique developed in a recently published manuscript, which implemented a locally weighted log-likelihood objective function within logistic regression. Building from this published method, we develop instance-level weights for each minority class observation that are learned from the data but overcome the challenges of the original method. Our method drastically reduces the number of decision variables that must be estimated, ensures the boundedness of the instance-level weights, and maintains the convexity of the objective function for efficient and reliable parameter estimation. This dissertation provides a comprehensive critique of the recently published base algorithm and derives an alternative formulation of the objective function. We implement this novel objective function in logistic regression and neural network models and show significant performance improvement using synthetic and real-world imbalanced datasets.
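To make the idea of an instance-weighted binary cross-entropy concrete, the sketch below shows a generic weighted BCE in NumPy, where each observation carries its own weight and minority-class (event) rows can be upweighted. This is only an illustrative assumption, not the dissertation's actual formulation: how the weights are learned from the data, bounded, and kept convexity-preserving is the contribution of the work and is not reproduced here.

```python
import numpy as np

def weighted_bce(y_true, y_pred, weights):
    """Instance-weighted binary cross-entropy (illustrative sketch).

    Each observation i contributes
        -w_i * [y_i * log(p_i) + (1 - y_i) * log(1 - p_i)],
    so minority-class (y_i = 1) rows can receive weights > 1.
    With all weights equal to 1 this reduces to the standard BCE.
    """
    eps = 1e-12  # guard against log(0)
    p = np.clip(y_pred, eps, 1 - eps)
    losses = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return float(np.mean(weights * losses))

# Hypothetical example: one event among four observations.
y = np.array([1.0, 0.0, 0.0, 0.0])
p = np.array([0.3, 0.2, 0.1, 0.1])

uniform = weighted_bce(y, p, np.ones_like(y))           # plain BCE
upweighted = weighted_bce(y, p, np.where(y == 1, 3.0, 1.0))
```

Upweighting the lone event increases its influence on the loss, which is the mechanism by which a fitted model's parameters shift toward correctly classifying the minority class.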
Available for download on Sunday, October 29, 2023