Date of Award

Summer 7-21-2020

Degree Type

Dissertation

Degree Name

Doctor of Philosophy in Analytics and Data Science

Department

Statistics and Analytical Sciences

Committee Chair/First Advisor

Herman Ray

Committee Member

Jennifer Priestley

Committee Member

Lin Li

Abstract

Through a review of epistemological frameworks in the social sciences, the history of frameworks in statistics, and the current state of research, we establish that there appears to be no consistent, quantitatively motivated model development framework in data science, and that the downstream analytic effects of various modeling choices are not uniformly documented. Examples are provided to illustrate that analytic choices, even when justifiable and statistically valid, affect downstream model results. This study proposes a unified model development framework that allows researchers to make statistically motivated modeling choices within the development pipeline. A simulation study is then used to provide empirical justification for the proposed framework. The utility of the framework is tested by investigating the effects of normalization on downstream analysis results. Normalization methods are examined through a decomposition of the empirical risk function, measuring their effects on model bias, variance, and irreducible error. The resulting measurements of bias and variance are then applied as diagnostic procedures for model pre-processing and development within the unified framework. Findings from the simulation results are incorporated into the proposed framework and stress-tested on benchmark datasets as well as several applications.
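For readers unfamiliar with the risk decomposition the abstract refers to, the standard bias-variance decomposition of expected squared-error risk is sketched below. This is a minimal illustration assuming squared loss, a true regression function f, a model \hat{f} fitted on a random training sample D, and additive noise with variance \sigma^2; the notation is ours and the dissertation's own formulation may differ in detail.

\[
\underbrace{\mathbb{E}_{D,\varepsilon}\!\left[\bigl(y - \hat{f}(x; D)\bigr)^2\right]}_{\text{expected risk at } x}
= \underbrace{\bigl(f(x) - \mathbb{E}_{D}[\hat{f}(x; D)]\bigr)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_{D}\!\left[\bigl(\hat{f}(x; D) - \mathbb{E}_{D}[\hat{f}(x; D)]\bigr)^2\right]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible error}}
\]

Under this view, a pre-processing choice such as normalization can shift the bias and variance terms, while the irreducible error term is a property of the data-generating process and is unaffected by modeling choices.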
