Statistics and Analytical Sciences

This paper aims to predict whether businesses’ service accounts become past due and to identify the variables that affect the likelihood of repayment. Two binary classification approaches, logistic regression and decision trees, were fitted and compared. Both approaches achieve high accuracy. However, the decision tree uses only 10 predictors and reaches an accuracy of 96.69% on the validation set, while logistic regression includes 14 predictors and reaches an accuracy of 94.58%. Because false negatives are a major concern in the financial industry, the decision tree is the better option for this dataset, since it produces relatively fewer false negatives. Accuracy, false positives, and false negatives are all important criteria in model selection and evaluation, and the choice of model should depend more on the research purpose than on the exact values of these criteria.
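To make the three criteria concrete, the sketch below computes accuracy, false-positive rate, and false-negative rate from binary labels. It assumes label 1 means "account becomes past due" (the positive class) and reports the false-positive and false-negative figures as rates. Both choices are illustrative assumptions and are not taken from the paper. The function name `confusion_metrics` and the toy labels are hypothetical and are not the paper's data. They show how two models can tie on accuracy yet differ in false negatives.

```python
from typing import Dict, Sequence


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, false-positive rate, and false-negative rate for 0/1 labels.

    Assumption (hypothetical): 1 = account becomes past due, 0 = repaid on time.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        # Share of truly on-time accounts wrongly flagged as past due.
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        # Share of truly past-due accounts the model failed to flag.
        "fnr": fn / (fn + tp) if fn + tp else 0.0,
    }


# Toy labels for illustration only; these are NOT the paper's results.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
model_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]  # misses one past-due account
model_b = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]  # catches all past-due accounts

for name, pred in [("model_a", model_a), ("model_b", model_b)]:
    m = confusion_metrics(y_true, pred)
    print(name, f"acc={m['accuracy']:.2f}", f"FPR={m['fpr']:.2f}", f"FNR={m['fnr']:.2f}")
# model_a acc=0.80 FPR=0.14 FNR=0.33
# model_b acc=0.80 FPR=0.29 FNR=0.00
```

Both toy models reach the same accuracy. When missed past-due accounts are the costlier error, `model_b` would be preferred despite its higher false-positive rate, which illustrates why the choice should follow the research purpose.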