Improving risk modeling via feature selection, hyper-parameter tuning, and model ensembling


Statistics and Analytical Sciences

Document Type


Submission Date

Spring 5-17-2020


It has been demonstrated that feature selection (FS), hyper-parameter tuning, and model ensembling can each improve the performance of binary classifiers. In this study, we propose a framework that aims to improve risk modeling by applying these model-improving methods together. The feasibility of the framework is assessed on a dataset containing commercial information on US companies. Three FS methods, weighting by Relief, by information gain, and by correlation, are applied to each of four classifiers: logistic regression (LR), decision tree (DT), neural network (NN), and support vector machine (SVM). After the most appropriate FS method is identified for each classifier, its hyper-parameters are tuned. Finally, each classifier is ensembled using bagging and boosting techniques. To investigate the effect of these methods, model performance is evaluated using classification accuracy, area under the curve (AUC), false positive rate (FPR), and false negative rate (FNR). The results show that FS and boosting substantially increase the accuracy of LR and decrease its FNR, whereas regularization via hyper-parameter tuning does not further improve its performance. DT is not sensitive to any of the aforementioned methods. For NN, the benefit of the model-improving methods is clear with respect to FPR and FNR but negligible in accuracy. After FS and hyper-parameter tuning are applied, SVM is no longer a good base classifier for ensembling. The proposed framework provides a reference for the joint use of these model-improving methods in business delinquency modeling.
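The three-stage framework described above can be sketched in scikit-learn. This is an illustrative outline only, not the study's actual pipeline: the synthetic dataset, the mutual-information score (a stand-in for the information-gain FS weight), the tuning grid for the LR regularization strength C, and the bagging settings are all assumptions chosen for a runnable minimal example.

```python
# Illustrative sketch of the framework: (1) feature selection,
# (2) hyper-parameter tuning, (3) ensembling, evaluated by
# accuracy, AUC, FPR, and FNR. All settings are hypothetical.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import BaggingClassifier
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score, roc_auc_score, confusion_matrix

# Synthetic stand-in for the commercial dataset (not the study's data).
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Stages 1-2: information-gain-style FS (via mutual information)
# feeding LR, with the regularization strength C tuned by grid search.
pipe = Pipeline([
    ("fs", SelectKBest(mutual_info_classif, k=10)),
    ("lr", LogisticRegression(max_iter=1000)),
])
search = GridSearchCV(pipe, {"lr__C": [0.01, 0.1, 1.0, 10.0]},
                      cv=5, scoring="roc_auc")
search.fit(X_tr, y_tr)

# Stage 3: bag the tuned FS+LR pipeline as the base classifier.
bagged = BaggingClassifier(search.best_estimator_,
                           n_estimators=25, random_state=0)
bagged.fit(X_tr, y_tr)

# Evaluation: accuracy, AUC, and FPR/FNR from the confusion matrix.
pred = bagged.predict(X_te)
tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
acc = accuracy_score(y_te, pred)
auc = roc_auc_score(y_te, bagged.predict_proba(X_te)[:, 1])
fpr = fp / (fp + tn)
fnr = fn / (fn + tp)
```

Boosting (e.g. AdaBoost) fills the same role as bagging in stage 3; bagging is shown here because it accepts any base estimator, whereas AdaBoost requires one that supports sample weights.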