Department

Statistics and Analytical Sciences

Document Type

Article

Submission Date

Spring 1-26-2020

Abstract

Data mining techniques have numerous applications in bankcard response modeling. Logistic regression has long been the standard modeling tool in the financial industry because of its reliably good performance and its interpretability. In this paper, we propose a hybrid bankcard response model that integrates decision tree-based chi-square automatic interaction detection (CHAID) into logistic regression. In the first stage of the hybrid model, CHAID analysis is used to detect potential variable interactions. In the second stage, these potential interactions serve as additional input variables in logistic regression. The motivation for the proposed hybrid model is that adding variable interactions may improve the performance of logistic regression. In theory, all possible interactions could be added to the logistic regression, and the significant ones could be identified by feature selection procedures. However, even stepwise selection is very time-consuming when the number of independent variables is large, and the expanded set of candidate terms tends to cause the p >> n problem. CHAID analysis, by contrast, has the potential to overcome these drawbacks when used to detect variable interactions. To demonstrate its effectiveness, the proposed hybrid model is evaluated on a real credit customer response data set. The results show that, by identifying potential interactions among independent variables, the proposed hybrid approach outperforms logistic regression without an interaction search in terms of classification accuracy, the area under the receiver operating characteristic (ROC) curve, and the Kolmogorov-Smirnov (KS) statistic. Furthermore, CHAID analysis for interaction detection is much more computationally efficient than the stepwise search mentioned above, and some of the identified interactions are shown to have statistically significant predictive power for the target variable.
Last but not least, the customer profile created based on the CHAID tree provides a reasonable interpretation of the interactions, which is required by regulations of the credit industry. Hence, this study provides an alternative for handling bankcard classification tasks.
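
As a rough illustration of the second stage and of two of the evaluation measures named above, the following Python sketch shows how detected interactions might enter a logistic regression as extra input columns, and how the area under the ROC curve and the KS statistic can be computed from model scores. It is only a sketch under stated assumptions: the abstract does not say how the interactions are encoded, so the product-term encoding, the function names, and the toy data are all hypothetical, and the CHAID stage itself is not reproduced here.

```python
def add_interaction_terms(rows, pairs):
    """Append one product column x[i] * x[j] per (i, j) pair to each row.

    Assumption: interactions are encoded as simple products; the paper
    may encode them differently (e.g. as CHAID segment indicators).
    """
    return [list(r) + [r[i] * r[j] for i, j in pairs] for r in rows]


def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a random
    responder (label 1) outscores a random non-responder (label 0).
    Tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(labels, scores):
    """Kolmogorov-Smirnov statistic: the largest gap between the cumulative
    score distributions of responders and non-responders."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # Consume all observations tied at this score before comparing.
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        best = max(best, abs(tp / n_pos - fp / n_neg))
    return best


# Toy data (hypothetical): three predictors, with columns 0 and 2
# assumed to have been flagged as interacting by the first stage.
X = [[1, 2, 3], [0, 5, 4]]
X_aug = add_interaction_terms(X, [(0, 2)])

# Toy scores (hypothetical) standing in for fitted response probabilities.
y = [1, 1, 0, 1, 0, 0]
p = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(X_aug, auc(y, p), ks_statistic(y, p))
```

The augmented rows would then be passed to whatever logistic regression routine is in use; comparing `auc` and `ks_statistic` for models fitted with and without the extra columns mirrors the comparison described in the abstract.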
