Document Type
Event
Start Date
28-4-2022 5:00 PM
Description
In recent years, credit-lending companies have turned to machine learning algorithms to predict a customer's probability of default when making future lending decisions. Most credit companies treat this as a binary classification problem: predicting whether or not an individual will default. Logistic Regression models have long been the standard choice because the final feature set used in modeling is explainable, and explainability brings transparency to every stakeholder involved in the process. Other models, such as neural networks, have achieved better accuracy scores, but the features they generate are not easily comprehensible. Using credit data on money-borrowing companies collected by the credit bureau Equifax over the last 10 years, we used the PySpark framework to build a model that predicts the reliability of a money-borrowing company. After data preprocessing, our models achieved an accuracy of 97.73% with Logistic Regression and 98.58% with a Random Forest ensemble classifier. We were also able to identify a few predictors as key performance indicators using the Logistic Regression coefficients.
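To illustrate how Logistic Regression coefficients can point to key predictors, here is a minimal sketch in plain Python. It is not the PySpark pipeline used in this work: the data is synthetic, the feature names (`utilization`, `months_on_file`, `noise`) are hypothetical, and the model is fitted with simple batch gradient descent. It assumes the features are standardized, so that coefficient magnitudes are comparable.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Fit a binary logistic regression by batch gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this row: p(default) minus the true label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the average gradient of the log-loss.
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank_features(names, w):
    """Order features by absolute coefficient, largest first."""
    return sorted(zip(names, w), key=lambda pair: abs(pair[1]), reverse=True)


# Synthetic, already-standardized toy data (NOT the Equifax data).
random.seed(0)
FEATURES = ["utilization", "months_on_file", "noise"]  # hypothetical names
X, y = [], []
for _ in range(400):
    row = [random.gauss(0, 1) for _ in FEATURES]
    # Assumed ground truth: the third feature has no effect on default.
    logit = 2.0 * row[0] - 1.0 * row[1]
    X.append(row)
    y.append(1 if random.random() < sigmoid(logit) else 0)

weights, bias = fit_logistic(X, y)
ranking = rank_features(FEATURES, weights)
for name, coef in ranking:
    print(f"{name}: {coef:+.3f}")
```

The sign of a coefficient indicates whether the feature raises or lowers the predicted odds of default. Its magnitude is only a fair measure of importance when the features share a common scale, which is why the sketch assumes standardized inputs.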
Included in
GR-175 - Credit Default prediction of money borrowing companies using Pyspark framework