Event Title

GR-175 - Credit Default prediction of money borrowing companies using Pyspark framework

Document Type

Event

Start Date

28-4-2022 5:00 PM

Description

In recent years, credit-lending companies have turned to machine learning algorithms to predict the probability that a customer will default, in order to inform future lending decisions. Most credit companies treat this as a binary classification problem: predicting whether or not an individual will default. Logistic Regression has long been the model of choice because the final feature set used in modeling is explainable, and explainability brings transparency to every stakeholder involved in the process. Other models, such as neural networks, have achieved better accuracy scores, but the features they generate are not easily comprehensible. Using credit data on money-borrowing companies collected by the credit bureau Equifax over the last 10 years, we used the PySpark framework to build a model that predicts the reliability of a money-borrowing company. After the data preprocessing phase, our models achieved an accuracy of 97.73% with Logistic Regression and 98.58% with a Random Forest ensemble classifier. We also identified a few predictors as key performance indicators using the Logistic Regression coefficients.
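
The idea of reading key predictors off Logistic Regression coefficients can be illustrated with a minimal sketch. This is not the authors' PySpark pipeline: it uses only the Python standard library, a plain gradient-descent fit, and synthetic data. The feature names (`utilization`, `years_in_business`, `noise`) and the ground-truth weights are hypothetical, chosen only to make the ranking visible. In a PySpark setting, the analogous values would presumably come from the fitted model's `coefficients` attribute.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, weights, bias):
    # Predicted probability of default for one feature vector.
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def fit_logistic(rows, labels, lr=0.1, epochs=300):
    # Batch gradient descent on the mean log loss.
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(x, weights, bias) - y
            for j, v in enumerate(x):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def rank_features(names, weights):
    # Sort features by absolute coefficient, largest first.
    return sorted(zip(names, weights), key=lambda p: abs(p[1]), reverse=True)


def accuracy(rows, labels, weights, bias):
    # Fraction of rows whose thresholded prediction matches the label.
    hits = sum(
        int((predict_proba(x, weights, bias) >= 0.5) == (y == 1))
        for x, y in zip(rows, labels)
    )
    return hits / len(rows)


def make_synthetic(n=1000, seed=0):
    # Hypothetical data: feature 0 raises default risk strongly,
    # feature 1 lowers it mildly, feature 2 is pure noise.
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(3)]
        p = sigmoid(3.0 * x[0] - 1.0 * x[1])
        rows.append(x)
        labels.append(1 if rng.random() < p else 0)
    return rows, labels


FEATURES = ["utilization", "years_in_business", "noise"]  # hypothetical names
rows, labels = make_synthetic()
weights, bias = fit_logistic(rows, labels)
ranking = rank_features(FEATURES, weights)
train_accuracy = accuracy(rows, labels, weights, bias)

for name, coef in ranking:
    print(f"{name}: {coef:+.3f}")
print(f"training accuracy: {train_accuracy:.3f}")
```

Comparing coefficient magnitudes this way is only meaningful when the features are on a common scale, which is why the synthetic features are all drawn from a standard normal distribution. The sign of each coefficient indicates whether the feature pushes the predicted default probability up or down.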
