Statistics and Analytical Sciences
The MIMIC III data comes from an intensive care unit in Boston and spans a period of 10 years. It contains billing codes as well as laboratory and demographic data. This project predicts the outcome "death within 30 days of discharge" through the lens of a healthcare billing company. It uses only the data such a company would have access to (billing and demographic data), to see whether healthcare billing companies can play a role in healthcare quality. The project used a unique method of nominal variable reduction specific to ICD-9 and CPT codes, and compared the performance of logistic regression and neural networks on predicting a balanced binary target variable (death within 30 days of discharge). The averaged cross-validated accuracies of all methods were around 71%, about 21 percentage points above the 50% expected by chance for a balanced target.
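The project's own reduction method is not described here, so the following is only a minimal sketch of a common baseline for the same problem: collapsing ICD-9-CM diagnosis codes to their category and one-hot encoding the result. It assumes codes are stored without a decimal point (as in MIMIC-III). Numeric and V-codes keep 3 characters and E-codes keep 4. The function names and the example admissions are hypothetical.

```python
def icd9_category(code: str) -> str:
    """Collapse a raw ICD-9-CM diagnosis code to its category.

    Assumption: E-codes (external causes) keep 4 characters;
    numeric and V-codes keep 3. Any decimal point is dropped first.
    """
    code = code.strip().upper().replace(".", "")
    if code.startswith("E"):
        return code[:4]
    return code[:3]


def reduce_codes(admissions):
    """Map each admission's raw codes to a sorted list of unique categories."""
    return [sorted({icd9_category(c) for c in codes}) for codes in admissions]


def one_hot(reduced):
    """Build a shared category vocabulary and a 0/1 indicator row per admission."""
    vocab = sorted({c for codes in reduced for c in codes})
    index = {c: i for i, c in enumerate(vocab)}
    rows = []
    for codes in reduced:
        row = [0] * len(vocab)
        for c in codes:
            row[index[c]] = 1
        rows.append(row)
    return vocab, rows


# Hypothetical admissions: six distinct raw codes collapse to five categories.
admissions = [["4280", "42731", "5849"], ["4281", "V4581"], ["E8809", "4280"]]
vocab, X = one_hot(reduce_codes(admissions))
print(vocab)  # ['427', '428', '584', 'E880', 'V45']
print(X)      # [[1, 1, 1, 0, 0], [0, 1, 0, 0, 1], [0, 1, 0, 1, 0]]
```

The resulting indicator matrix is the kind of input that either logistic regression or a neural network could consume. Coarser groupings trade clinical detail for fewer, denser columns.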