Date of Award
Fall 11-30-2022
Degree Type
Dissertation
Degree Name
Doctor of Philosophy in Analytics and Data Science
Department
Statistics and Analytical Sciences
Committee Chair/First Advisor
Ying Xie
Committee Member
Sherry Ni
Committee Member
Yifan Zhang
Committee Member
Sumit Chakravarty
Committee Member
Xinyue Zhang
Abstract
We have seen complex deep learning models outperform human benchmarks in many areas (e.g., computer vision, natural language processing). Clever architectures and higher model complexity are two of the major drivers of such outstanding performance. Higher model complexity, however, generally makes a model's decision-making process opaque to human perception. Yet understanding the decision-making process is important for many reasons, including enhancing trust in a model's predictions, improving model robustness, gaining actionable insight into why a model made a particular prediction, and discovering new knowledge about a problem. Model explainability has been an active area of research for some time, but the problem is still far from solved. An established approach to model explanation (also known as variable attribution) is to assign a score to each variable representing its importance in a particular prediction of a model. In many techniques, the scoring process involves distributing the output among the variables. This approach becomes challenging when the model is complex and contains a high degree of interaction terms. A coalition game theoretic concept called the Shapley Value provides a fair way to tackle this challenge. However, the computation time of exact Shapley Values grows exponentially in the number of variables. Hence, for relatively large problems it is common to use approximations rather than exact Shapley Values as attributions. There has been considerable progress in Shapley Value approximation techniques for variable attribution in recent years, but there is still substantial room for improvement, especially for complex models. In this manuscript, we propose a novel variable attribution technique called Appley (short for Approximate Shapley) that approximates the Shapley Values in linear time. We show that the Appley attributions are generally closer to the exact Shapley Values than those of several existing state-of-the-art attribution techniques.
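For reference, the exact Shapley Value mentioned above admits the standard coalition game theoretic form sketched below; the notation ($N$ for the set of variables, $v$ for the value function on coalitions, $\phi_i$ for the attribution of variable $i$) is the conventional one and is assumed here rather than taken from the manuscript itself.

% Standard Shapley Value of variable i for a value function v defined on
% coalitions S of the variable set N (conventional notation, assumed here):
\[
  \phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}}
    \frac{|S|!\,\bigl(|N| - |S| - 1\bigr)!}{|N|!}
    \Bigl( v\bigl(S \cup \{i\}\bigr) - v(S) \Bigr)
\]
% The sum ranges over all 2^{|N|-1} coalitions that exclude i, which is why
% exact computation is exponential in the number of variables and why
% approximation techniques are needed for larger problems.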