Date of Award

Fall 11-30-2022

Degree Type

Dissertation

Degree Name

Doctor of Philosophy in Analytics and Data Science

Department

Statistics and Analytical Sciences

Committee Chair/First Advisor

Ying Xie

Committee Member

Sherry Ni

Committee Member

Yifan Zhang

Committee Member

Sumit Chakravarty

Committee Member

Xinyue Zhang

Abstract

Complex deep learning models now outperform human benchmarks in many areas (e.g., computer vision, natural language processing). Clever architectures and higher model complexity are two of the major drivers of this performance. However, higher model complexity generally makes a model's decision-making process opaque to human perception. Understanding that process is important for many reasons, including enhancing trust in the model's predictions, improving model robustness, gaining actionable insight into why a model made a particular prediction, and discovering new knowledge about a problem. Model explainability has been an active area of research for some time, but the problem is still far from solved. An established form of model explanation (also known as variable attribution) assigns a score to each variable representing its importance in a particular prediction. In many techniques, the scoring process involves distributing the model's output among the variables. This becomes challenging when the model is complex and contains a high degree of interaction among variables. A coalitional game-theoretic approach, the Shapley Value, provides a fair way to tackle this challenge. However, the computation time of exact Shapley Values grows exponentially in the number of variables, so approximations are commonly used in place of exact Shapley Values for larger problems. Shapley Value approximation techniques for variable attribution have progressed considerably in recent years, but there is still substantial room for improvement, especially for complex models. In this manuscript, we propose a novel variable attribution technique called Appley (short for Approximate Shapley) that approximates Shapley Values in linear time. We show that the Appley attributions are generally closer to the exact Shapley Values than several existing state-of-the-art attribution techniques.
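For context, the exact Shapley Value referenced in the abstract is the standard coalition-weighted average of a variable's marginal contributions; with N the set of variables, v the value (prediction) function, and i the variable being scored, it is conventionally written as

$$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!} \bigl(v(S \cup \{i\}) - v(S)\bigr).$$

Because the sum ranges over all $2^{|N|-1}$ coalitions of the remaining variables, exact computation is exponential in the number of variables, which motivates the linear-time approximation proposed here.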

Available for download on Thursday, December 12, 2024

Included in

Data Science Commons
