Bringing Order to Chaos: Navigating the Disagreement Problem in Explainable ML

APA

Lakkaraju, H. (2022, November 9). Bringing Order to Chaos: Navigating the Disagreement Problem in Explainable ML. The Simons Institute for the Theory of Computing. https://old.simons.berkeley.edu/node/22930

MLA

Lakkaraju, Hima. "Bringing Order to Chaos: Navigating the Disagreement Problem in Explainable ML." The Simons Institute for the Theory of Computing, 9 Nov. 2022, https://old.simons.berkeley.edu/node/22930.

BibTeX

    @misc{scivideos_22930,
      url       = {https://old.simons.berkeley.edu/node/22930},
      author    = {Lakkaraju, Hima},
      language  = {en},
      title     = {Bringing Order to Chaos: Navigating the Disagreement Problem in Explainable ML},
      publisher = {The Simons Institute for the Theory of Computing},
      year      = {2022},
      month     = {nov},
      note      = {See \url{https://scivideos.org/simons-institute/22930}}
    }
          
Hima Lakkaraju (Harvard University)
Source Repository: Simons Institute

Abstract

As various post hoc explanation methods are increasingly leveraged to explain complex models in high-stakes settings, it becomes critical to develop a deeper understanding of if and when the explanations output by these methods disagree with each other, why these disagreements occur, and how to address them in a rigorous fashion. However, there is little to no research that answers these critical questions. In this talk, I will present some of our recent research addressing them. More specifically, I will discuss (i) a novel quantitative framework that formalizes the disagreement between state-of-the-art feature attribution based explanation methods (e.g., LIME, SHAP, gradient-based methods); I will also touch on how this framework was constructed by leveraging input from interviews and user studies with data scientists who use explanation methods in their day-to-day work; (ii) an online user study to understand how data scientists resolve disagreements in the explanations output by the aforementioned methods; (iii) a novel function approximation framework that explains why explanation methods often disagree with each other; I will demonstrate that all the key feature attribution based explanation methods are essentially performing local function approximations, albeit with different loss functions and notions of neighborhood; and (iv) a set of guiding principles on how to choose explanation methods and the resulting explanations when they disagree in real-world settings. I will conclude this talk with a brief overview of OpenXAI, an open source framework we recently developed that enables researchers and practitioners to seamlessly evaluate and benchmark both existing and new explanation methods based on characteristics such as faithfulness, stability, and fairness.
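
For a concrete sense of what such a disagreement framework quantifies, below is a minimal Python sketch of two metrics in the spirit of those discussed in the talk: top-k feature agreement and rank agreement between the attribution vectors produced by two explainers. This is an illustrative reimplementation under simplifying assumptions, not the released code from this work, and the attribution values are invented for the example.

    import numpy as np

    def top_k_feature_agreement(attr_a, attr_b, k=3):
        # Fraction of features that appear in the top-k (ranked by absolute
        # attribution magnitude) of both explanations.
        top_a = set(np.argsort(-np.abs(attr_a))[:k])
        top_b = set(np.argsort(-np.abs(attr_b))[:k])
        return len(top_a & top_b) / k

    def rank_agreement(attr_a, attr_b, k=3):
        # Fraction of top-k positions at which both explanations place
        # the same feature at the same rank.
        rank_a = np.argsort(-np.abs(attr_a))[:k]
        rank_b = np.argsort(-np.abs(attr_b))[:k]
        return float(np.mean(rank_a == rank_b))

    # Hypothetical attributions for a single input from two explainers
    # (e.g., LIME and SHAP); the numbers are made up for illustration.
    attr_lime = np.array([0.42, -0.10, 0.35, 0.02, -0.28, 0.05])
    attr_shap = np.array([0.40, 0.25, -0.05, 0.01, -0.30, 0.08])

    print("top-k feature agreement:", top_k_feature_agreement(attr_lime, attr_shap))
    print("rank agreement:", rank_agreement(attr_lime, attr_shap))

Low values on metrics of this kind are what the talk refers to as the disagreement problem; the accompanying user study examines how practitioners respond when they encounter such conflicting explanations.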