New Methods Seminar — Bank of England
Counterfactual Explanations explain how inputs into a model need to change for it to produce different outputs. Explanations that involve realistic and actionable changes can be used for the purpose of Algorithmic Recourse: they offer human stakeholders a principled approach to not only understand the model they are seeking to explain, but also react to it or adjust it.
The general setup lends itself naturally to Bank datasets that revolve around counterparty risk, for example. In this seminar I will introduce the topic and place it into the broader context of Explainable AI. Using my Julia package I will go through a worked example involving a publicly available credit data set. Finally, I will also briefly present some of our recent research that points to potential pitfalls of current state-of-the-art approaches and proposes mitigation strategies.
DISCLAIMER: Views presented in this presentation are my own.
From human to data-driven decision-making …
… where black boxes are a recipe for disaster.
Ground Truthing
We typically want to maximize the likelihood of observing \(\mathcal{D}_n\) under given parameters (Murphy 2022):
\[ \theta^* = \arg \max_{\theta} p(\mathcal{D}_n|\theta) \qquad(1)\]
Compute an MLE (or MAP) point estimate \(\hat\theta = \theta^*\) and use the plug-in approximation for prediction:
\[ p(y|x,\mathcal{D}_n) \approx p(y|x,\hat\theta) \qquad(2)\]
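To make Equations 1 and 2 concrete, here is a minimal, self-contained sketch in Julia: a toy logistic model is fitted by gradient ascent on the log-likelihood and the resulting point estimate is plugged in for prediction. The data, step size and model are made up purely for illustration and are not part of the package demo.

```julia
using Zygote

# Toy illustration of Equations 1 and 2: fit a logistic model by maximising
# the log-likelihood, then use the plug-in approximation for prediction.
σ(z) = 1 / (1 + exp(-z))
X = [0.2 1.1; -0.4 0.8; 1.5 -0.3]     # made-up n × d design matrix
y = [1, 0, 1]                          # made-up binary labels

loglik(θ) = sum(y .* log.(σ.(X * θ)) .+ (1 .- y) .* log.(1 .- σ.(X * θ)))

function mle(loglik, d; η=0.1, steps=1_000)
    θ = zeros(d)
    for _ in 1:steps
        θ += η * Zygote.gradient(loglik, θ)[1]   # gradient ascent step
    end
    return θ
end

θ̂ = mle(loglik, size(X, 2))
plugin_predict(x) = σ(x'θ̂)             # p(y|x, D) ≈ p(y|x, θ̂)  (Equation 2)
```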
Probabilistic Models
[…] deep neural networks are typically very underspecified by the available data, and […] parameters [therefore] correspond to a diverse variety of compelling explanations for the data. (Wilson 2020)
In this setting it is often crucial to treat models probabilistically!
\[ p(y|x,\mathcal{D}_n) = \int p(y|x,\theta)p(\theta|\mathcal{D}_n)d\theta \qquad(3)\]
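As a toy contrast to the plug-in approach, the integral in Equation 3 can be approximated by Monte Carlo: average the predictions over (approximate) posterior draws of \(\theta\), for example the members of a deep ensemble. The draws below are made up for illustration only.

```julia
# Toy Monte Carlo approximation of Equation 3: average plug-in predictions
# over (approximate) posterior draws of θ, e.g. members of a deep ensemble.
σ(z) = 1 / (1 + exp(-z))
θ_draws = [([1.0, -2.0], 0.5), ([0.8, -1.7], 0.3), ([1.2, -2.2], 0.6)]  # made-up (w, b) draws

posterior_predict(x) = sum(σ(w'x + b) for (w, b) in θ_draws) / length(θ_draws)

posterior_predict([0.3, 1.2])
```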
Counterfactual Reasoning
We can now make predictions – great! But do we know how the predictions are actually being made?
With the model trained for its task, we are interested in understanding how its predictions change in response to input changes.
\[ \nabla_x p(y|x,\mathcal{D}_n;\hat\theta) \qquad(4)\]
Important to realize that we are keeping \(\hat\theta\) constant!
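A minimal sketch of Equation 4 using Zygote.jl: differentiate the predicted probability with respect to the input while the parameters stay fixed. The toy model and numbers are made up and only serve to illustrate the idea.

```julia
using Zygote

# Toy classifier with fixed (trained) parameters θ̂ = (w, b):
w, b = [1.0, -2.0], 0.5
σ(z) = 1 / (1 + exp(-z))
predict(x) = σ(w'x + b)                 # p(y = 1 | x, θ̂)

# Equation 4: gradient of the prediction w.r.t. the input x, with θ̂ held fixed.
x = [0.3, 1.2]
∇x = Zygote.gradient(predict, x)[1]
```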
CounterfactualExplanations.jl in Julia (\(\approx\) 15min)
Even though […] interpretability is of great importance and should be pursued, explanations can, in principle, be offered without opening the “black box”. (Wachter, Mittelstadt, and Russell 2017)
The objective originally proposed by Wachter, Mittelstadt, and Russell (2017) is as follows:
\[ \min_{x^\prime \in \mathcal{X}} h(x^\prime) \ \ \ \mbox{s.t.} \ \ \ M(x^\prime) = t \qquad(5)\]
where \(h\) relates to the complexity of the counterfactual and \(M\) denotes the classifier.
Typically this is approximated through regularization:
\[ x^\prime = \arg \min_{x^\prime} \ell(M(x^\prime),t) + \lambda h(x^\prime) \qquad(6)\]
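To make Equation 6 concrete, here is a small, self-contained gradient-descent sketch against a toy differentiable classifier, with squared distance to the factual as the complexity term \(h\). It illustrates the objective only; it is not the implementation used in the package.

```julia
using Zygote

σ(z) = 1 / (1 + exp(-z))
M(x) = σ([1.0, -2.0]'x + 0.5)                       # toy classifier (parameters fixed)
ℓ(p, t) = -(t * log(p) + (1 - t) * log(1 - p))      # cross-entropy w.r.t. target t
h(x′, x) = sum(abs2, x′ .- x)                       # complexity: distance to factual

# Gradient descent on Equation 6: ℓ(M(x′), t) + λ h(x′)
function counterfactual(x, t; λ=0.1, η=0.1, steps=500)
    x′ = copy(x)
    for _ in 1:steps
        g = Zygote.gradient(z -> ℓ(M(z), t) + λ * h(z, x), x′)[1]
        x′ -= η * g
    end
    return x′
end

x′ = counterfactual([0.3, 1.2], 1.0)    # factual x, target class t = 1
```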
So are these just adversarial examples? Yes and no!
While the two are methodologically very similar, adversarial examples are meant to go undetected while CEs ought to be meaningful.
Do Counterfactual Explanations establish causality? NO!
Causal inference: counterfactuals are thought of as unobserved states of the world that we would like to observe in order to establish causality.
Counterfactual Explanations: involve perturbing features after a model has been trained.
The number of ostensibly pro data scientists confusing themselves into believing that "counterfactual explanations" capture real-world causality is just staggering🤦♀️. Where do we go from here? How can a community that doesn't even understand what's already known make advances?
— Zachary Lipton (@zacharylipton) June 20, 2022
A (highly) simplified and incomplete overview …
“Explanatory models by definition do not produce 100% reliable explanations, because they are approximations. This means explanations can’t be fully trusted, and so neither can the original model.” – causaLens, 2021
“You cannot appeal to (algorithms). They do not listen. Nor do they bend.”
— Cathy O’Neil in Weapons of Math Destruction, 2016
Counterfactual Explanations that involve actionable and realistic feature perturbations can be used for the purpose of Algorithmic Recourse.
CounterfactualExplanations.jl in Julia 🛠️
CounterfactualExplanations.jl 📦
Julia has an edge with respect to Trustworthy AI: it’s open-source, uniquely transparent and interoperable 🔴🟢🟣
We begin by instantiating the fitted model …
… then based on its prediction for \(x\) we choose the opposite label as our target …
… and finally generate the counterfactual.
… et voilà!
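For readers following along, here is a rough sketch of what these steps look like with CounterfactualExplanations.jl. The names (CounterfactualData, fit_model, GenericGenerator, generate_counterfactual) follow the package documentation as I recall it, but signatures change between versions, so treat this as an outline rather than a verbatim excerpt from the live demo; the data here is made up.

```julia
using CounterfactualExplanations

# Sketch only: constructor and function names follow the package docs and may
# differ across versions; the data is made up.
X = randn(2, 100)                                   # 2 features × 100 samples (made up)
y = Int.(X[1, :] .> 0)                              # made-up binary labels

counterfactual_data = CounterfactualData(X, y)      # wrap the data
M = fit_model(counterfactual_data, :Linear)         # instantiate the fitted model

x = X[:, 1]                                         # a factual instance
target = 1 - y[1]                                   # the opposite label as target

generator = GenericGenerator()
ce = generate_counterfactual(x, target, counterfactual_data, M, generator)
```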
When people say that counterfactuals should look realistic or plausible, they really mean that counterfactuals should be generated by the same Data Generating Process (DGP) as the factuals:
\[ x^\prime \sim p(x) \]
But how do we estimate \(p(x)\)? Two probabilistic approaches …
Schut et al. (2021) note that by maximizing predictive probabilities \(\sigma(M(x^\prime))\) for probabilistic models \(M\in\mathcal{\widetilde{M}}\) one implicitly minimizes epistemic and aleatoric uncertainty.
\[ x^\prime = \arg \min_{x^\prime} \ell(M(x^\prime),t) \ \ \ , \ \ \ M\in\mathcal{\widetilde{M}} \qquad(7)\]
Instead of perturbing samples directly, some have proposed to instead traverse a lower-dimensional latent embedding learned through a generative model (Joshi et al. 2019).
\[ z^\prime = \arg \min_{z^\prime} \ell(M(dec(z^\prime)),t) + \lambda h(x^\prime) \qquad(8)\]
and
\[x^\prime = dec(z^\prime)\]
where \(dec(\cdot)\) is the decoder function.
This time we use a Bayesian classifier …
… and once again choose our target label as before …
… to then finally use greedy search to find a counterfactual.
In this case the Bayesian approach yields a similar outcome.
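Continuing the sketch above, the Bayesian variant swaps in a probabilistic classifier and the greedy generator of Schut et al. (2021). The :DeepEnsemble symbol and the GreedyGenerator name are taken from the package documentation as I remember it and should be treated as assumptions.

```julia
# Sketch (assumed API): a probabilistic classifier, e.g. a deep ensemble,
# combined with the greedy generator of Schut et al. (2021).
M_bayes = fit_model(counterfactual_data, :DeepEnsemble)
generator = GreedyGenerator()
ce = generate_counterfactual(x, target, counterfactual_data, M_bayes, generator)
```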
Using the same classifier as before, we can either use the dedicated REVISEGenerator …
… or realize that REVISE (Joshi et al. 2019) just boils down to generic search in a latent space:
We have essentially combined latent search with a probabilistic classifier (as in Antorán et al. (2020)).
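Continuing the same sketch, the dedicated generator looks as follows; to the best of my knowledge the package learns the latent embedding (a VAE) under the hood, so from the user's perspective only the generator changes.

```julia
# Sketch (assumed API): REVISE searches a latent embedding of the data
# (typically learned by a VAE) instead of the feature space directly.
generator = REVISEGenerator()
ce = generate_counterfactual(x, target, counterfactual_data, M_bayes, generator)
```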
We can use the DiCEGenerator to produce multiple diverse counterfactuals:
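Again continuing the sketch, and using the original classifier M; the num_counterfactuals keyword is an assumption based on the package documentation.

```julia
# Sketch (assumed API): request several diverse counterfactuals at once.
generator = DiCEGenerator()
ces = generate_counterfactual(
    x, target, counterfactual_data, M, generator;
    num_counterfactuals=5,
)
```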
The task is to improve on the state of the art in credit scoring by predicting the probability that somebody will experience financial distress in the next two years.
Using DiCE to generate counterfactuals for a single individual, ignoring actionability:
Using the generic generator to generate counterfactuals for multiple individuals, respecting that age cannot be decreased (you might argue that age also cannot be easily increased …):
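A sketch of how such actionability constraints can be expressed when wrapping the data: the mutability keyword and its values (:increase, :both, …) are assumptions based on the package documentation, and the feature layout of the credit dataset is simplified for illustration.

```julia
# Sketch (assumed API): declare per-feature mutability so that, e.g., age can
# only increase; the remaining features stay freely mutable.
counterfactual_data = CounterfactualData(
    X, y;
    mutability=[:increase, :both, :both, :both],   # first feature = age (illustrative)
)
ce = generate_counterfactual(x, target, counterfactual_data, M, GenericGenerator())
```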
TL;DR: We find that standard implementations of various state-of-the-art (SOTA) approaches to Algorithmic Recourse (AR) can induce substantial domain and model shifts. We argue that these dynamics indicate that individual recourse generates hidden external costs, and we provide mitigation strategies.
In this work we investigate what happens if Algorithmic Recourse is actually implemented by a large number of individuals.
Figure 7 illustrates what we mean by Endogenous Macrodynamics in Algorithmic Recourse:
We argue that these shifts should be considered as an expected external cost of individual recourse and call for a paradigm shift from individual to collective recourse in these types of situations.
We restate Equation 6 to encapsulate latent space search:
\[ \begin{aligned} \mathbf{s}^\prime &= \arg \min_{\mathbf{s}^\prime \in \mathcal{S}} \left\{ {\text{yloss}(M(f(\mathbf{s}^\prime)),y^*)}+ \lambda {\text{cost}(f(\mathbf{s}^\prime)) } \right\} \end{aligned} \qquad(9)\]
We borrow the notion of negative externalities from Economics to formalise the idea that individual recourse fails to account for external costs:
\[ \begin{aligned} \mathbf{s}^\prime &= \arg \min_{\mathbf{s}^\prime \in \mathcal{S}} \{ {\text{yloss}(M(f(\mathbf{s}^\prime)),y^*)} \\ &+ \lambda_1 {\text{cost}(f(\mathbf{s}^\prime))} + \lambda_2 {\text{extcost}(f(\mathbf{s}^\prime))} \} \end{aligned} \qquad(10)\]
\[ \begin{aligned} \text{extcost}(f(\mathbf{s}^\prime)) = l(M(f(\mathbf{s}^\prime)),y^\prime) \end{aligned} \qquad(11)\]
or
\[ \begin{aligned} \text{extcost}(f(\mathbf{s}^\prime)) = \text{dist}(f(\mathbf{s}^\prime),\bar{x}) \end{aligned} \qquad(12)\]
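As a toy illustration of how the extended objective might be evaluated, the sketch below implements Equation 10 with the external cost of Equation 12 (distance of the counterfactual to the pre-recourse sample mean \(\bar{x}\)), taking \(f\) to be the identity for simplicity. Everything here is illustrative rather than the implementation used in the paper.

```julia
using LinearAlgebra

# Toy version of Equation 10 with the external cost of Equation 12.
# f is taken to be the identity, so s′ lives directly in feature space.
σ(z) = 1 / (1 + exp(-z))
M(x) = σ([1.0, -2.0]'x + 0.5)                        # toy classifier
yloss(p, t) = -(t * log(p) + (1 - t) * log(1 - p))   # loss w.r.t. target y*
cost(x′, x) = norm(x′ .- x)                          # private cost of recourse

# extcost (Eq. 12): how far the counterfactual pulls away from the
# pre-recourse sample mean x̄ (a proxy for domain shift).
extcost(x′, x̄) = norm(x′ .- x̄)

objective(x′, x, x̄, t; λ₁=0.1, λ₂=0.1) =
    yloss(M(x′), t) + λ₁ * cost(x′, x) + λ₂ * extcost(x′, x̄)

objective([0.5, 0.9], [0.3, 1.2], [0.0, 0.0], 1.0)   # made-up numbers
```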
LaplaceRedux.jl (formerly BayesLaplace.jl) is a small package that can be used for effortless Bayesian Deep Learning and Logistic Regression through Laplace Approximation. It is inspired by this Python library and its companion paper.
ConformalPrediction.jl is a package for Uncertainty Quantification (UQ) through Conformal Prediction (CP) in Julia. It is designed to work with supervised models trained in MLJ (Blaom et al. 2020). Conformal Prediction is distribution-free, easy-to-understand, easy-to-use and model-agnostic.
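For the curious, here is a minimal sketch of the intended workflow: wrap an MLJ-compatible model in a conformal model, fit it as an MLJ machine and obtain set-valued predictions. The conformal_model function follows the package README as I recall it; exact names, keywords and the model loaded here may differ, and the glue package for the chosen model needs to be installed separately.

```julia
using MLJ, ConformalPrediction

# Sketch (assumed API): conformalise an MLJ classifier and get prediction sets.
X, y = make_blobs(100, 2)                      # toy data from MLJ
Model = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0   # needs MLJDecisionTreeInterface
conf_model = conformal_model(Model())          # wrap the atomic model
mach = machine(conf_model, X, y)
fit!(mach)
predict(mach, X)                               # set-valued (conformal) predictions
```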
Read on …
… or get involved! 🤗
CounterfactualExplanations.jl
ConformalPrediction.jl
Explaining Machine Learning Models through Counterfactuals — Patrick Altmeyer — CC BY-NC