Faithful Model Explanations

Supervision in the Age of AI

Delft University of Technology

Venkatesh Chandrasekar

April 25, 2025

Quick Introduction

  • 3rd-year PhD candidate in Trustworthy Artificial Intelligence at Delft University of Technology.
  • Part of the AI for FinTech Research Lab, a 5-year collaboration between TU Delft and ING.
  • Previously: educational background in Economics and Finance, followed by two years working on Monetary Policy at the Bank of England.
  • Research: Trustworthy AI for real-world problems, particularly finance.
  • Blogger, Julia developer and founder of Taija.

Research

Trustworthy AI
Finance and Economics

Delft FinTech Lab

In A Nutshell

“[…] unified front of our expertise in this area formed to help the financial industry solve the growingly complex challenges.”

Delft FinTech Lab

Background

  • Finance at the forefront of digitalization.
  • Over 50 TU Delft researchers in FinTech.
  • With dozens of societal partners.

Objective

  • Strengthen societal and industrial impact.
  • Increase collaboration and visibility.

Core Pillars

  1. Trustworthy Financial Systems led by Stefan Buijsman.
  2. Quantitative Modelling led by Antonis Papapantoleon.
  3. Financial Data Intelligence led by Asterios Katsifodimos.
  4. Blockchain led by Jérémie Decouchant.

What about topics specific to supervision?

Natural language processing

  1. Mine large volumes of text.
  2. Retrieve relevant information from large collections of text.
  3. Long-document understanding; tracking, identifying and predicting anomalous patterns.
  4. Explainable AI for NLP (Arous et al. 2021).
  5. Predictive Uncertainty Quantification for LLMs (see blog post).

Privacy Enhancing Technologies (PET)

  1. Synthetic data generation using machine learning, algorithmic, probabilistic and statistical techniques (Porsius Martins 2023; Werf 2021).
  2. Statistics/econometrics under privacy constraints, e.g. differential privacy.
  3. Multi-party computation and homomorphic encryption.

Human-AI Collaboration

Augmenting human experts with AI.

  1. Hybrid human-AI workflows: amplify human intelligence, reduce human costs, and improve precaution.
  2. Collaborative human-AI knowledge synthesis.
  3. Explainable AI (Anand et al. 2022; Leonhardt, Rudra, and Anand 2023), and human-in-the-loop continual learning (Yang et al. 2018).

Automated Compliance

  1. Ensure decisions made by banks’ models are compliant by being explainable and robust:
     1. Ethics assessment, AI governance.
     2. Predictive uncertainty: quantify how robust/uncertain predictions are (conformal prediction) in a model-agnostic manner.

Background

Counterfactual Explanations

Born out of the need for explanations

Counterfactual Explanations (CE) explain how inputs into a model need to change for it to produce different outputs (Wachter, Mittelstadt, and Russell 2017).

Provided the changes are realistic and actionable, they can be used for Algorithmic Recourse (AR) to help individuals who face adverse outcomes.

Example: Consumer Credit

From ‘loan denied’ to ‘loan supplied’: CounterfactualExplanations.jl 📦.

Figure 1: Gradient-based counterfactual search.
Figure 2: Counterfactuals for Give Me Some Credit dataset (Kaggle 2011).
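For illustration, a minimal sketch of how such a counterfactual might be generated with CounterfactualExplanations.jl. It assumes a feature matrix `X` and labels `y` (e.g. from the Give Me Some Credit data) are already loaded; function names follow the package’s documented interface at the time of writing and may differ across versions.

```julia
using CounterfactualExplanations
using CounterfactualExplanations.Models

# Wrap pre-loaded features X and labels y (e.g. Give Me Some Credit) for the package:
counterfactual_data = CounterfactualData(X, y)

# Fit a simple classifier supported by the package:
M = fit_model(counterfactual_data, :Linear)

# Pick a factual instance currently predicted as 'loan denied' (the 0/1 label coding is an assumption):
factual, target = 0, 1
chosen = rand(findall(predict_label(M, counterfactual_data) .== factual))
x = select_factual(counterfactual_data, chosen)

# Gradient-based counterfactual search in the spirit of Wachter, Mittelstadt, and Russell (2017):
generator = GenericGenerator()
ce = generate_counterfactual(x, target, counterfactual_data, M, generator)
```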

Example: Insurance Premium1

  • Input \(\mathbf{X}\): A dataset of individuals containing demographic and financial information.
  • Additional Input \(\mathbf{Z}\): Individuals can opt-in to provide their personal Apple Health data to improve their chance of receiving a lower premium.
  • Binary output \(\mathbf{Y}\): based on the data, the individual is either eligible (\(y=1\)) or not eligible (\(y=0\)) for a lower premium.
  • To model \(p(y=1|X)\) the insurance provider can rely on an interpretable linear classifier.
  • To model \(p(y=1|X,Z)\) the insurance provider turns to a more accurate but less interpretable black-box model.
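Schematically (the specific functional forms below are illustrative, not taken from the slides):

\[
p(y=1\,|\,\mathbf{X}) = \sigma\left(\mathbf{w}^\top \mathbf{X} + b\right), \qquad
p(y=1\,|\,\mathbf{X},\mathbf{Z}) = M_{\theta}(\mathbf{X},\mathbf{Z}),
\]

where \(\sigma\) is the logistic function and \(M_{\theta}\) is, for example, a deep neural network.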

Example: Insurance Premium

In the EU, individuals have the right “[…] to obtain an explanation of the decision reached after such assessment and to challenge the decision.” (Recital 71 of the General Data Protection Regulation (GDPR))

In our example, who do you think is most likely to ask for an explanation?

A tiny little bit of maths …
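Equation 1 can be reconstructed from the gradient expression on the next slide; in that notation, the counterfactual search solves, roughly,

\[
\mathbf{Z}^\prime = \arg\min_{\mathbf{Z}^\prime} \left\{ \text{yloss}\left(M_{\theta}(f(\mathbf{Z}^\prime)), \mathbf{y}^+\right) + \lambda \, \text{cost}\left(f(\mathbf{Z}^\prime)\right) \right\} \tag{1}
\]

where \(f\) maps the search state \(\mathbf{Z}^\prime\) back to feature space and \(\lambda\) trades off the prediction loss against the cost of the proposed changes.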

But wait a second …

Equation 1 looks a lot like an adversarial attack (Goodfellow, Shlens, and Szegedy 2014), doesn’t it?

Figure 3: Adversarial attack on an Image Classifier.

In both settings, we take gradients with respect to features \(\nabla_{\mathbf{Z}^\prime}\text{yloss}(M_{\theta}(f(\mathbf{Z}^\prime)),\mathbf{y}^+)\) in order to trigger changes in the model’s output.
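To make the connection concrete, here is a minimal from-scratch sketch of this gradient step in Julia (plain Flux, not the package internals; the model, data and hyperparameters are placeholders):

```julia
using Flux

# Placeholder classifier M_θ and target class (both assumptions made for illustration only):
M = Chain(Dense(2 => 16, relu), Dense(16 => 2))
y_target = Flux.onehot(2, 1:2)     # the desired class y⁺
λ, η = 0.1f0, 0.05f0               # cost penalty and step size
x = Float32[1.0, -0.5]             # factual instance
z′ = copy(x)                       # search state; here f is the identity, i.e. we search in feature space

for _ in 1:100
    # yloss pulls the prediction towards y⁺; the cost term keeps z′ close to the factual x.
    g, = Flux.gradient(z′) do z
        Flux.logitcrossentropy(M(z), y_target) + λ * sum(abs2, z .- x)
    end
    z′ .-= η .* g                  # gradient step, mirroring ∇_{Z′} yloss(M_θ(f(Z′)), y⁺)
end
```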

Gradient Descent Visualized

Figure 4: Gradient-based counterfactual search.

Our Research

Open Questions

  1. What makes a counterfactual plausible?
  2. Why do we need plausibility?
  3. Is plausibility all we need?
  4. What makes models more explainable?

Plausibility

There is no consensus on the exact definition of plausibility, but we think about it as follows:

Definition 1 (Plausible Counterfactuals) Let \(\mathcal{X}|\mathbf{y}^+= p(\mathbf{x}|\mathbf{y}^+)\) denote the true conditional distribution of samples in the target class \(\mathbf{y}^+\). Then for \(\mathbf{x}^{\prime}\) to be considered a plausible counterfactual, we need: \(\mathbf{x}^{\prime} \sim \mathcal{X}|\mathbf{y}^+\).
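Since the true conditional distribution is unknown in practice, plausibility has to be approximated empirically. One simple proxy (an illustrative choice, not a canonical metric) is the average distance of \(\mathbf{x}^{\prime}\) to its \(K\) nearest neighbours among observed samples of the target class:

\[
\text{impl}(\mathbf{x}^{\prime}) = \frac{1}{K} \sum_{\mathbf{x} \in \text{kNN}_K\left(\mathbf{x}^{\prime};\, \mathcal{X}|\mathbf{y}^+\right)} \left\lVert \mathbf{x}^{\prime} - \mathbf{x} \right\rVert_2
\]

Lower values indicate counterfactuals that sit closer to the observed data in the target class.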

Counter Example

  • The counterfactual in Figure 5 is valid: it has crossed the decision boundary.
  • But is it consistent with the data in the target class (blue)?
Figure 5: A valid but implausible counterfactual. Source: Altmeyer, Deursen, et al. (2023)

Why Plausibility?

  • Actionability: If a counterfactual is implausible, it is unlikely to be actionable.
  • Fairness: If a counterfactual is implausible, it is unlikely to be fair.
  • Robustness: If a counterfactual is implausible, it is unlikely to be robust.

But: higher plausibility seems to require larger changes, which increases costs to individuals.

Recourse Dynamics

Moving just across the decision boundary may minimize costs to individuals but it may also generate external costs for other stakeholders (Altmeyer, Angela, et al. 2023).

A Balancing Act

  • Minimizing private costs generates external costs for other stakeholders.
  • To avoid this, counterfactuals need to be plausible, i.e. comply with the data-generating process.
  • In practice, costs to various stakeholders need to be carefully balanced.

Is plausibility really all we need?

Pick your Poison?

All of these counterfactuals are valid explanations for the model’s prediction. Which one would you pick?

Figure 6: Turning a 9 into a 7: Counterfactual Explanations for an Image Classifier.

What do Models Learn?

These images are sampled from the posterior distribution learned by the model. Looks different, no?

Figure 7: Conditionally Generated Images from the Image Classifier.

Faithful Counterfactuals

We propose a way to generate counterfactuals that are as plausible as the underlying model permits (under review).

Definition 2 (Faithful Counterfactuals) Let \(\mathcal{X}_{\theta}|\mathbf{y}^+ = p_{\theta}(\mathbf{x}|\mathbf{y}^+)\) denote the conditional distribution of \(\mathbf{x}\) in the target class \(\mathbf{y}^+\), where \(\theta\) denotes the parameters of model \(M_{\theta}\). Then for \(\mathbf{x}^{\prime}\) to be considered a faithful counterfactual, we need: \(\mathbf{x}^{\prime} \sim \mathcal{X}_{\theta}|\mathbf{y}^+\).
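Probing \(\mathcal{X}_{\theta}|\mathbf{y}^+\) means sampling from the model itself. One standard way to do this for classifiers, following Grathwohl et al. (2020), is Stochastic Gradient Langevin Dynamics on the energy implied by the model’s logits; a generic update (shown for intuition, not as the exact procedure used in the work under review) is

\[
\hat{\mathbf{x}}_{j+1} \leftarrow \hat{\mathbf{x}}_j - \frac{\epsilon_j^2}{2} \nabla_{\hat{\mathbf{x}}_j} \mathcal{E}_{\theta}\left(\hat{\mathbf{x}}_j|\mathbf{y}^+\right) + \epsilon_j \mathbf{r}_j, \qquad \mathbf{r}_j \sim \mathcal{N}(\mathbf{0},\mathbf{I}),
\]

where \(\mathcal{E}_{\theta}(\mathbf{x}|\mathbf{y}^+) = -M_{\theta}(\mathbf{x})[\mathbf{y}^+]\) is the energy implied by the logit for the target class.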

Figure 8: Gradient fields and counterfactual paths for different generators.

Improving Models

Now that we have a tool to faithfully explain models, we may ask: how do models learn plausible explanations? Initial evidence:

  1. Incorporating predictive uncertainty (e.g. ensembling; see the sketch after this list).
  2. Addressing robustness (e.g. adversarial training in Schut et al. (2021)).
  3. Better model architectures.
  4. Hybrid modelling (i.e. combining generative and discriminative models).
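A minimal deep-ensemble example of the ensembling mentioned in item 1, written in Flux (architecture, data and training budget are placeholders): averaging the member predictions yields a smoother predictive distribution than any single network.

```julia
using Flux
using Statistics: mean

# Toy data (placeholder): 2-D features, binary labels one-hot encoded.
X = randn(Float32, 2, 200)
y = Flux.onehotbatch((X[1, :] .+ X[2, :] .> 0) .+ 1, 1:2)

# Train K independent networks; random initialisation is the only source of diversity here.
ensemble = map(1:5) do _
    m = Chain(Dense(2 => 16, relu), Dense(16 => 2))
    opt = Flux.setup(Adam(0.01), m)
    for _ in 1:200
        grads = Flux.gradient(m) do model
            Flux.logitcrossentropy(model(X), y)
        end
        Flux.update!(opt, m, grads[1])
    end
    m
end

# Predictive distribution = average of the members' softmax outputs.
predict_ensemble(x) = mean(softmax(m(x)) for m in ensemble)
```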

Example: Architecture

Figure 9: Counterfactuals for LeNet-5 convolutional neural network (LeCun et al. 1998).

Example: JEM Ensemble

Figure 10: Counterfactuals for an ensemble of Joint Energy Models (JEM) (Grathwohl et al. 2020).

Open-Source work in Julia

🐶 Taija

Research informs development, development informs research.

Trustworthy Artificial Intelligence in Julia.


Taija is a collection of open-source packages for Trustworthy AI in Julia. Our goal is to help researchers and practitioners assess the trustworthiness of predictive models.

Our work has been presented online at JuliaCon 2022, in person at MIT in Boston at JuliaCon 2023, and hopefully beyond.

Counterfactual Explanations

All the work presented today is powered by CounterfactualExplanations.jl 📦.

There is also a corresponding paper, Explaining Black-Box Models through Counterfactuals, which has been published in JuliaCon Proceedings.

If you decide to use this package in your work, please consider citing the paper.


Conformal Prediction

Conformal Prediction is a model-agnostic, distribution-free approach to Predictive Uncertainty Quantification: ConformalPrediction.jl 📦.

Figure 11: Conformal Prediction intervals for regression.
Figure 12: Conformal Prediction sets for an Image Classifier.
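A minimal usage sketch following the package’s MLJ-style workflow (the specific model, dataset and keyword arguments are illustrative and may differ across versions):

```julia
using MLJ
using ConformalPrediction

# Any MLJ-compatible regressor can be wrapped; a decision tree serves as a stand-in model here
# (requires MLJDecisionTreeInterface to be installed).
Tree = @load DecisionTreeRegressor pkg=DecisionTree
X, y = make_regression(500, 2)          # toy regression data from MLJ

# Wrap the model, fit on data, and predict calibrated intervals:
conf_model = conformal_model(Tree(); coverage = 0.95)
mach = machine(conf_model, X, y)
fit!(mach)

# Predictions are intervals that should cover the true outcome roughly 95% of the time.
predict(mach, X)
```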

Laplace Redux

Effortless Bayesian Deep Learning through Laplace Approximation (Daxberger et al. 2021): LaplaceRedux.jl 📦.

Figure 13: Predictive interval for neural network with Laplace Approximation.
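A minimal usage sketch (assuming a Flux network trained beforehand; function names follow the package’s documented workflow at the time of writing and may change):

```julia
using Flux
using LaplaceRedux

# A small Flux network is assumed to have been trained already (training loop omitted for brevity).
nn = Chain(Dense(2 => 16, tanh), Dense(16 => 1))
X = randn(Float32, 2, 100)
y = vec(sum(X; dims = 1)) .+ 0.1f0 .* randn(Float32, 100)
data = zip(eachcol(X), y)               # iterable of (x, y) pairs

# Fit a Laplace approximation around the trained weights and tune the prior precision:
la = Laplace(nn; likelihood = :regression)
fit!(la, data)
optimize_prior!(la)

# Posterior predictive mean and variance for new inputs:
predict(la, X)
```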

Joint Energy Models

Joint Energy Models (JEMs) are hybrid models trained to learn both the conditional output distribution and the input distribution (Grathwohl et al. 2020): JointEnergyModels.jl 📦.
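In brief: the classifier’s logits \(f_{\theta}(\mathbf{x})[y]\) are reinterpreted as unnormalized joint log-densities, so a single network defines \(p_{\theta}(y|\mathbf{x})\) and \(p_{\theta}(\mathbf{x})\) at once (Grathwohl et al. 2020):

\[
p_{\theta}(\mathbf{x}, y) = \frac{\exp\left(f_{\theta}(\mathbf{x})[y]\right)}{Z(\theta)}, \qquad
p_{\theta}(\mathbf{x}) = \frac{\sum_{y}\exp\left(f_{\theta}(\mathbf{x})[y]\right)}{Z(\theta)}, \qquad
p_{\theta}(y|\mathbf{x}) = \frac{\exp\left(f_{\theta}(\mathbf{x})[y]\right)}{\sum_{y^{\prime}}\exp\left(f_{\theta}(\mathbf{x})[y^{\prime}]\right)}
\]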

Figure 14: A JEM trained on Circles data.

Questions?

Includes joint work with Cynthia C. S. Liem, Arie van Deursen, Mojtaba Farmanbar, Aleksander Buszydlik, Karol Dobiczek, Giovan Angela and many other students at TU Delft.

Slides powered by Quarto.

References

Agustí, Marc, Patrick Altmeyer, and Ignacio Vidal-Quadras. 2021. “Deep Vector Autoregression for Macroeconomic Data.”
Altmeyer, Patrick, Giovan Angela, Aleksander Buszydlik, Karol Dobiczek, Arie van Deursen, and Cynthia CS Liem. 2023. “Endogenous Macrodynamics in Algorithmic Recourse.” In 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 418–31. IEEE.
Altmeyer, Patrick, Lena Boneva, Rafael Kinston, Shreyosi Saha, and Evarist Stoja. 2023. “Yield Curve Sensitivity to Investor Positioning Around Economic Shocks.”
Altmeyer, Patrick, Arie van Deursen, et al. 2023. “Explaining Black-Box Models Through Counterfactuals.” In Proceedings of the JuliaCon Conferences, 1:130. 1.
Altmeyer, Patrick, Jacob Daniel Grapendal, Makar Pravosud, and Gand Derry Quintana. 2018. “Option Pricing in the Heston Stochastic Volatility Model: An Empirical Evaluation.”
Anand, Avishek, Lijun Lyu, Maximilian Idahl, Yumeng Wang, Jonas Wallat, and Zijian Zhang. 2022. “Explainable Information Retrieval: A Survey.” arXiv Preprint arXiv:2211.02405.
Arous, Ines, Ljiljana Dolamic, Jie Yang, Akansha Bhardwaj, Giuseppe Cuccu, and Philippe Cudré-Mauroux. 2021. “Marta: Leveraging Human Rationales for Explainable Text Classification.” In Proceedings of the AAAI Conference on Artificial Intelligence, 35:5868–76. 7.
Daxberger, Erik, Agustinus Kristiadi, Alexander Immer, Runa Eschenhagen, Matthias Bauer, and Philipp Hennig. 2021. “Laplace Redux-Effortless Bayesian Deep Learning.” Advances in Neural Information Processing Systems 34.
Goodfellow, Ian J, Jonathon Shlens, and Christian Szegedy. 2014. “Explaining and Harnessing Adversarial Examples.” https://arxiv.org/abs/1412.6572.
Grathwohl, Will, Kuan-Chieh Wang, Joern-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, and Kevin Swersky. 2020. “Your Classifier Is Secretly an Energy Based Model and You Should Treat It Like One.” In International Conference on Learning Representations.
Immer, Alexander, Maciej Korzepa, and Matthias Bauer. 2020. “Improving Predictions of Bayesian Neural Networks via Local Linearization.” https://arxiv.org/abs/2008.08400.
Kaggle. 2011. “Give Me Some Credit, Improve on the State of the Art in Credit Scoring by Predicting the Probability That Somebody Will Experience Financial Distress in the Next Two Years.” https://www.kaggle.com/c/GiveMeSomeCredit; Kaggle. https://www.kaggle.com/c/GiveMeSomeCredit.
LeCun, Yann, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. “Gradient-Based Learning Applied to Document Recognition.” Proceedings of the IEEE 86 (11): 2278–2324.
Leonhardt, Jurek, Koustav Rudra, and Avishek Anand. 2023. “Extractive Explanations for Interpretable Text Ranking.” ACM Transactions on Information Systems 41 (4): 1–31.
Porsius Martins, Célio. 2023. “Private Cycle Detection in Financial Transactions.”
Schut, Lisa, Oscar Key, Rory Mc Grath, Luca Costabello, Bogdan Sacaleanu, Yarin Gal, et al. 2021. “Generating Interpretable Counterfactual Explanations By Implicit Minimisation of Epistemic and Aleatoric Uncertainties.” In International Conference on Artificial Intelligence and Statistics, 1756–64. PMLR.
Spooner, Thomas, Danial Dervovic, Jason Long, Jon Shepard, Jiahao Chen, and Daniele Magazzeni. 2021. “Counterfactual Explanations for Arbitrary Regression Models.” https://arxiv.org/abs/2106.15212.
Wachter, Sandra, Brent Mittelstadt, and Chris Russell. 2017. “Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR.” Harv. JL & Tech. 31: 841. https://doi.org/10.2139/ssrn.3063289.
Werf, Daan van der. 2021. “One Step Ahead: A Weakly-Supervised Approach to Training Robust Machine Learning Models for Transaction Monitoring.”
Yang, Jie, Thomas Drake, Andreas Damianou, and Yoelle Maarek. 2018. “Leveraging Crowdsourcing Data for Deep Active Learning an Application: Learning Intents in Alexa.” In Proceedings of the 2018 World Wide Web Conference, 23–32.