Faithful Model Explanations

Supervision in the Age of AI

Delft University of Technology

Venkatesh Chandrasekar

April 25, 2025

Quick Introduction

  • 3rd-year PhD candidate in Trustworthy Artificial Intelligence at Delft University of Technology.
  • Part of the AI for FinTech Research Lab, a 5-year collaboration between TU Delft and ING.
  • Previously: educational background in Economics and Finance, followed by two years working on Monetary Policy at the Bank of England.
  • Research: Trustworthy AI for real-world problems, particularly finance.
  • Blogger, Julia developer and founder of Taija.

Research

Trustworthy AI
Finance and Economics

Delft FinTech Lab

In A Nutshell

“[…] unified front of our expertise in this area formed to help the financial industry solve the growingly complex challenges.”

Delft FinTech Lab

Background

  • Finance at the forefront of digitalization.
  • Over 50 TU Delft researchers in FinTech.
  • With dozens of societal partners.

Objective

  • Strengthen societal and industrial impact.
  • Increase collaboration and visibility.

Core Pillars

  1. Trustworthy Financial Systems led by Stefan Buijsman.
  2. Quantitative Modelling led by Antonis Papapantoleon.
  3. Financial Data Intelligence led by Asterios Katsifodimos.
  4. Blockchain led by Jérémie Decouchant.

What about topics specific to supervision?

Natural language processing

  1. Mine large volumes of text.
  2. Retrieve relevant information from large collections of text.
  3. Long-document understanding; tracking, identifying and predicting anomalous patterns.
  4. Explainable AI for NLP (Arous et al. 2021).
  5. Predictive Uncertainty Quantification for LLMs (see blog post).

Privacy Enhancing Technologies (PET)

  1. Synthetic data generation using machine learning, algorithmic, probabilistic and statistical techniques (Porsius Martins 2023; Werf 2021).
  2. Statistics/econometrics under privacy constraints, e.g. differential privacy.
  3. Multi-party computation and homomorphic encryption.

Human-AI Collaboration

Augmenting human experts with AI.

  1. Hybrid human-AI workflows: amplify human intelligence, reduce human costs, and improve precaution.
  2. Collaborative human-AI knowledge synthesis.
  3. Explainable AI (Anand et al. 2022; Leonhardt, Rudra, and Anand 2023), and human-in-the-loop continual learning (Yang et al. 2018).

Automated Compliance

  1. Ensure decisions made by banks’ models are compliant by being explainable and robust:
     1. Ethics assessment, AI governance.
     2. Predictive uncertainty: quantify how robust/uncertain predictions are (conformal prediction) in a model-agnostic manner.

Background

Counterfactual Explanations

Born out of the need for explanations

Counterfactual Explanations (CE) explain how inputs into a model need to change for it to produce different outputs (Wachter, Mittelstadt, and Russell 2017).

Provided the changes are realistic and actionable, they can be used for Algorithmic Recourse (AR) to help individuals who face adverse outcomes.

Example: Consumer Credit

From ‘loan denied’ to ‘loan supplied’: CounterfactualExplanations.jl 📦.

Figure 1: Gradient-based counterfactual search.
Figure 2: Counterfactuals for Give Me Some Credit dataset (Kaggle 2011).
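For illustration, a minimal sketch of how such a counterfactual might be generated with CounterfactualExplanations.jl. It assumes a feature matrix `X` and labels `y` (e.g. from the Give Me Some Credit data) are already loaded; function names follow the package’s documented interface at the time of writing and may differ across versions.

```julia
using CounterfactualExplanations
using CounterfactualExplanations.Models

# Wrap pre-loaded features X and labels y (e.g. Give Me Some Credit) for the package:
counterfactual_data = CounterfactualData(X, y)

# Fit a simple classifier supported by the package:
M = fit_model(counterfactual_data, :Linear)

# Pick a factual instance currently predicted as 'loan denied' (the 0/1 label coding is an assumption):
factual, target = 0, 1
chosen = rand(findall(predict_label(M, counterfactual_data) .== factual))
x = select_factual(counterfactual_data, chosen)

# Gradient-based counterfactual search in the spirit of Wachter, Mittelstadt, and Russell (2017):
generator = GenericGenerator()
ce = generate_counterfactual(x, target, counterfactual_data, M, generator)
```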

Example: Insurance Premium1

  • Input \(\mathbf{X}\): A dataset of individuals containing demographic and financial information.
  • Additional Input \(\mathbf{Z}\): Individuals can opt-in to provide their personal Apple Health data to improve their chance of receiving a lower premium.
  • Binary output \(\mathbf{Y}\): based on the data, the individual is either eligible (\(y=1\)) or not eligible (\(y=0\)) for a lower premium.
  • To model \(p(y=1|X)\) the insurance provider can rely on an interpretable linear classifier.
  • To model \(p(y=1|X,Z)\) the insurance provider turns to a more accurate but less interpretable black-box model.
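Schematically (the specific functional forms below are illustrative, not taken from the slides):

\[
p(y=1\,|\,\mathbf{X}) = \sigma\left(\mathbf{w}^\top \mathbf{X} + b\right), \qquad
p(y=1\,|\,\mathbf{X},\mathbf{Z}) = M_{\theta}(\mathbf{X},\mathbf{Z}),
\]

where \(\sigma\) is the logistic function and \(M_{\theta}\) is, for example, a deep neural network.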

Example: Insurance Premium

In the EU, individuals have the right “[…] to obtain an explanation of the decision reached after such assessment and to challenge the decision.” (Recital 71 of the General Data Protection Regulation (GDPR))

In our example, who do you think is most likely to ask for an explanation?

A tiny little bit of maths …
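Equation 1 can be reconstructed from the gradient expression on the next slide; in that notation, the counterfactual search solves, roughly,

\[
\mathbf{Z}^\prime = \arg\min_{\mathbf{Z}^\prime} \left\{ \text{yloss}\left(M_{\theta}(f(\mathbf{Z}^\prime)), \mathbf{y}^+\right) + \lambda \, \text{cost}\left(f(\mathbf{Z}^\prime)\right) \right\} \tag{1}
\]

where \(f\) maps the search state \(\mathbf{Z}^\prime\) back to feature space and \(\lambda\) trades off the prediction loss against the cost of the proposed changes.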

But wait a second …

Equation 1 looks a lot like an adversarial attack (Goodfellow, Shlens, and Szegedy 2014), doesn’t it?

Figure 3: Adversarial attack on an Image Classifier.

In both settings, we take gradients with respect to features \(\nabla_{\mathbf{Z}^\prime}\text{yloss}(M_{\theta}(f(\mathbf{Z}^\prime)),\mathbf{y}^+)\) in order to trigger changes in the model’s output.
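To make the connection concrete, here is a minimal from-scratch sketch of this gradient step in Julia (plain Flux, not the package internals; the model, data and hyperparameters are placeholders):

```julia
using Flux

# Placeholder classifier M_θ and target class (both assumptions made for illustration only):
M = Chain(Dense(2 => 16, relu), Dense(16 => 2))
y_target = Flux.onehot(2, 1:2)     # the desired class y⁺
λ, η = 0.1f0, 0.05f0               # cost penalty and step size
x = Float32[1.0, -0.5]             # factual instance
z′ = copy(x)                       # search state; here f is the identity, i.e. we search in feature space

for _ in 1:100
    # yloss pulls the prediction towards y⁺; the cost term keeps z′ close to the factual x.
    g, = Flux.gradient(z′) do z
        Flux.logitcrossentropy(M(z), y_target) + λ * sum(abs2, z .- x)
    end
    z′ .-= η .* g                  # gradient step, mirroring ∇_{Z′} yloss(M_θ(f(Z′)), y⁺)
end
```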

Gradient Descent Visualized

Figure 4: Gradient-based counterfactual search.

Our Research

Open Questions

  1. What makes a counterfactual plausible?
  2. Why do we need plausibility?
  3. Is plausibility all we need?
  4. What makes models more explainable?

Plausibility

There is no consensus on the exact definition of plausibility, but we think about it as follows:

Definition 1 (Plausible Counterfactuals) Let \(\mathcal{X}|\mathbf{y}^+= p(\mathbf{x}|\mathbf{y}^+)\) denote the true conditional distribution of samples in the target class \(\mathbf{y}^+\). Then for \(\mathbf{x}^{\prime}\) to be considered a plausible counterfactual, we need: \(\mathbf{x}^{\prime} \sim \mathcal{X}|\mathbf{y}^+\).
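Since the true conditional distribution is unknown in practice, plausibility has to be approximated empirically. One simple proxy (an illustrative choice, not a canonical metric) is the average distance of \(\mathbf{x}^{\prime}\) to its \(K\) nearest neighbours among observed samples of the target class:

\[
\text{impl}(\mathbf{x}^{\prime}) = \frac{1}{K} \sum_{\mathbf{x} \in \text{kNN}_K\left(\mathbf{x}^{\prime};\, \mathcal{X}|\mathbf{y}^+\right)} \left\lVert \mathbf{x}^{\prime} - \mathbf{x} \right\rVert_2
\]

Lower values indicate counterfactuals that sit closer to the observed data in the target class.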

Counter Example

  • The counterfactual in Figure 5 is valid: it has crossed the decision boundary.
  • But is it consistent with the data in the target class (blue)?
Figure 5: A valid but implausible counterfactual. Source: Altmeyer, Deursen, et al. (2023)

Why Plausibility?

  • Actionability: If a counterfactual is implausible, it is unlikely to be actionable.
  • Fairness: If a counterfactual is implausible, it is unlikely to be fair.
  • Robustness: If a counterfactual is implausible, it is unlikely to be robust.

But: higher plausibility seems to require larger changes, which increases costs to individuals.

Recourse Dynamics

Moving just across the decision boundary may minimize costs to individuals but it may also generate external costs for other stakeholders (Altmeyer, Angela, et al. 2023).

A Balancing Act

  • Minimizing private costs generates external costs for other stakeholders.
  • To avoid this, counterfactuals need to be plausible, i.e. comply with the data-generating process.
  • In practice, costs to various stakeholders need to be carefully balanced.

Is plausibility really all we need?

Pick your Poison?

All of these counterfactuals are valid explanations for the model’s prediction. Which one would you pick?

Figure 6: Turning a 9 into a 7: Counterfactual Explanations for an Image Classifier.

What do Models Learn?

These images are sampled from the posterior distribution learned by the model. Looks different, no?

Figure 7: Conditionally Generated Images from the Image Classifier.

Faithful Counterfactuals

We propose a way to generate counterfactuals that are as plausible as the underlying model permits (under review).

Definition 2 (Faithful Counterfactuals) Let \(\mathcal{X}_{\theta}|\mathbf{y}^+ = p_{\theta}(\mathbf{x}|\mathbf{y}^+)\) denote the conditional distribution of \(\mathbf{x}\) in the target class \(\mathbf{y}^+\), where \(\theta\) denotes the parameters of model \(M_{\theta}\). Then for \(\mathbf{x}^{\prime}\) to be considered a faithful counterfactual, we need: \(\mathbf{x}^{\prime} \sim \mathcal{X}_{\theta}|\mathbf{y}^+\).
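Probing \(\mathcal{X}_{\theta}|\mathbf{y}^+\) means sampling from the model itself. One standard way to do this for classifiers, following Grathwohl et al. (2020), is Stochastic Gradient Langevin Dynamics on the energy implied by the model’s logits; a generic update (shown for intuition, not as the exact procedure used in the work under review) is

\[
\hat{\mathbf{x}}_{j+1} \leftarrow \hat{\mathbf{x}}_j - \frac{\epsilon_j^2}{2} \nabla_{\hat{\mathbf{x}}_j} \mathcal{E}_{\theta}\left(\hat{\mathbf{x}}_j|\mathbf{y}^+\right) + \epsilon_j \mathbf{r}_j, \qquad \mathbf{r}_j \sim \mathcal{N}(\mathbf{0},\mathbf{I}),
\]

where \(\mathcal{E}_{\theta}(\mathbf{x}|\mathbf{y}^+) = -M_{\theta}(\mathbf{x})[\mathbf{y}^+]\) is the energy implied by the logit for the target class.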

Figure 8: Gradient fields and counterfactual paths for different generators.

Improving Models

Now that we have a tool to faithfully explain models, we may ask: how do models learn plausible explanations? Initial evidence:

  1. Incorporating predictive uncertainty (e.g. ensembling; see the sketch after this list).
  2. Addressing robustness (e.g. adversarial training in Schut et al. (2021)).
  3. Better model architectures.
  4. Hybrid modelling (i.e. combining generative and discriminative models).
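A minimal deep-ensemble example of the ensembling mentioned in item 1, written in Flux (architecture, data and training budget are placeholders): averaging the member predictions yields a smoother predictive distribution than any single network.

```julia
using Flux
using Statistics: mean

# Toy data (placeholder): 2-D features, binary labels one-hot encoded.
X = randn(Float32, 2, 200)
y = Flux.onehotbatch((X[1, :] .+ X[2, :] .> 0) .+ 1, 1:2)

# Train K independent networks; random initialisation is the only source of diversity here.
ensemble = map(1:5) do _
    m = Chain(Dense(2 => 16, relu), Dense(16 => 2))
    opt = Flux.setup(Adam(0.01), m)
    for _ in 1:200
        grads = Flux.gradient(m) do model
            Flux.logitcrossentropy(model(X), y)
        end
        Flux.update!(opt, m, grads[1])
    end
    m
end

# Predictive distribution = average of the members' softmax outputs.
predict_ensemble(x) = mean(softmax(m(x)) for m in ensemble)
```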

Example: Architecture

Figure 9: Counterfactuals for LeNet-5 convolutional neural network (LeCun et al. 1998).

Example: JEM Ensemble

Figure 10: Counterfactuals for an ensemble of Joint Energy Models (JEM) (Grathwohl et al. 2020).

Open-Source work in Julia

🐶 Taija

Research informs development, development informs research.

Trustworthy Artificial Intelligence in Julia.


Taija is a collection of open-source packages for Trustworthy AI in Julia. Our goal is to help researchers and practitioners assess the trustworthiness of predictive models.

Our work has been presented online at JuliaCon 2022, in person at MIT in Boston at JuliaCon 2023, and hopefully beyond.

Counterfactual Explanations

All the work presented today is powered by CounterfactualExplanations.jl 📦.

There is also a corresponding paper, Explaining Black-Box Models through Counterfactuals, which has been published in JuliaCon Proceedings.

If you decide to use this package in your work, please consider citing the paper.


Conformal Prediction

Conformal Prediction is a model-agnostic, distribution-free approach to Predictive Uncertainty Quantification: ConformalPrediction.jl 📦.

Figure 11: Conformal Prediction intervals for regression.
Figure 12: Conformal Prediction sets for an Image Classifier.
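A minimal usage sketch following the package’s MLJ-style workflow (the specific model, dataset and keyword arguments are illustrative and may differ across versions):

```julia
using MLJ
using ConformalPrediction

# Any MLJ-compatible regressor can be wrapped; a decision tree serves as a stand-in model here
# (requires MLJDecisionTreeInterface to be installed).
Tree = @load DecisionTreeRegressor pkg=DecisionTree
X, y = make_regression(500, 2)          # toy regression data from MLJ

# Wrap the model, fit on data, and predict calibrated intervals:
conf_model = conformal_model(Tree(); coverage = 0.95)
mach = machine(conf_model, X, y)
fit!(mach)

# Predictions are intervals that should cover the true outcome roughly 95% of the time.
predict(mach, X)
```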

Laplace Redux

Effortless Bayesian Deep Learning through Laplace Approximation (Daxberger et al. 2021): LaplaceRedux.jl 📦.

Figure 13: Predictive interval for neural network with Laplace Approximation.
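A minimal usage sketch (assuming a Flux network trained beforehand; function names follow the package’s documented workflow at the time of writing and may change):

```julia
using Flux
using LaplaceRedux

# A small Flux network is assumed to have been trained already (training loop omitted for brevity).
nn = Chain(Dense(2 => 16, tanh), Dense(16 => 1))
X = randn(Float32, 2, 100)
y = vec(sum(X; dims = 1)) .+ 0.1f0 .* randn(Float32, 100)
data = zip(eachcol(X), y)               # iterable of (x, y) pairs

# Fit a Laplace approximation around the trained weights and tune the prior precision:
la = Laplace(nn; likelihood = :regression)
fit!(la, data)
optimize_prior!(la)

# Posterior predictive mean and variance for new inputs:
predict(la, X)
```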

Joint Energy Models

Joint Energy Models (JEMs) are hybrid models trained to learn both the conditional output distribution and the input distribution (Grathwohl et al. 2020): JointEnergyModels.jl 📦.
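In brief: the classifier’s logits \(f_{\theta}(\mathbf{x})[y]\) are reinterpreted as unnormalized joint log-densities, so a single network defines \(p_{\theta}(y|\mathbf{x})\) and \(p_{\theta}(\mathbf{x})\) at once (Grathwohl et al. 2020):

\[
p_{\theta}(\mathbf{x}, y) = \frac{\exp\left(f_{\theta}(\mathbf{x})[y]\right)}{Z(\theta)}, \qquad
p_{\theta}(\mathbf{x}) = \frac{\sum_{y}\exp\left(f_{\theta}(\mathbf{x})[y]\right)}{Z(\theta)}, \qquad
p_{\theta}(y|\mathbf{x}) = \frac{\exp\left(f_{\theta}(\mathbf{x})[y]\right)}{\sum_{y^{\prime}}\exp\left(f_{\theta}(\mathbf{x})[y^{\prime}]\right)}
\]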

Figure 14: A JEM trained on Circles data.

Questions?

Includes joint work with Cynthia C. S. Liem, Arie van Deursen, Mojtaba Farmanbar, Aleksander Buszydlik, Karol Dobiczek, Giovan Angela and many other students at TU Delft.

Slides powered by Quarto.

References

Agustí, Marc, Patrick Altmeyer, and Ignacio Vidal-Quadras. 2021. “Deep Vector Autoregression for Macroeconomic Data.”
Altmeyer, Patrick, Giovan Angela, Aleksander Buszydlik, Karol Dobiczek, Arie van Deursen, and Cynthia CS Liem. 2023. “Endogenous Macrodynamics in Algorithmic Recourse.” In 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 418–31. IEEE.
Altmeyer, Patrick, Lena Boneva, Rafael Kinston, Shreyosi Saha, and Evarist Stoja. 2023. “Yield Curve Sensitivity to Investor Positioning Around Economic Shocks.”
Altmeyer, Patrick, Arie van Deursen, et al. 2023. “Explaining Black-Box Models Through Counterfactuals.” In Proceedings of the JuliaCon Conferences, 1:130. 1.
Altmeyer, Patrick, Jacob Daniel Grapendal, Makar Pravosud, and Gand Derry Quintana. 2018. “Option Pricing in the Heston Stochastic Volatility Model: An Empirical Evaluation.”
Anand, Avishek, Lijun Lyu, Maximilian Idahl, Yumeng Wang, Jonas Wallat, and Zijian Zhang. 2022. “Explainable Information Retrieval: A Survey.” arXiv Preprint arXiv:2211.02405.
Arous, Ines, Ljiljana Dolamic, Jie Yang, Akansha Bhardwaj, Giuseppe Cuccu, and Philippe Cudré-Mauroux. 2021. “Marta: Leveraging Human Rationales for Explainable Text Classification.” In Proceedings of the AAAI Conference on Artificial Intelligence, 35:5868–76. 7.
Daxberger, Erik, Agustinus Kristiadi, Alexander Immer, Runa Eschenhagen, Matthias Bauer, and Philipp Hennig. 2021. “Laplace Redux-Effortless Bayesian Deep Learning.” Advances in Neural Information Processing Systems 34.
Goodfellow, Ian J, Jonathon Shlens, and Christian Szegedy. 2014. “Explaining and Harnessing Adversarial Examples.” https://arxiv.org/abs/1412.6572.
Grathwohl, Will, Kuan-Chieh Wang, Joern-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, and Kevin Swersky. 2020. “Your Classifier Is Secretly an Energy Based Model and You Should Treat It Like One.” In International Conference on Learning Representations.
Immer, Alexander, Maciej Korzepa, and Matthias Bauer. 2020. “Improving Predictions of Bayesian Neural Networks via Local Linearization.” https://arxiv.org/abs/2008.08400.
Kaggle. 2011. “Give Me Some Credit, Improve on the State of the Art in Credit Scoring by Predicting the Probability That Somebody Will Experience Financial Distress in the Next Two Years.” https://www.kaggle.com/c/GiveMeSomeCredit; Kaggle. https://www.kaggle.com/c/GiveMeSomeCredit.
LeCun, Yann, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. “Gradient-Based Learning Applied to Document Recognition.” Proceedings of the IEEE 86 (11): 2278–2324.
Leonhardt, Jurek, Koustav Rudra, and Avishek Anand. 2023. “Extractive Explanations for Interpretable Text Ranking.” ACM Transactions on Information Systems 41 (4): 1–31.
Porsius Martins, Célio. 2023. “Private Cycle Detection in Financial Transactions.”
Schut, Lisa, Oscar Key, Rory Mc Grath, Luca Costabello, Bogdan Sacaleanu, Yarin Gal, et al. 2021. “Generating Interpretable Counterfactual Explanations By Implicit Minimisation of Epistemic and Aleatoric Uncertainties.” In International Conference on Artificial Intelligence and Statistics, 1756–64. PMLR.
Spooner, Thomas, Danial Dervovic, Jason Long, Jon Shepard, Jiahao Chen, and Daniele Magazzeni. 2021. “Counterfactual Explanations for Arbitrary Regression Models.” https://arxiv.org/abs/2106.15212.
Wachter, Sandra, Brent Mittelstadt, and Chris Russell. 2017. “Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR.” Harv. JL & Tech. 31: 841. https://doi.org/10.2139/ssrn.3063289.
Werf, Daan van der. 2021. “One Step Ahead: A Weakly-Supervised Approach to Training Robust Machine Learning Models for Transaction Monitoring.”
Yang, Jie, Thomas Drake, Andreas Damianou, and Yoelle Maarek. 2018. “Leveraging Crowdsourcing Data for Deep Active Learning an Application: Learning Intents in Alexa.” In Proceedings of the 2018 World Wide Web Conference, 23–32.