References
Abbasnejad, Ehsan, Damien Teney, Amin Parvaneh, Javen Shi, and Anton van
den Hengel. 2020. “Counterfactual Vision and
Language Learning.” In 2020 IEEE/CVF Conference on
Computer Vision and Pattern Recognition (CVPR), 10041–51. https://doi.org/10.1109/CVPR42600.2020.01006.
Ackerman, Samuel, Parijat Dube, Eitan Farchi, Orna Raz, and Marcel
Zalmanovici. 2021. “Machine Learning Model Drift Detection
Via Weak Data Slices.” In 2021
IEEE/ACM Third International Workshop on
Deep Learning for Testing and
Testing for Deep Learning
(DeepTest), 1–8. IEEE. https://doi.org/10.1109/deeptest52559.2021.00007.
Agustí, Marc, Ignacio Vidal-Quadras Costa, and Patrick Altmeyer. 2023.
“Deep Vector Autoregression for Macroeconomic Data.”
IFC Bulletins Chapters 59. https://www.bis.org/ifc/publ/ifcb59_39.pdf.
Alain, Guillaume, and Yoshua Bengio. 2016. “Understanding intermediate layers using linear classifier
probes.” ArXiv. https://api.semanticscholar.org/CorpusID:9794990.
Alperin, Juan P, Carol Muñoz Nieves, Lesley A Schimanski, Gustavo E
Fischman, Meredith T Niles, and Erin C McKiernan. 2019. “How significant are the public dimensions of faculty work
in review, promotion and tenure documents?” ELife
8: e42254.
Altman, Sam. 2025. “Reflections.” January 6, 2025. https://blog.samaltman.com/reflections.
Altmeyer, Patrick, Giovan Angela, Aleksander Buszydlik, Karol Dobiczek,
Arie van Deursen, and Cynthia C. S. Liem. 2023. “Endogenous
Macrodynamics in Algorithmic Recourse.” IEEE. https://doi.org/10.1109/satml54575.2023.00036.
Altmeyer, Patrick, Leva Boneva, Rafael Kinston, Shreyosi Saha, and
Evarist Stoja. 2023. “Yield Curve Sensitivity to Investor
Positioning Around Economic Shocks.” Bank of England working
papers 1029. Bank of England.
Altmeyer, Patrick, Aleksander Buszydlik, Arie van Deursen, and Cynthia
C. S. Liem. 2026. “Counterfactual Training: Teaching Models
Plausible and Actionable Explanations.” https://arxiv.org/abs/2601.16205.
Altmeyer, Patrick, Andrew M Demetriou, Antony Bartlett, and Cynthia C.
S. Liem. 2024. “Position: Stop Making Unscientific AGI Performance
Claims.” In International Conference on Machine
Learning, 1222–42. PMLR. https://proceedings.mlr.press/v235/altmeyer24a.html.
Altmeyer, Patrick, Arie van Deursen, and Cynthia C. S. Liem. 2023a.
“Explaining Black-Box Models through
Counterfactuals.” In Proceedings of the JuliaCon
Conferences, 1:130. https://doi.org/10.21105/jcon.00130.
———. 2023b. “Explaining Black-Box Models
through Counterfactuals.” In Proceedings of the
JuliaCon Conferences, 1:130. https://doi.org/10.21105/jcon.00130.
Altmeyer, Patrick, Mojtaba Farmanbar, Arie van Deursen, and Cynthia C.
S. Liem. 2024a. “Faithful Model Explanations
through Energy-Constrained Conformal Counterfactuals.” In
Proceedings of the Thirty-Eighth AAAI Conference on Artificial
Intelligence, 38:10829–37. 10. https://doi.org/10.1609/aaai.v38i10.28956.
———. 2024b. “Faithful Model Explanations
through Energy-Constrained Conformal Counterfactuals.” In
Proceedings of the Thirty-Eighth AAAI Conference on Artificial
Intelligence, 38:10829–37. 10. https://doi.org/10.1609/aaai.v38i10.28956.
Altmeyer, Patrick, Pedro Gurrola-Perez, Rafael Kinston, and Jessica
Redmond. 2019. “Modelling the Demand for Central Bank
Reserves.” https://www.ecb.europa.eu/press/conferences/html/20191111_ecb_money_market_workshop_conference.en.html.
Angelopoulos, Anastasios N., and Stephen Bates. 2022. “A Gentle
Introduction to Conformal Prediction and Distribution-Free Uncertainty
Quantification.” https://arxiv.org/abs/2107.07511.
Antorán, Javier, Umang Bhatt, Tameem Adel, Adrian Weller, and José
Miguel Hernández-Lobato. 2020. “Getting a Clue: A
Method for Explaining Uncertainty Estimates.” https://arxiv.org/abs/2006.06848.
Arcones, Miguel A, and Evarist Giné. 1992. “On the Bootstrap of
U and V Statistics.” The Annals of
Statistics, 655–74.
Arrieta, Alejandro Barredo, Natalia Diaz-Rodriguez, Javier Del Ser,
Adrien Bennetot, Siham Tabik, Alberto Barbado, Salvador Garcia, et al.
2020. “Explainable Artificial Intelligence
(XAI): Concepts, Taxonomies, Opportunities and
Challenges Toward Responsible AI.” Information
Fusion 58: 82–115. https://doi.org/10.1016/j.inffus.2019.12.012.
Artelt, André, Valerie Vaquet, Riza Velioglu, Fabian Hinder, Johannes
Brinkrolf, Malte Schilling, and Barbara Hammer. 2021. “Evaluating
Robustness of Counterfactual Explanations.” In 2021 IEEE
Symposium Series on Computational Intelligence (SSCI), 01–09. IEEE.
Augustin, Maximilian, Alexander Meinke, and Matthias Hein. 2020.
“Adversarial Robustness on In- and
Out-Distribution Improves Explainability.” In Computer
Vision – ECCV 2020, edited by Andrea Vedaldi, Horst Bischof, Thomas
Brox, and Jan-Michael Frahm, 228–45. Cham: Springer.
Balashankar, Ananth, Xuezhi Wang, Yao Qin, Ben Packer, Nithum Thain, Ed
Chi, Jilin Chen, and Alex Beutel. 2023. “Improving Classifier Robustness through Active Generative
Counterfactual Data Augmentation.” In Findings of the
Association for Computational Linguistics: EMNLP 2023, 127–39. ACL.
https://doi.org/10.18653/v1/2023.findings-emnlp.10.
Barocas, Solon, Moritz Hardt, and Arvind Narayanan. 2022.
“Fairness and Machine Learning.” December 2022. https://fairmlbook.org/index.html.
Becker, Barry, and Ronny Kohavi. 1996. “Adult.” UCI Machine
Learning Repository. https://doi.org/10.24432/C5XW20.
Belinkov, Yonatan. 2021. “Probing Classifiers: Promises,
Shortcomings, and Advances.” https://arxiv.org/abs/2102.12452.
Bell, Andrew, Joao Fonseca, Carlo Abrate, Francesco Bonchi, and Julia
Stoyanovich. 2024. “Fairness in Algorithmic
Recourse Through the Lens of Substantive Equality of
Opportunity.” https://arxiv.org/abs/2401.16088.
Bender, Emily M, Timnit Gebru, Angelina McMillan-Major, and Shmargaret
Shmitchell. 2021. “On the Dangers of Stochastic Parrots: Can
Language Models Be Too Big?” In Proceedings of the 2021 ACM
Conference on Fairness, Accountability, and Transparency, 610–23.
Berardi, Andrea, and Alberto Plazzi. 2022. “Dissecting the yield curve: The international
evidence.” Journal of Banking & Finance 134:
106286.
Bereska, Leonard, and Efstratios Gavves. 2024. “Mechanistic
Interpretability for AI Safety – a Review.” https://arxiv.org/abs/2404.14082.
Berlinet, Alain, and Christine Thomas-Agnan. 2011. Reproducing
Kernel Hilbert Spaces in Probability and Statistics.
Springer Science & Business Media. https://doi.org/10.1007/978-1-4419-9096-9.
Bezanson, Jeff, Alan Edelman, Stefan Karpinski, and Viral B Shah. 2017.
“Julia: A Fresh Approach to Numerical Computing.” SIAM
Review 59 (1): 65–98. https://doi.org/10.1137/141000671.
Birhane, Abeba, Pratyusha Kalluri, Dallas Card, William Agnew, Ravit
Dotan, and Michelle Bao. 2022. “The Values
Encoded in Machine Learning Research.” In Proceedings
of the 2022 ACM Conference on Fairness, Accountability, and Transparency
(FAccT ’22).
Bischl, Bernd, Martin Binder, Michel Lang, Tobias Pielok, Jakob Richter,
Stefan Coors, Janek Thomas, et al. 2023. “Hyperparameter optimization: Foundations, algorithms,
best practices, and open challenges.” WIREs Data
Mining and Knowledge Discovery 13 (2): e1484. https://doi.org/10.1002/widm.1484.
Blaom, Anthony D., Franz Kiraly, Thibaut Lienart, Yiannis Simillides,
Diego Arenas, and Sebastian J. Vollmer. 2020. “MLJ:
A Julia Package for Composable Machine Learning.”
Journal of Open Source Software 5 (55): 2704. https://doi.org/10.21105/joss.02704.
Blili-Hamelin, Borhane, Christopher Graziul, Leif Hancox-Li, Hananel
Hazan, El-Mahdi El-Mhamdi, Avijit Ghosh, Katherine A Heller, et al.
2025. “Position: Stop treating ‘AGI’ as the
north-star goal of AI research.” In Forty-Second
International Conference on Machine Learning Position Paper Track.
Borch, Christian. 2022. “Machine Learning, Knowledge Risk, and
Principal-Agent Problems in Automated Trading.” Technology in
Society, 101852. https://doi.org/10.1016/j.techsoc.2021.101852.
Borisov, Vadim, Tobias Leemann, Kathrin Seßler, Johannes Haug, Martin
Pawelczyk, and Gjergji Kasneci. 2022. “Deep Neural Networks and
Tabular Data: A Survey.” IEEE Transactions on Neural Networks
and Learning Systems.
Brunnermeier, Markus K. 2016. “Bubbles.” In Banking
Crises: Perspectives from the New Palgrave Dictionary, 28–36.
Springer.
Buolamwini, Joy, and Timnit Gebru. 2018. “Gender Shades:
Intersectional Accuracy Disparities in Commercial Gender
Classification.” In Conference on Fairness, Accountability
and Transparency, 77–91. PMLR.
Buszydlik, Aleksander, Patrick Altmeyer, Cynthia C. S. Liem, and Roel
Dobbe. 2024. “Grounding and Validation of Algorithmic Recourse in
Real-World Contexts: A Systematized Literature Review.” https://openreview.net/pdf?id=oEmyoy5H5P.
———. 2025. “Understanding the Affordances and Constraints of
Explainable AI in Safety-Critical Contexts: A Case Study in Dutch Social
Welfare.” In Electronic Government. EGOV 2025. Lecture Notes
in Computer Science. Forthcoming.
Carlisle, M. 2019. “Racist Data Destruction? A
Boston Housing Dataset Controversy.” https://medium.com/@docintangible/racist-data-destruction-113e3eff54a8.
Carrizosa, Emilio, Jasone Ramírez-Ayerbe, and Dolores Romero. 2021.
“Generating Collective Counterfactual Explanations in
Score-Based Classification via Mathematical
Optimization.”
Caterino, Pasquale. 2024. “Google Summer of Code 2024 Final
Report: Add Support for Conformalized Bayes to
ConformalPrediction.jl.” https://gist.github.com/pasq-cat/f25eebc492366fb6a4f428426f93f45f.
Chandola, Varun, Arindam Banerjee, and Vipin Kumar. 2009. “Anomaly
Detection: A Survey.” ACM Computing Surveys
(CSUR) 41 (3): 1–58.
Claesen, Aline, Daniel Lakens, Noah van Dongen, et al. 2022.
“Severity and Crises in Science: Are We
Getting It Right When We’re Right and Wrong When We’re
Wrong?”
Cost, Ben. 2023. “Bing AI chatbot goes on
‘destructive’ rampage: ‘I want to be powerful — and
alive’.” https://nypost.com/2023/02/16/bing-ai-chatbots-destructive-rampage-i-want-to-be-powerful/.
Crump, Richard K, and Nikolay Gospodinov. n.d. “Deconstructing the yield curve.”
Dandl, Susanne, Andreas Hofheinz, Martin Binder, Bernd Bischl, and
Giuseppe Casalicchio. 2023. “Counterfactuals: An
R Package for Counterfactual
Explanation Methods.” arXiv. http://arxiv.org/abs/2304.06569.
Dasgupta, Sanjoy. 2013. “Experiments with
Random Projection.” https://arxiv.org/abs/1301.3849.
Delaney, Eoin, Derek Greene, and Mark T. Keane. 2021. “Uncertainty
Estimation and
Out-of-Distribution Detection for
Counterfactual Explanations:
Pitfalls and Solutions.” arXiv. http://arxiv.org/abs/2107.09734.
Delft High Performance Computing Centre (DHPC). 2022.
“DelftBlue Supercomputer
(Phase 1).” https://www.tudelft.nl/dhpc/ark:/44463/DelftBluePhase1.
Dhurandhar, Amit, Pin-Yu Chen, Ronny Luss, Chun-Chen Tu, Paishun Ting,
Karthikeyan Shanmugam, and Payel Das. 2018. “Explanations Based on
the Missing: Towards Contrastive Explanations with Pertinent
Negatives.” Advances in Neural Information Processing
Systems 31.
Dombrowski, Ann-Kathrin, Jan E Gerken, and Pan Kessel. 2021.
“Diffeomorphic Explanations with Normalizing Flows.” In
ICML Workshop on Invertible Neural
Networks, Normalizing Flows, and Explicit
Likelihood Models.
Du, Yilun, and Igor Mordatch. 2020. “Implicit
Generation and Generalization in Energy-Based Models.” https://arxiv.org/abs/1903.08689.
Epley, Nicholas, Adam Waytz, and John T Cacioppo. 2007. “On seeing human: a three-factor theory of
anthropomorphism.” Psychological Review 114 (4):
864.
Falk, Ruma, and Clifford Konold. 1997. “Making sense of randomness: Implicit encoding as a basis
for judgment.” Psychological Review 104 (2): 301.
Fan, Fenglei, Jinjun Xiong, and Ge Wang. 2020. “On
Interpretability of Artificial Neural Networks.” https://arxiv.org/abs/2001.02522.
Field, Hayden. 2025. “OpenAI Partners with U.S. National
Laboratories on Scientific Research, Nuclear Weapons Security.”
CNBC. January 30, 2025. https://www.cnbc.com/2025/01/30/openai-partners-with-us-national-laboratories-on-scientific-research.html.
Foster, Kevin R, and Hanna Kokko. 2009. “The
evolution of superstitious and superstition-like
behaviour.” Proceedings of the Royal Society B:
Biological Sciences 276 (1654): 31–37.
Frankle, Jonathan, and Michael Carbin. 2019. “The Lottery
Ticket Hypothesis: Finding Sparse, Trainable Neural
Networks.” In International Conference on Learning
Representations.
Freiesleben, Timo. 2022. “The Intriguing
Relation Between Counterfactual Explanations and Adversarial
Examples.” Minds and Machines 32 (1): 77–109.
Future of Life Institute. 2023a. “Pause Giant AI Experiments: An
Open Letter.”
https://futureoflife.org/open-letter/pause-giant-ai-experiments/.
———. 2023b. “Pause Giant AI Experiments: An Open Letter.”
March 22, 2023. https://futureoflife.org/open-letter/pause-giant-ai-experiments/.
Gama, João, Indrė Žliobaitė, Albert Bifet, Mykola Pechenizkiy, and
Abdelhamid Bouchachia. 2014. “A Survey on Concept Drift
Adaptation.” ACM Computing Surveys (CSUR) 46 (4): 1–37.
Goertzel, Ben. 2014. “Artificial general
intelligence: concept, state of the art, and future
prospects.” Journal of Artificial General
Intelligence 5 (1): 1.
Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep
Learning. MIT Press.
Goodfellow, Ian, Jonathon Shlens, and Christian Szegedy. 2015.
“Explaining and Harnessing Adversarial
Examples.” https://arxiv.org/abs/1412.6572.
Grathwohl, Will, Kuan-Chieh Wang, Joern-Henrik Jacobsen, David Duvenaud,
Mohammad Norouzi, and Kevin Swersky. 2020. “Your Classifier Is
Secretly an Energy Based Model and You Should Treat It Like One.”
In International Conference on Learning Representations.
Gretton, Arthur, Karsten M Borgwardt, Malte J Rasch, Bernhard Schölkopf,
and Alexander Smola. 2012. “A Kernel Two-Sample Test.”
The Journal of Machine Learning Research 13 (1): 723–73.
Grinsztajn, Léo, Edouard Oyallon, and Gaël Varoquaux. 2022. “Why
Do Tree-Based Models Still Outperform Deep Learning on Tabular
Data?” https://arxiv.org/abs/2207.08815.
Guidotti, Riccardo. 2022. “Counterfactual
Explanations and How to Find Them: Literature Review and
Benchmarking.” Data Mining and Knowledge
Discovery 38 (5): 2770–2824. https://doi.org/10.1007/s10618-022-00831-6.
Guo, Hangzhi, Thanh H. Nguyen, and Amulya Yadav. 2023. “CounterNet: End-to-End Training of Prediction Aware
Counterfactual Explanations.” In Proceedings of the
29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining,
577–89. KDD ’23. New York, NY, USA: Association for Computing
Machinery. https://doi.org/10.1145/3580305.3599290.
Gurnee, Wes, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii
Troitskii, and Dimitris Bertsimas. 2023. “Finding Neurons in a
Haystack: Case Studies with Sparse Probing.” arXiv Preprint
arXiv:2305.01610.
Gurnee, Wes, and Max Tegmark. 2023a. “Language Models Represent Space and Time.”
arXiv Preprint arXiv:2310.02207v1.
———. 2023b. “Language Models Represent Space and Time.”
arXiv Preprint arXiv:2310.02207v2.
Haig, Brian D. 2003. “What is a spurious
correlation?” Understanding Statistics: Statistical
Issues in Psychology, Education, and the Social Sciences 2 (2):
125–32.
Haltmeier, Markus, and Linh Nguyen. 2023. “Regularization of
Inverse Problems by Neural Networks.” In Handbook of
Mathematical Models and Algorithms in Computer Vision and Imaging:
Mathematical Imaging and Vision, 1065–93. Springer.
Hanneke, Steve. 2007. “A Bound on the Label Complexity of Agnostic
Active Learning.” In Proceedings of the 24th International
Conference on Machine Learning, 353–60. https://doi.org/10.1145/1273496.1273541.
Heider, Fritz, and Marianne Simmel. 1944. “An
experimental study of apparent behavior.” The American
Journal of Psychology 57 (2): 243–59.
Hengst, Floris, Ralf Wolter, Patrick Altmeyer, and Arda Kaygan. 2024.
“Conformal Intent Classification and Clarification for Fast and
Accurate Intent Recognition.” In Findings of the Association
for Computational Linguistics: NAACL 2024, 2412–32. https://doi.org/10.18653/v1/2024.findings-naacl.156.
Hofmann, Hans. 1994. “German Credit Data.” https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data).
Innes, Michael, Elliot Saba, Keno Fischer, Dhairya Gandhi, Marco
Concetto Rudilosso, Neethu Mariya Joy, Tejan Karmali, Avik Pal, and
Viral Shah. 2018. “Fashionable Modelling with Flux.” https://arxiv.org/abs/1811.01457.
Innes, Mike. 2018. “Flux: Elegant Machine Learning with
Julia.” Journal of Open Source Software 3 (25): 602. https://doi.org/10.21105/joss.00602.
Joshi, Shalmali, Oluwasanmi Koyejo, Warut Vijitbenjaronk, Been Kim, and
Joydeep Ghosh. 2019. “Towards Realistic
Individual Recourse and Actionable Explanations in Black-Box Decision
Making Systems.” https://arxiv.org/abs/1907.09615.
Kaggle. 2011. “Give Me Some Credit: Improve on the
State of the Art in Credit Scoring by Predicting the Probability That
Somebody Will Experience Financial Distress in the Next Two
Years.” https://www.kaggle.com/c/GiveMeSomeCredit.
Karimi, Amir-Hossein, Gilles Barthe, Bernhard Schölkopf, and Isabel
Valera. 2021. “A Survey of Algorithmic Recourse: Definitions,
Formulations, Solutions, and Prospects.” https://arxiv.org/abs/2010.04050.
Karimi, Amir-Hossein, Bernhard Schölkopf, and Isabel Valera. 2021.
“Algorithmic Recourse: From Counterfactual Explanations to
Interventions.” In Proceedings of the 2021 ACM Conference on
Fairness, Accountability, and Transparency, 353–62. FAccT ’21. New
York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3442188.3445899.
Karimi, Amir-Hossein, Julius Von Kügelgen, Bernhard Schölkopf, and
Isabel Valera. 2020. “Algorithmic Recourse Under Imperfect Causal
Knowledge: A Probabilistic Approach.” https://arxiv.org/abs/2006.06831.
Kaufmann, Maximilian, Yiren Zhao, Ilia Shumailov, Robert Mullins, and
Nicolas Papernot. 2022. “Efficient Adversarial Training with Data
Pruning.” arXiv Preprint arXiv:2207.00694.
Kaur, Harmanpreet, Harsha Nori, Samuel Jenkins, Rich Caruana, Hanna
Wallach, and Jennifer Wortman Vaughan. 2020. “Interpreting
Interpretability: Understanding Data Scientists’ Use of Interpretability
Tools for Machine Learning.” In Proceedings of the 2020
CHI Conference on Human Factors in Computing Systems,
1–14. https://doi.org/10.1145/3313831.3376219.
Kingma, Diederik P., and Jimmy Ba. 2017. “Adam: A Method for Stochastic
Optimization.” https://arxiv.org/abs/1412.6980.
Kıcıman, Emre, Robert Ness, Amit Sharma, and Chenhao Tan. 2023.
“Causal Reasoning and Large Language Models: Opening a New
Frontier for Causality.” arXiv Preprint
arXiv:2305.00050.
Kloft, Agnes Mercedes, Robin Welsch, Thomas Kosch, and Steeven Villa.
2024. “‘AI Enhances Our Performance, I Have No Doubt This One
Will Do the Same’: The Placebo Effect Is Robust to Negative Descriptions
of AI.” In Proceedings of the CHI Conference on Human Factors
in Computing Systems, 1–24.
Kolter, Zico. 2023. “Keynote Addresses: SaTML
2023.” In 2023 IEEE Conference on Secure and Trustworthy
Machine Learning (SaTML). Los Alamitos, CA, USA: IEEE Computer
Society. https://doi.org/10.1109/SaTML54575.2023.00009.
Krizhevsky, A. 2009. “Learning Multiple
Layers of Features from Tiny
Images.” https://www.semanticscholar.org/paper/Learning-Multiple-Layers-of-Features-from-Tiny-Krizhevsky/5d90f06bb70a0a3dced62413346235c02b1aa086.
Kumar, Sumit. 2022. “Effective hedging
strategy for US Treasury bond portfolio using principal component
analysis.” Academy of Accounting and Financial
Studies 26 (1).
Kurakin, Alexey, Ian Goodfellow, and Samy Bengio. 2017.
“Adversarial Machine Learning at Scale.” https://arxiv.org/abs/1611.01236.
Ladouceur, Robert, Claude Paquet, and Dominique Dubé. 1996. “Erroneous Perceptions in Generating Sequences of Random
Events.” Journal of Applied Social Psychology
26 (24): 2157–66.
Lakshminarayanan, Balaji, Alexander Pritzel, and Charles Blundell. 2017.
“Simple and Scalable Predictive Uncertainty Estimation Using Deep
Ensembles.” In Proceedings of the 31st International
Conference on Neural Information Processing Systems, 6405–16.
NIPS’17. Red Hook, NY, USA: Curran Associates Inc.
Laugel, Thibault, Marie-Jeanne Lesot, Christophe Marsala, Xavier Renard,
and Marcin Detyniecki. 2017. “Inverse Classification
for Comparison-Based Interpretability in
Machine Learning.” arXiv. https://doi.org/10.48550/arXiv.1712.08443.
LeCun, Yann. 1998. “The MNIST database of
handwritten digits.” http://yann.lecun.com/exdb/mnist/.
LeCun, Yann, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998.
“Gradient-Based Learning Applied to Document Recognition.”
Proceedings of the IEEE 86 (11): 2278–2324.
Leofante, Francesco, and Nico Potyka. 2024. “Promoting
Counterfactual Robustness Through Diversity.” Proceedings of
the AAAI Conference on Artificial Intelligence 38 (19): 21322–30.
https://doi.org/10.1609/aaai.v38i19.30127.
Li, Kenneth, Aspen K Hopkins, David Bau, Fernanda Viégas, Hanspeter
Pfister, and Martin Wattenberg. 2022. “Emergent World
Representations: Exploring a Sequence Model Trained on a Synthetic
Task.” arXiv Preprint arXiv:2210.13382.
Liem, Cynthia C. S., and Andrew M Demetriou. 2023. “Treat
Societally Impactful Scientific Insights as Open-Source Software
Artifacts.” In 2023 IEEE/ACM 45th International Conference on
Software Engineering: Software Engineering in Society (ICSE-SEIS),
150–56. IEEE.
Lippe, Phillip. 2024. “UvA Deep Learning
Tutorials.” https://uvadlc-notebooks.readthedocs.io/en/latest/.
Liu, Yinhan, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi
Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov.
2019. “RoBERTa: A Robustly Optimized BERT Pretraining
Approach.” https://arxiv.org/abs/1907.11692.
Luiz Franco, Jorge. 2024. “JSoC: When Causality Meets
Recourse.” https://www.taija.org/blog/posts/causal-recourse/.
Lundberg, Scott M, and Su-In Lee. 2017. “A Unified Approach to
Interpreting Model Predictions.” In Proceedings of the 31st
International Conference on Neural Information Processing Systems,
4768–77.
Luu, Hoai Linh, and Naoya Inoue. 2023. “Counterfactual Adversarial Training for Improving
Robustness of Pre-trained Language Models.” In
Proceedings of the 37th Pacific Asia Conference on Language,
Information and Computation, 881–88. ACL. https://aclanthology.org/2023.paclic-1.88/.
Madry, Aleksander, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras,
and Adrian Vladu. 2017. “Towards Deep Learning Models Resistant to
Adversarial Attacks.” arXiv Preprint arXiv:1706.06083.
Mahajan, Divyat, Chenhao Tan, and Amit Sharma. 2020. “Preserving
Causal Constraints in Counterfactual Explanations for Machine Learning
Classifiers.” https://arxiv.org/abs/1912.03277.
Manokhin, Valery. 2022. “Awesome Conformal Prediction.”
Zenodo. https://doi.org/10.5281/zenodo.6467205.
Marcus, Gary. 2023. “Muddles about
Models.” https://garymarcus.substack.com/p/muddles-about-models.
Maslej, Nestor, Loredana Fattorini, Erik Brynjolfsson, John Etchemendy,
Katrina Ligett, Terah Lyons, James Manyika, et al. 2023.
“Artificial Intelligence Index Report 2023.”
Institute for Human-Centered AI.
McAfee, Andrew, Erik Brynjolfsson, Thomas H Davenport, DJ Patil, and
Dominic Barton. 2012. “Big Data: The Management
Revolution.” Harvard Business Review 90 (10): 60–68.
Merton, Robert K. 1942. “Science and
technology in a democratic order.” Journal of Legal
and Political Sociology 1 (1): 115–26.
Gal, Michal S, and Daniel L Rubinfeld. 2019. “Data
Standardization.” NYUL Rev.
Miller, John, Smitha Milli, and Moritz Hardt. 2020. “Strategic
Classification Is Causal Modeling in
Disguise.” In Proceedings of the 37th
International Conference on Machine
Learning, 6917–26. PMLR. https://proceedings.mlr.press/v119/miller20b.html.
Miller, Tim. 2019. “Explanation in Artificial Intelligence:
Insights from the Social Sciences.” Artificial
Intelligence 267: 1–38. https://doi.org/10.1016/j.artint.2018.07.007.
Mishkin, Frederic S et al. 2008. “How Should We Respond to Asset
Price Bubbles.” Financial Stability Review 12 (1):
65–74.
Molnar, Christoph. 2022. Interpretable Machine Learning: A Guide for
Making Black Box Models Explainable. 2nd ed. https://christophm.github.io/interpretable-ml-book.
Morrone, Megan. 2023. “Replika exec: AI
friends can improve human relationships.” https://www.axios.com/2023/11/09/replika-blush-rita-popova-ai-relationships-dating.
Mothilal, Ramaravind K, Amit Sharma, and Chenhao Tan. 2020.
“Explaining Machine Learning Classifiers Through Diverse
Counterfactual Explanations.” In Proceedings of the 2020
Conference on Fairness,
Accountability, and Transparency, 607–17.
https://doi.org/10.1145/3351095.3372850.
Müller, Petra, and Matthias Hartmann. 2023. “Linking paranormal and conspiracy beliefs to illusory
pattern perception through signal detection theory.”
Scientific Reports 13 (1): 9739.
Murphy, Kevin P. 2022. Probabilistic Machine Learning:
An Introduction. MIT Press.
———. 2023. Probabilistic Machine Learning: Advanced Topics. MIT
Press.
Nanda, Neel, Andrew Lee, and Martin Wattenberg. 2023. “Emergent
Linear Representations in World Models of Self-Supervised Sequence
Models.” arXiv Preprint arXiv:2309.00941.
Nelson, Kevin, George Corbin, Mark Anania, Matthew Kovacs, Jeremy
Tobias, and Misty Blowers. 2015. “Evaluating Model Drift in
Machine Learning Algorithms.” In 2015 IEEE
Symposium on Computational Intelligence for
Security and Defense Applications
(CISDA), 1–8. IEEE. https://doi.org/10.1109/cisda.2015.7208643.
Neri, Hugo, and Fabio Cozman. 2020. “The role
of experts in the public perception of risk of artificial
intelligence.” AI & SOCIETY 35: 663–73.
Nickerson, Raymond S. 1998. “Confirmation
bias: A ubiquitous phenomenon in many guises.” Review
of General Psychology 2 (2): 175–220.
O’Neil, Cathy. 2016. Weapons of Math Destruction: How
Big Data Increases Inequality and Threatens Democracy.
Crown.
Oliveira, Raphael Mazzine Barbosa de, and David Martens. 2021. “A
Framework and Benchmarking Study for Counterfactual Generating Methods
on Tabular Data.” Applied Sciences 11 (16): 7274. https://doi.org/10.3390/app11167274.
Björkström, Anders. 2001. “Ridge Regression and Inverse
Problems.” Stockholm University, Department of
Mathematics.
OpenAI. 2025. “Strengthening America’s AI Leadership with the U.S.
National Laboratories.” January 30, 2025. https://openai.com/index/strengthening-americas-ai-leadership-with-the-us-national-laboratories/.
Pace, R Kelley, and Ronald Barry. 1997. “Sparse Spatial
Autoregressions.” Statistics & Probability Letters
33 (3): 291–97. https://doi.org/10.1016/s0167-7152(96)00140-x.
Pawelczyk, Martin, Chirag Agarwal, Shalmali Joshi, Sohini Upadhyay, and
Himabindu Lakkaraju. 2022. “Exploring Counterfactual Explanations
Through the Lens of Adversarial Examples: A Theoretical and Empirical
Analysis.” In Proceedings of the 25th International
Conference on Artificial Intelligence and Statistics, edited by
Gustau Camps-Valls, Francisco J. R. Ruiz, and Isabel Valera,
151:4574–94. Proceedings of Machine Learning Research. PMLR. https://proceedings.mlr.press/v151/pawelczyk22a.html.
Pawelczyk, Martin, Sascha Bielawski, Johannes van den Heuvel, Tobias
Richter, and Gjergji Kasneci. 2021. “CARLA: A Python Library to
Benchmark Algorithmic Recourse and Counterfactual Explanation
Algorithms.” https://arxiv.org/abs/2108.00783.
Pawelczyk, Martin, Teresa Datta, Johannes van den Heuvel, Gjergji
Kasneci, and Himabindu Lakkaraju. 2023. “Probabilistically Robust
Recourse: Navigating the Trade-Offs Between Costs and Robustness in
Algorithmic Recourse.” https://arxiv.org/abs/2203.06768.
Pindyck, Robert S, and Daniel L Rubinfeld. 2014.
Microeconomics. Pearson Education.
Poyiadzi, Rafael, Kacper Sokol, Raul Santos-Rodriguez, Tijl De Bie, and
Peter Flach. 2020. “FACE: Feasible and
Actionable Counterfactual Explanations.” In Proceedings of
the AAAI/ACM Conference on AI,
Ethics, and Society, 344–50.
Prado-Romero, Mario Alfonso, and Giovanni Stilo. 2022. “GRETEL:
Graph Counterfactual Explanation Evaluation Framework.” In
Proceedings of the 31st ACM International Conference on Information
and Knowledge Management. CIKM ’22. New York, NY, USA: Association
for Computing Machinery. https://doi.org/10.1145/3511808.3557608.
Prince, Simon J. D. 2023. Understanding Deep Learning. The MIT
Press. http://udlbook.com.
Rabanser, Stephan, Stephan Günnemann, and Zachary Lipton. 2019.
“Failing Loudly: An Empirical Study of Methods for
Detecting Dataset Shift.” Advances in Neural Information
Processing Systems 32.
Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. 2016.
“"Why Should i Trust You?" Explaining
the Predictions of Any Classifier.” In Proceedings of the
22nd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, 1135–44.
Rooij, Iris van, Olivia Guest, Federico G Adolfi, Ronald de Haan,
Antonina Kolokolova, and Patricia Rich. 2023. “Reclaiming AI as a theoretical tool for cognitive
science.” PsyArXiv. https://osf.io/4cbuv.
Ross, Alexis, Himabindu Lakkaraju, and Osbert Bastani. 2024.
“Learning Models for Actionable
Recourse.” In Proceedings of the 35th International
Conference on Neural Information Processing Systems. NIPS ’21. Red
Hook, NY, USA: Curran Associates Inc.
Rudin, Cynthia. 2019. “Stop Explaining Black Box Machine Learning
Models for High Stakes Decisions and Use Interpretable Models
Instead.” Nature Machine Intelligence 1 (5): 206–15. https://doi.org/10.1038/s42256-019-0048-x.
Saffari, Ehsan, Seyed Ramezan Hosseini, Alireza Taheri, and Ali
Meghdari. 2021. “‘Does cinema form the
future of robotics?’: a survey on fictional robots in sci-fi
movies.” SN Applied Sciences 3 (6): 655.
Sauer, Axel, and Andreas Geiger. 2021. “Counterfactual
Generative Networks.” https://arxiv.org/abs/2101.06046.
Schaeffer, Rylan, Brando Miranda, and Sanmi Koyejo. 2024. “Are
Emergent Abilities of Large Language Models a Mirage?”
Advances in Neural Information Processing Systems 36.
Schut, Lisa, Oscar Key, Rory McGrath, Luca Costabello, Bogdan Sacaleanu,
Yarin Gal, et al. 2021. “Generating Interpretable
Counterfactual Explanations By Implicit Minimisation of
Epistemic and Aleatoric Uncertainties.”
In International Conference on Artificial
Intelligence and Statistics, 1756–64.
PMLR.
Shah, Agam, Suvan Paturi, and Sudheer Chava. 2023. “Trillion
Dollar Words: A New Financial Dataset, Task & Market
Analysis.” arXiv Preprint arXiv:2305.07972. https://arxiv.org/abs/2305.07972.
Shanahan, Murray. 2024. “Talking about Large Language
Models.” Communications of the ACM 67 (2): 68–79.
Sharma, Shubham, Jette Henderson, and Joydeep Ghosh. 2020. “CERTIFAI: A Common Framework to Provide Explanations and
Analyse the Fairness and Robustness of Black-box Models.”
In Proceedings of the AAAI/ACM Conference on AI, Ethics, and
Society, 166–72. AIES ’20. New York, NY, USA: Association for
Computing Machinery. https://doi.org/10.1145/3375627.3375812.
Slack, Dylan, Anna Hilgard, Himabindu Lakkaraju, and Sameer Singh. 2021.
“Counterfactual Explanations Can Be Manipulated.”
Advances in Neural Information Processing Systems 34.
Slack, Dylan, Sophie Hilgard, Emily Jia, Sameer Singh, and Himabindu
Lakkaraju. 2020. “Fooling LIME and SHAP: Adversarial
Attacks on Post Hoc Explanation Methods.” In Proceedings of
the AAAI/ACM Conference on AI,
Ethics, and Society, 180–86.
Spooner, Thomas, Danial Dervovic, Jason Long, Jon Shepard, Jiahao Chen,
and Daniele Magazzeni. 2021. “Counterfactual
Explanations for Arbitrary Regression Models.” https://arxiv.org/abs/2106.15212.
Steinberg, Brooke. 2023. “I fell in love with
an AI chatbot — she rejected me sexually.” https://nypost.com/2023/04/03/40-year-old-man-falls-in-love-with-ai-chatbot-phaedra/.
Stutz, David, Krishnamurthy Dvijotham, Ali Taylan Cemgil, and Arnaud
Doucet. 2022. “Learning Optimal Conformal Classifiers.” https://arxiv.org/abs/2110.09192.
Sundararajan, Mukund, Ankur Taly, and Qiqi Yan. 2017. “Axiomatic
Attribution for Deep Networks.” https://arxiv.org/abs/1703.01365.
Szegedy, Christian, Wojciech Zaremba, Ilya Sutskever, Joan Bruna,
Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2014. “Intriguing
Properties of Neural Networks.” https://arxiv.org/abs/1312.6199.
Tank, Aytekin. 2017. “This Is the Year of the Machine Learning
Revolution.” Edited by Entrepreneur Magazine. January 12, 2017.
https://www.entrepreneur.com/leadership/this-is-the-year-of-the-machine-learning-revolution/287324.
Teh, Yee Whye, Max Welling, Simon Osindero, and Geoffrey E. Hinton.
2003. “Energy-Based Models for Sparse Overcomplete
Representations.” J. Mach. Learn. Res. 4:
1235–60.
Teney, Damien, Ehsan Abbasnejad, and Anton van den Hengel. 2020.
“Learning What Makes a Difference from Counterfactual Examples and
Gradient Supervision.” In Computer Vision - ECCV 2020,
580–99. Berlin, Heidelberg: Springer-Verlag. https://doi.org/10.1007/978-3-030-58607-2_34.
Tolomei, Gabriele, Fabrizio Silvestri, Andrew Haines, and Mounia Lalmas.
2017. “Interpretable Predictions of Tree-Based Ensembles via
Actionable Feature Tweaking.” In Proceedings of the 23rd
ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, 465–74. https://doi.org/10.1145/3097983.3098039.
Touvron, Hugo, Thibaut Lavril, Gautier Izacard, Xavier Martinet,
Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, et al. 2023.
“LLaMA: Open and Efficient Foundation
Language Models.” https://arxiv.org/abs/2302.13971.
Trinh, Trieu H., Yuhuai Wu, Quoc V. Le, et al. 2024. “Solving olympiad geometry without human
demonstrations.” Nature 625: 476–82.
https://doi.org/10.1038/s41586-023-06747-5.
Upadhyay, Sohini, Shalmali Joshi, and Himabindu Lakkaraju. 2021.
“Towards Robust and Reliable Algorithmic Recourse.”
Advances in Neural Information Processing Systems 34: 16926–37.
Ustun, Berk, Alexander Spangher, and Yang Liu. 2019. “Actionable
Recourse in Linear Classification.” In Proceedings of the
Conference on Fairness,
Accountability, and Transparency, 10–19.
https://doi.org/10.1145/3287560.3287566.
Van Prooijen, Jan-Willem, Karen M Douglas, and Clara De Inocencio. 2018.
“Connecting the dots: Illusory pattern
perception predicts belief in conspiracies and the
supernatural.” European Journal of Social
Psychology 48 (3): 320–35.
Vardi, Moshe Y. 2018. “Vardi’s Insights: Move Fast and Break
Things.” 2018. https://doi.org/10.1145/3244026.
Varshney, Kush R. 2022. Trustworthy Machine
Learning. Chappaqua, NY, USA:
Independently Published.
Venkatasubramanian, Suresh, and Mark Alfano. 2020. “The
Philosophical Basis of Algorithmic Recourse.” In Proceedings
of the 2020 Conference on Fairness, Accountability, and
Transparency, 284–93. FAT* ’20. New York, NY, USA: Association for
Computing Machinery. https://doi.org/10.1145/3351095.3372876.
Verma, Sahil, Varich Boonsanong, Minh Hoang, Keegan E. Hines, John P.
Dickerson, and Chirag Shah. 2022. “Counterfactual Explanations and
Algorithmic Recourses for Machine Learning: A Review.” https://arxiv.org/abs/2010.10596.
Wachter, Sandra, Brent Mittelstadt, and Chris Russell. 2017.
“Counterfactual Explanations Without Opening the Black Box:
Automated Decisions and the GDPR.”
Harv. JL & Tech. 31: 841. https://doi.org/10.2139/ssrn.3063289.
Walker, Alexander C, Martin Harry Turpin, Jennifer A Stolz, Jonathan A
Fugelsang, and Derek J Koehler. 2019. “Finding meaning in the clouds: Illusory pattern
perception predicts receptivity to pseudo-profound
bullshit.” Judgment and Decision Making 14 (2):
109–19.
Wang, Zhou, Eero P Simoncelli, and Alan C Bovik. 2003. “Multiscale
Structural Similarity for Image Quality Assessment.” In The
Thirty-Seventh Asilomar Conference on Signals, Systems & Computers,
2003, 2:1398–1402. IEEE.
Weissburg, Iain Xie, Mehir Arora, Liangming Pan, and William Yang Wang.
2024. “Tweets to Citations: Unveiling the
Impact of Social Media Influencers on AI Research
Visibility.” arXiv Preprint arXiv:2401.13782.
Welling, Max, and Yee W Teh. 2011. “Bayesian Learning via
Stochastic Gradient Langevin Dynamics.” In Proceedings of the
28th International Conference on Machine Learning (ICML-11),
681–88. Citeseer.
Widmer, Gerhard, and Miroslav Kubat. 1996. “Learning in the
Presence of Concept Drift and Hidden Contexts.” Machine
Learning 23 (1): 69–101. https://doi.org/10.1007/bf00116900.
Wilkinson, Mark D, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle
Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016.
“The FAIR Guiding Principles for Scientific Data Management and
Stewardship.” Scientific Data 3 (1): 1–9.
Wilson, Andrew Gordon. 2020. “The Case for
Bayesian Deep Learning.” https://arxiv.org/abs/2001.10995.
Wu, Tongshuang, Marco Tulio Ribeiro, Jeffrey Heer, and Daniel Weld.
2021. “Polyjuice: Generating Counterfactuals for Explaining,
Evaluating, and Improving Models.” In Proceedings of the 59th
Annual Meeting of the Association for Computational Linguistics and the
11th International Joint Conference on Natural Language Processing
(Volume 1: Long Papers), edited by Chengqing Zong, Fei Xia, Wenjie
Li, and Roberto Navigli, 6707–23. Online: ACL. https://doi.org/10.18653/v1/2021.acl-long.523.
Xiao, Han, Kashif Rasul, and Roland Vollgraf. 2017.
“Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine
Learning Algorithms.” https://arxiv.org/abs/1708.07747.
Yeh, I-Cheng. 2016. “Default of Credit Card
Clients.” UCI Machine Learning Repository.
Yeh, I-Cheng, and Che-hui Lien. 2009. “The Comparisons of Data
Mining Techniques for the Predictive Accuracy of Probability of Default
of Credit Card Clients.” Expert Systems with
Applications 36 (2): 2473–80. https://doi.org/10.1016/j.eswa.2007.12.020.
Zečević, Matej, Moritz Willig, Devendra Singh Dhami, and Kristian
Kersting. 2023. “Causal Parrots: Large Language Models May Talk
Causality but Are Not Causal.” arXiv Preprint
arXiv:2308.13067.
Zenil, Hector. 2024. “Curb The Enthusiasm.” https://www.linkedin.com/posts/zenil_google-deepmind-makes-breakthrough-in-solving-activity-7154157779136446464-Gvv-.
Zgraggen, Emanuel, Zheguang Zhao, Robert Zeleznik, and Tim Kraska. 2018.
“Investigating the effect of the multiple
comparisons problem in visual analysis.” In
Proceedings of the 2018 CHI Conference on Human Factors in Computing
Systems, 1–12.
Zhang, Chiyuan, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol
Vinyals. 2021. “Understanding Deep Learning (Still) Requires
Rethinking Generalization.” Commun. ACM 64 (3): 107–15.
https://doi.org/10.1145/3446776.