Literature overview
This is a collection of interesting papers and thoughts on Trustworthy AI that I am compiling gradually over time; it also serves as my own interactive notebook. Descriptions of the papers vary in length, and some are very brief. A full list of the references linked here can be found at the bottom.
| Date | Title | Author |
|------|-------|--------|
| 2020 | ‘How Do I Fool You?’ Manipulating User Trust via Misleading Black Box Explanations. | Lakkaraju and Bastani (2020) |
| 2020 | Fooling LIME and SHAP: Adversarial Attacks on Post Hoc Explanation Methods. | Slack et al. (2020) |
| 2019 | Actionable Recourse in Linear Classification. | Ustun, Spangher, and Liu (2019) |
| 2019 | Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. | Rudin (2019) |
| 2019 | Explaining Explanations in AI. | Mittelstadt, Russell, and Wachter (2019) |
| 2017 | Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR. | Wachter, Mittelstadt, and Russell (2017) |
| 2017 | A Unified Approach to Interpreting Model Predictions. | Lundberg and Lee (2017) |
| 2016 | ‘Why Should I Trust You?’ Explaining the Predictions of Any Classifier. | Ribeiro, Singh, and Guestrin (2016) |
References
Lakkaraju, Himabindu, and Osbert Bastani. 2020. “‘How Do I Fool You?’ Manipulating User Trust via Misleading Black Box Explanations.” In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 79–85.
Lundberg, Scott M, and Su-In Lee. 2017. “A Unified Approach to Interpreting Model Predictions.” In Proceedings of the 31st International Conference on Neural Information Processing Systems, 4768–77.
Mittelstadt, Brent, Chris Russell, and Sandra Wachter. 2019. “Explaining Explanations in AI.” In Proceedings of the Conference on Fairness, Accountability, and Transparency, 279–88.
Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. 2016. “‘Why Should I Trust You?’ Explaining the Predictions of Any Classifier.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–44.
Rudin, Cynthia. 2019. “Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead.” Nature Machine Intelligence 1 (5): 206–15.
Slack, Dylan, Sophie Hilgard, Emily Jia, Sameer Singh, and Himabindu Lakkaraju. 2020. “Fooling LIME and SHAP: Adversarial Attacks on Post Hoc Explanation Methods.” In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 180–86.
Ustun, Berk, Alexander Spangher, and Yang Liu. 2019. “Actionable Recourse in Linear Classification.” In Proceedings of the Conference on Fairness, Accountability, and Transparency, 10–19.
Wachter, Sandra, Brent Mittelstadt, and Chris Russell. 2017. “Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR.” Harvard Journal of Law & Technology 31: 841.