Robustness
Background
- Szegedy et al. (2013) were the first to point out the existence of adversarial examples in the image classification domain.
- Goodfellow, Shlens, and Szegedy (2014) argue that the existence of adversarial examples can be explained solely by the locally linear nature of artificial neural networks. They show how a simple linear perturbation, generated with their fast gradient sign method, consistently fools many state-of-the-art neural networks (see the sketch after this list). Adversarial training improves robustness to some extent, but DNNs still assign high confidence to the misclassified labels.
- Carlini and Wagner (2017) show that an initially promising method for robustifying DNNs, namely defensive distillation, is in fact insufficient. They argue that their adversarial attacks should serve as a benchmark for evaluating the robustness of DNNs.
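A minimal PyTorch sketch of the fast gradient sign method, assuming a differentiable classifier `model` and a labelled batch `(x, y)`; the function name and the default `epsilon` are illustrative, not taken from the paper.

```python
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.1):
    """Fast gradient sign method (Goodfellow, Shlens, and Szegedy 2014):
    perturb x in the direction of the sign of the loss gradient w.r.t. x."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # One linear step of size epsilon along the sign of the input gradient.
    return (x + epsilon * x.grad.sign()).detach()
```

For an untargeted attack, the perturbed batch returned by `fgsm_attack(model, x, y)` is passed back through `model` to check how many predicted labels flip.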
Thoughts
- Link to anomaly detection (ING)
- Out-of-distribution detection for time series models (e.g. to avoid scenarios like Covid leading to model failures (Bholat, Gharbawi, and Thew 2020)).
- If adversarial training affects the success of adversarial attacks, does it also affect the success of CE?
- Can we penalize instability much like we penalize complexity in empirical risk minimization? (See the sketch after this list.)
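One hypothetical way to operationalize that last question, assuming a PyTorch classifier: add a squared input-gradient penalty to the empirical risk, analogous to how a complexity penalty is added in regularized ERM. The penalty term and the weight `lam` are illustrative choices, not something proposed in the cited papers.

```python
import torch
import torch.nn.functional as F

def penalized_risk(model, x, y, lam=0.01):
    """Empirical risk plus an input-gradient ('instability') penalty."""
    x = x.clone().detach().requires_grad_(True)
    risk = F.cross_entropy(model(x), y)
    # Gradient of the risk w.r.t. the inputs; create_graph=True so the
    # penalty can itself be differentiated through during training.
    grad_x, = torch.autograd.grad(risk, x, create_graph=True)
    penalty = grad_x.pow(2).flatten(start_dim=1).sum(dim=1).mean()
    return risk + lam * penalty
```

Minimizing this combined objective pushes the model towards predictions that change little under small input perturbations, in the same spirit as weight-decay-style complexity penalties.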
References
Bholat, D, M Gharbawi, and O Thew. 2020. “The Impact of Covid on Machine Learning and Data Science in UK Banking.” Bank of England Quarterly Bulletin, Q4.
Carlini, Nicholas, and David Wagner. 2017. “Towards Evaluating the Robustness of Neural Networks.” In 2017 IEEE Symposium on Security and Privacy (SP), 39–57. IEEE.
Goodfellow, Ian J, Jonathon Shlens, and Christian Szegedy. 2014. “Explaining and Harnessing Adversarial Examples.” arXiv Preprint arXiv:1412.6572.
Szegedy, Christian, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2013. “Intriguing Properties of Neural Networks.” arXiv Preprint arXiv:1312.6199.