Deep neural networks are currently the most widespread and successful technology in artificial intelligence. However, these systems exhibit bewildering new vulnerabilities: most notably, a susceptibility to adversarial examples. Here, I review recent empirical research on adversarial examples suggesting that deep neural networks may be detecting features in them that are predictively useful, though inscrutable to humans. To understand the implications of this research, we should contend with some older philosophical puzzles about scientific reasoning, which can help us determine whether these features are reliable targets of scientific investigation or merely the distinctive processing artefacts of deep neural networks.
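To make the notion of an adversarial example concrete: the classic construction (the fast gradient sign method of Goodfellow et al.) perturbs an input in the direction that most increases the classifier's loss, under a small per-pixel budget. The sketch below is purely illustrative and not from the article; it uses a toy linear classifier rather than a deep network, and the names (`fgsm_attack`, the particular weights) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_attack(x, y, w, eps):
    """One-step fast-gradient-sign perturbation of input x.

    For the logistic loss L(x) = -log sigmoid(y * w.x), the gradient
    with respect to the input is -y * sigmoid(-y * w.x) * w; moving a
    distance eps along its sign maximally increases the loss under an
    L-infinity budget of eps.
    """
    margin = y * np.dot(w, x)
    grad_x = -y * sigmoid(-margin) * w
    return x + eps * np.sign(grad_x)

# Toy classifier: predicts +1 when w.x > 0.
w = np.array([2.0, -1.0, 0.5])
x = np.array([0.4, 0.3, -0.4])   # w.x = 0.3, correctly classified as +1
y = 1

x_adv = fgsm_attack(x, y, w, eps=0.2)
print(np.dot(w, x) > 0)      # clean input: correct prediction
print(np.dot(w, x_adv) > 0)  # perturbed input: prediction flips
```

Even a perturbation of at most 0.2 per coordinate flips the toy model's decision, mirroring (in miniature) how imperceptibly small image perturbations can flip a deep network's label.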
This work has been supported by National Science Foundation grant 2020585.
The author declares no competing interests.
Peer review information Nature Machine Intelligence thanks the anonymous reviewers for their contribution to the peer review of this work.
Cite this article
Buckner, C. Understanding adversarial examples requires a theory of artefacts for deep learning. Nat Mach Intell 2, 731–736 (2020). https://doi.org/10.1038/s42256-020-00266-y