Abstract
Deep neural networks are currently the most widespread and successful technology in artificial intelligence. However, these systems exhibit bewildering new vulnerabilities: most notably a susceptibility to adversarial examples. Here, I review recent empirical research on adversarial examples that suggests that deep neural networks may be detecting in them features that are predictively useful, though inscrutable to humans. To understand the implications of this research, we should contend with some older philosophical puzzles about scientific reasoning, helping us to determine whether these features are reliable targets of scientific investigation or just the distinctive processing artefacts of deep neural networks.
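The adversarial examples discussed above are typically built by nudging an input in the direction that most increases the classifier's loss. A minimal sketch of this idea, the fast gradient sign method, is below; it uses a toy linear classifier with made-up weights and inputs rather than a deep network, so the numbers are purely illustrative assumptions.

```python
import numpy as np

# Toy illustration of the fast gradient sign method (FGSM).
# A linear "classifier" stands in for a deep network; weights and
# inputs are hypothetical, generated from a fixed random seed.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w = rng.normal(size=100)   # toy classifier weights
x = rng.normal(size=100)   # a "clean" input
y = 1.0                    # its true label

# Gradient of the logistic loss with respect to the input: (p - y) * w
p = sigmoid(w @ x)
grad_x = (p - y) * w

# FGSM: perturb each coordinate by at most eps, in the direction
# that increases the loss. The per-coordinate change is tiny, but
# aligned across all coordinates it shifts the logit substantially.
eps = 0.25
x_adv = x + eps * np.sign(grad_x)

p_adv = sigmoid(w @ x_adv)
print(f"confidence in true class: clean={p:.3f}, adversarial={p_adv:.3f}")
```

Because the perturbation is bounded by eps in every coordinate, the adversarial input looks essentially identical to the clean one, yet the classifier's confidence in the true class drops; this is the behaviour that motivates the "features versus artefacts" question in the article.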
Acknowledgements
This work has been supported by National Science Foundation grant 2020585.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Machine Intelligence thanks the anonymous reviewers for their contribution to the peer review of this work.
Cite this article
Buckner, C. Understanding adversarial examples requires a theory of artefacts for deep learning. Nat Mach Intell 2, 731–736 (2020). https://doi.org/10.1038/s42256-020-00266-y
This article is cited by
- Percentages and reasons: AI explainability and ultimate human responsibility within the medical field. Ethics and Information Technology (2024)
- When remediating one artifact results in another: control, confounders, and correction. History and Philosophy of the Life Sciences (2024)
- An empirical comparison of deep learning explainability approaches for EEG using simulated ground truth. Scientific Reports (2023)
- On the Philosophy of Unsupervised Learning. Philosophy & Technology (2023)
- Functional Concept Proxies and the Actually Smart Hans Problem: What’s Special About Deep Neural Networks in Science. Synthese (2023)