Understanding adversarial examples requires a theory of artefacts for deep learning

Buckner, Cameron

doi:10.1038/s42256-020-00266-y

Perspective
Published: 23 November 2020

Cameron Buckner (ORCID: orcid.org/0000-0003-0611-5354)

Nature Machine Intelligence volume 2, pages 731–736 (2020)

Abstract

Deep neural networks are currently the most widespread and successful technology in artificial intelligence. However, these systems exhibit bewildering new vulnerabilities: most notably a susceptibility to adversarial examples. Here, I review recent empirical research on adversarial examples that suggests that deep neural networks may be detecting in them features that are predictively useful, though inscrutable to humans. To understand the implications of this research, we should contend with some older philosophical puzzles about scientific reasoning, helping us to determine whether these features are reliable targets of scientific investigation or just the distinctive processing artefacts of deep neural networks.

**Fig. 1: An ‘impersonation’ attack using ‘adversarial glasses’.**

**Fig. 2: A periodic signal function approximated by a Fourier series.**

**Fig. 3: Checkerboard artefacts produced by image deconvolution in GANs.**

Acknowledgements

This work has been supported by National Science Foundation grant 2020585.

Author information

Authors and Affiliations

Department of Philosophy, The University of Houston, Houston, TX, USA
Cameron Buckner

Corresponding author

Correspondence to Cameron Buckner.

Ethics declarations

Competing interests

The author declares no competing interests.

Additional information

Peer review information: Nature Machine Intelligence thanks the anonymous reviewers for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Buckner, C. Understanding adversarial examples requires a theory of artefacts for deep learning. Nat Mach Intell 2, 731–736 (2020). https://doi.org/10.1038/s42256-020-00266-y

Received: 20 March 2020
Accepted: 28 October 2020
Published: 23 November 2020
Issue Date: December 2020
DOI: https://doi.org/10.1038/s42256-020-00266-y

This article is cited by

Percentages and reasons: AI explainability and ultimate human responsibility within the medical field
Markus Herrmann, Andreas Wabro & Eva Winkler
Ethics and Information Technology (2024)

When remediating one artifact results in another: control, confounders, and correction
David Colaço
History and Philosophy of the Life Sciences (2024)

An empirical comparison of deep learning explainability approaches for EEG using simulated ground truth
Akshay Sujatha Ravindran & Jose Contreras-Vidal
Scientific Reports (2023)

On the Philosophy of Unsupervised Learning
David S. Watson
Philosophy & Technology (2023)

Functional Concept Proxies and the Actually Smart Hans Problem: What’s Special About Deep Neural Networks in Science
Florian J. Boge
Synthese (2023)