
Understanding adversarial examples requires a theory of artefacts for deep learning

Abstract

Deep neural networks are currently the most widespread and successful technology in artificial intelligence. However, these systems exhibit bewildering new vulnerabilities: most notably, a susceptibility to adversarial examples. Here, I review recent empirical research on adversarial examples suggesting that deep neural networks may be detecting features in them that are predictively useful yet inscrutable to humans. To understand the implications of this research, we must contend with some older philosophical puzzles about scientific reasoning, which can help us determine whether these features are reliable targets of scientific investigation or merely the distinctive processing artefacts of deep neural networks.
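To make the vulnerability concrete, the following is a minimal sketch of how adversarial examples are typically crafted, in the spirit of the fast gradient sign method of Goodfellow et al.: each input dimension is nudged by a tiny amount in the direction that most changes the model's output. A toy linear classifier stands in for a deep network, and all numbers are illustrative assumptions, not taken from the article.

```python
import numpy as np

# Toy sketch of the fast gradient sign method (FGSM): perturb every input
# dimension by a tiny amount in the direction that most harms the prediction.
# A linear classifier stands in for a deep net; high input dimensionality is
# what lets many imperceptible nudges add up to a flipped decision.

rng = np.random.default_rng(0)
d = 1000
w = rng.normal(size=d)                  # classifier weights: predict +1 if w @ x > 0
x = rng.normal(size=d) * 0.1
x = x - (w @ x - 0.5) * w / (w @ w)     # adjust x so that w @ x == 0.5 (confident +1)

eps = 0.01                              # imperceptibly small per-dimension change
x_adv = x - eps * np.sign(w)            # FGSM step against the +1 prediction

print(w @ x)                            # about 0.5: classified +1
print(w @ x_adv)                        # 0.5 - eps * ||w||_1: strongly negative, flipped to -1
print(np.max(np.abs(x_adv - x)))        # no dimension changed by more than eps
```

The point of the sketch is that the per-dimension perturbation never exceeds `eps`, yet the decision flips decisively, because the perturbation is aligned with the weights across all one thousand dimensions at once.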


Fig. 1: An ‘impersonation’ attack using ‘adversarial glasses’.
Fig. 2: A periodic signal function approximated by a Fourier series.
Fig. 3: Checkerboard artefacts produced by image deconvolution in GANs.
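The phenomenon Fig. 2 depicts can be illustrated independently with a truncated Fourier series of a square wave: the partial sums fit the signal ever better on average, yet a persistent overshoot remains near the discontinuities (the Gibbs phenomenon), an artefact of the representation rather than a feature of the signal. This is a standalone sketch, not the article's own figure code.

```python
import numpy as np

# Fourier partial sums of a square wave. More terms shrink the average error,
# but the overshoot near the jumps (the Gibbs phenomenon) never disappears:
# an artefact of the truncated representation, not of the signal itself.

t = np.linspace(0.0, 2.0 * np.pi, 4001)
square = np.sign(np.sin(t))             # ideal periodic square wave

def partial_sum(t, n_terms):
    """Square-wave Fourier series: (4/pi) * sum over odd k of sin(k t) / k."""
    ks = np.arange(1, 2 * n_terms, 2)   # odd harmonics 1, 3, 5, ...
    return (4.0 / np.pi) * (np.sin(np.outer(ks, t)) / ks[:, None]).sum(axis=0)

coarse = partial_sum(t, 5)              # few terms: visibly wavy approximation
fine = partial_sum(t, 50)               # many terms: closer fit on average

rms = lambda a: np.sqrt(np.mean((a - square) ** 2))
print(rms(coarse) > rms(fine))          # True: error shrinks with more terms
print(fine.max())                       # still overshoots 1.0 near the jumps
```

The overshoot in `fine.max()` stays at roughly 9% of the jump size no matter how many terms are added, which is the sense in which it is a processing artefact of the approximation scheme.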


Acknowledgements

This work has been supported by National Science Foundation grant 2020585.

Corresponding author

Correspondence to Cameron Buckner.

Ethics declarations

Competing interests

The author declares no competing interests.

Additional information

Peer review information Nature Machine Intelligence thanks the anonymous reviewers for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Buckner, C. Understanding adversarial examples requires a theory of artefacts for deep learning. Nat Mach Intell 2, 731–736 (2020). https://doi.org/10.1038/s42256-020-00266-y
