  • Perspective

Using goal-driven deep learning models to understand sensory cortex


Fueled by innovation in the computer vision and artificial intelligence communities, recent developments in computational neuroscience have used goal-driven hierarchical convolutional neural networks (HCNNs) to make strides in modeling neural single-unit and population responses in higher visual cortical areas. In this Perspective, we review the recent progress in a broader modeling context and describe some of the key technical innovations that have supported it. We then outline how the goal-driven HCNN approach can be used to delve even more deeply into understanding the development and organization of sensory cortical processing.
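The modeling loop described above can be sketched in a few lines. Images pass through a hierarchy of convolutional stages, each doing filtering, thresholding, pooling and normalization. A linear readout is then fit from the top-layer features to a recorded response and scored on held-out images. This is a minimal sketch that assumes only NumPy. The function names (`conv2d_valid`, `max_pool`, `hcnn_stage`, `fit_ridge`), the random filters, the ridge penalty and the synthetic "neuron" are all hypothetical illustrations, not the authors' model or fitting procedure. In the goal-driven approach, the filters would be optimized for a behavioral task such as object categorization instead of being drawn at random.

```python
import numpy as np


def conv2d_valid(images, filters):
    """Valid 2D cross-correlation: (n, c, h, w) x (k, c, fh, fw) -> (n, k, h-fh+1, w-fw+1)."""
    n, c, h, w = images.shape
    k, _, fh, fw = filters.shape
    oh, ow = h - fh + 1, w - fw + 1
    out = np.zeros((n, k, oh, ow))
    for i in range(fh):
        for j in range(fw):
            patch = images[:, :, i:i + oh, j:j + ow]
            # Weight each shifted input by filter tap (i, j), summing over input channels.
            out += np.einsum("nchw,kc->nkhw", patch, filters[:, :, i, j])
    return out


def max_pool(x, size=2):
    """Non-overlapping max pooling over spatial positions (edges that do not fit are cropped)."""
    n, k, h, w = x.shape
    h2, w2 = h // size, w // size
    x = x[:, :, :h2 * size, :w2 * size]
    return x.reshape(n, k, h2, size, w2, size).max(axis=(3, 5))


def hcnn_stage(x, filters, sigma=1.0):
    """One hypothetical HCNN stage: filter -> threshold -> pool -> normalize."""
    x = conv2d_valid(x, filters)
    x = np.maximum(x, 0.0)  # rectifying threshold
    x = max_pool(x, 2)  # local spatial pooling
    # Divisive normalization by the RMS activity across feature channels.
    norm = np.sqrt(sigma**2 + np.mean(x**2, axis=1, keepdims=True))
    return x / norm


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept; returns (weights, bias)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w


rng = np.random.default_rng(0)
images = rng.standard_normal((300, 1, 16, 16))  # stand-in stimuli, not real images
filters1 = rng.standard_normal((8, 1, 3, 3))  # random, NOT goal-optimized
filters2 = rng.standard_normal((16, 8, 3, 3))
features = hcnn_stage(hcnn_stage(images, filters1), filters2)
features = features.reshape(len(images), -1)  # (300, 16 channels * 2 * 2 positions)

# Synthetic "neuron": a noisy linear function of the top-layer features.
clean = features @ rng.standard_normal(features.shape[1])
response = clean + 0.3 * clean.std() * rng.standard_normal(len(clean))

# Fit the readout on 200 stimuli and score it on the 100 held-out stimuli.
w, b = fit_ridge(features[:200], response[:200], alpha=1.0)
predicted = features[200:] @ w + b
r = np.corrcoef(predicted, response[200:])[0, 1]
print(f"held-out correlation: {r:.2f}")
```

Because the synthetic responses are generated from the same features, a high score here only confirms that the pipeline works. With real recordings, the held-out prediction score is what lets different candidate models be compared against each other.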

Figure 1: HCNNs as models of sensory cortex.
Figure 2: Goal-driven optimization yields neurally predictive models of ventral visual cortex.
Figure 3: The components of goal-driven modeling.

Author information

Corresponding author

Correspondence to Daniel L K Yamins.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

About this article

Cite this article

Yamins, D., DiCarlo, J. Using goal-driven deep learning models to understand sensory cortex. Nat Neurosci 19, 356–365 (2016).
