Article

Explicit information for category-orthogonal object properties increases along the ventral stream

Nature Neuroscience volume 19, pages 613–622 (2016)

Abstract

Extensive research has revealed that the ventral visual stream hierarchically builds a robust representation for supporting visual object categorization tasks. We systematically explored the ability of multiple ventral visual areas to support a variety of 'category-orthogonal' object properties such as position, size and pose. For complex naturalistic stimuli, we found that the inferior temporal (IT) population encodes all measured category-orthogonal object properties, including those properties often considered to be low-level features (for example, position), more explicitly than earlier ventral stream areas. We also found that the IT population better predicts human performance patterns across properties. A hierarchical neural network model based on simple computational principles generates these same cross-area patterns of information. Taken together, our empirical results support the hypothesis that all behaviorally relevant object properties are extracted in concert up the ventral visual hierarchy, and our computational model explains how that hierarchy might be built.
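The core measurement behind claims like "more explicitly encoded" is linear decodability: train a simple linear readout on a population's responses and ask how well it recovers a property such as position. The sketch below illustrates the idea on synthetic data (not the study's recordings); the population sizes, tuning strengths, and noise levels are arbitrary assumptions chosen so that one simulated "area" carries the property more explicitly than the other.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-ins for two neural populations. In the "higher" area,
# the category-orthogonal property (e.g., horizontal position) drives
# the neurons more strongly relative to the noise.
n_trials, n_neurons = 500, 100
position = rng.uniform(-1, 1, n_trials)            # property to decode
noise = rng.normal(0.0, 1.0, (n_trials, n_neurons))

w_low = rng.normal(0.0, 0.2, n_neurons)            # weakly tuned population
w_high = rng.normal(0.0, 1.0, n_neurons)           # strongly tuned population
resp_low = np.outer(position, w_low) + noise
resp_high = np.outer(position, w_high) + noise

def explicitness(responses, labels):
    """Cross-validated R^2 of a linear (ridge) decoder: a proxy for
    how explicitly the population encodes the labeled property."""
    model = RidgeCV(alphas=np.logspace(-3, 3, 13))
    return cross_val_score(model, responses, labels, cv=5, scoring="r2").mean()

r2_low = explicitness(resp_low, position)
r2_high = explicitness(resp_high, position)
print(f"linear decodability: lower area R^2={r2_low:.2f}, "
      f"higher area R^2={r2_high:.2f}")
```

The same cross-validated readout, applied to recorded V4 and IT responses for each property (position, size, pose, category), yields the per-area information estimates that the abstract summarizes.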



Acknowledgements

We are grateful to K. Schmidt and C. Stawarz for key technical support, and to D. Ardila, R. Rajalingham, J. Tenenbaum, S. Gershman, C. Jennings and J. Fan for useful suggestions. The infrastructure needed for this work was supported by DARPA (Neovision2) and the NSF (IIS-0964269). The bulk of the work presented in this manuscript was supported by the US National Institutes of Health (NEI-R01 EY014970), with partial support from the Simons Center for the Global Brain and the Office of Naval Research (MURI). H.H. was supported by a fellowship from the Samsung Scholarship. We thank NVIDIA for a grant of GPU hardware and Amazon for an education grant supporting computational and psychophysical work. Additional computational infrastructure support was provided by the McGovern Institute for Brain Research (OpenMind).

Author information

Author notes

    • Najib J Majaj

    Present address: Center for Neural Science, New York University, New York, New York, USA.

    • Ha Hong
    •  & Daniel L K Yamins

    These authors contributed equally to this work.

Affiliations

  1. Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    • Ha Hong
    • , Daniel L K Yamins
    • , Najib J Majaj
    •  & James J DiCarlo
  2. McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    • Ha Hong
    • , Daniel L K Yamins
    • , Najib J Majaj
    •  & James J DiCarlo
  3. Harvard–Massachusetts Institute of Technology Division of Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    • Ha Hong

Authors

    • Ha Hong
    • Daniel L K Yamins
    • Najib J Majaj
    • James J DiCarlo

Contributions

H.H., N.J.M. and J.J.D. designed the neurophysiological experiments. H.H. and N.J.M. performed the neurophysiology experiments. D.L.K.Y., H.H. and J.J.D. designed the human psychophysical experiments. D.L.K.Y. performed the human psychophysical experiments. D.L.K.Y. and H.H. performed data analysis. D.L.K.Y. and H.H. performed computational modeling. D.L.K.Y., J.J.D., H.H. and N.J.M. wrote the paper.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to James J DiCarlo.

Supplementary information

PDF files

  1. Supplementary Text and Figures

     Supplementary Figures 1–14 and Supplementary Table 1

  2. Supplementary Methods Checklist

About this article

DOI: https://doi.org/10.1038/nn.4247
