Hierarchical models of object recognition in cortex

Abstract

Visual processing in cortex is classically modeled as a hierarchy of increasingly sophisticated representations, naturally extending the model of simple to complex cells of Hubel and Wiesel. Surprisingly, little quantitative modeling has been done to explore the biological feasibility of this class of models to explain aspects of higher-level visual processing such as object recognition. We describe a new hierarchical model consistent with physiological data from inferotemporal cortex that accounts for this complex visual task and makes testable predictions. The model is based on a MAX-like operation applied to inputs to certain cortical neurons that may have a general role in cortical function.

Figure 1: Invariance properties of one neuron (modified from Logothetis et al.21).
Figure 2: Sketch of the model.
Figure 3: Highly nonlinear shape-tuning properties of the MAX mechanism.
Figure 4: Responses of a sample model neuron to different transformations of its preferred stimulus.
Figure 5: Average neuronal responses of neurons to scrambled stimuli in the many-feature version of the model.

References

  1. Thorpe, S., Fize, D. & Marlot, C. Speed of processing in the human visual system. Nature 381, 520–522 (1996).

  2. Bruce, C., Desimone, R. & Gross, C. Visual properties of neurons in a polysensory area in the superior temporal sulcus of the macaque. J. Neurophysiol. 46, 369–384 (1981).

  3. Ungerleider, L. & Haxby, J. 'What' and 'where' in the human brain. Curr. Opin. Neurobiol. 4, 157–165 (1994).

  4. Hubel, D. & Wiesel, T. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.) 160, 106–154 (1962).

  5. Hubel, D. & Wiesel, T. Receptive fields and functional architecture in two nonstriate visual areas (18 and 19) of the cat. J. Neurophysiol. 28, 229–289 (1965).

  6. Fukushima, K. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 36, 193–202 (1980).

  7. Perrett, D. & Oram, M. Neurophysiology of shape processing. Imaging Vis. Comput. 11, 317–333 (1993).

  8. Wallis, G. & Rolls, E. A model of invariant object recognition in the visual system. Prog. Neurobiol. 51, 167–194 (1997).

  9. Anderson, C. & van Essen, D. Shifter circuits: a computational strategy for dynamic aspects of visual processing. Proc. Natl. Acad. Sci. USA 84, 6297–6301 (1987).

  10. Olshausen, B., Anderson, C. & van Essen, D. A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information. J. Neurosci. 13, 4700–4719 (1993).

  11. Salinas, E. & Abbott, L. Invariant visual responses from attentional gain fields. J. Neurophysiol. 77, 3267–3272 (1997).

  12. Riesenhuber, M. & Dayan, P. in Advances in Neural Information Processing Systems Vol. 9 (eds. Mozer, M., Jordan, M. & Petsche, T.) 17–23 (MIT Press, Cambridge, Massachusetts, 1997).

  13. Moran, J. & Desimone, R. Selective attention gates visual processing in the extrastriate cortex. Science 229, 782–784 (1985).

  14. Connor, C., Preddie, D., Gallant, J. & van Essen, D. Spatial attention effects in macaque area V4. J. Neurosci. 17, 3201–3214 (1997).

  15. Poggio, T. & Edelman, S. A network that learns to recognize 3D objects. Nature 343, 263–266 (1990).

  16. Bülthoff, H. & Edelman, S. Psychophysical support for a two-dimensional view interpolation theory of object recognition. Proc. Natl. Acad. Sci. USA 89, 60–64 (1992).

  17. Logothetis, N., Pauls, J., Bülthoff, H. & Poggio, T. Shape representation in the inferior temporal cortex of monkeys. Curr. Biol. 4, 401–414 (1994).

  18. Tarr, M. Rotating objects to recognize them: A case study on the role of viewpoint dependency in the recognition of three-dimensional objects. Psychonom. Bull. Rev. 2, 55–82 (1995).

  19. Booth, M. & Rolls, E. View-invariant representations of familiar objects by neurons in the inferior temporal visual cortex. Cereb. Cortex 8, 510–523 (1998).

  20. Kobatake, E., Wang, G. & Tanaka, K. Effects of shape-discrimination training on the selectivity of inferotemporal cells in adult monkeys. J. Neurophysiol. 80, 324–330 (1998).

  21. Logothetis, N., Pauls, J. & Poggio, T. Shape representation in the inferior temporal cortex of monkeys. Curr. Biol. 5, 552–563 (1995).

  22. Perrett, D. et al. Viewer-centred and object-centred coding of heads in the macaque temporal cortex. Exp. Brain Res. 86, 159–173 (1991).

  23. Missal, M., Vogels, R. & Orban, G. Responses of macaque inferior temporal neurons to overlapping shapes. Cereb. Cortex 7, 758–767 (1997).

  24. Sato, T. Interactions of visual stimuli in the receptive fields of inferior temporal neurons in awake monkeys. Exp. Brain Res. 77, 23–30 (1989).

  25. Riesenhuber, M. & Poggio, T. in Advances in Neural Information Processing Systems Vol. 10 (eds. Jordan, M., Kearns, M. & Solla, S.) 215–221 (MIT Press, Cambridge, Massachusetts, 1998).

  26. Wang, G., Tanifuji, M. & Tanaka, K. Functional architecture in monkey inferotemporal cortex revealed by in vivo optical imaging. Neurosci. Res. 32, 33–46 (1998).

  27. Logothetis, N. Object vision and visual awareness. Curr. Opin. Neurobiol. 8, 536–544 (1998).

  28. Riesenhuber, M. & Poggio, T. Are cortical models really bound by the "binding problem"? Neuron 24, 87–93 (1999).

  29. Rolls, E. & Tovee, M. The responses of single neurons in the temporal visual cortical areas of the macaque when more than one stimulus is present in the receptive field. Exp. Brain Res. 103, 409–420 (1995).

  30. Vogels, R. Categorization of complex visual images by rhesus monkeys. Part 2: single-cell study. Eur. J. Neurosci. 11, 1239–1255 (1999).

  31. Rowley, H., Baluja, S. & Kanade, T. Neural network-based face detection. IEEE PAMI 20, 23–38 (1998).

  32. Sung, K. & Poggio, T. Example-based learning for view-based human face detection. IEEE PAMI 20, 39–51 (1998).

  33. Koch, C. & Ullman, S. Shifts in selective visual attention: towards the underlying neural circuitry. Hum. Neurobiol. 4, 219–227 (1985).

  34. Abbott, L., Varela, J., Sen, K. & Nelson, S. Synaptic depression and cortical gain control. Science 275, 220–224 (1997).

  35. Grossberg, S. Nonlinear neural networks: Principles, mechanisms, and architectures. Neural Net. 1, 17–61 (1988).

  36. Chance, F., Nelson, S. & Abbott, L. Complex cells as cortically amplified simple cells. Nat. Neurosci. 2, 277–282 (1999).

  37. Douglas, R., Koch, C., Mahowald, M., Martin, K. & Suarez, H. Recurrent excitation in neocortical circuits. Science 269, 981–985 (1995).

  38. Reichardt, W., Poggio, T. & Hausen, K. Figure–ground discrimination by relative movement in the visual system of the fly – II: towards the neural circuitry. Biol. Cybern. 46, 1–30 (1983).

  39. Lee, D., Itti, L., Koch, C. & Braun, J. Attention activates winner-take-all competition among visual filters. Nat. Neurosci. 2, 375–381 (1999).

  40. Heeger, D. Normalization of cell responses in cat striate cortex. Vis. Neurosci. 9, 181–197 (1992).

  41. Nowlan, S. & Sejnowski, T. A selection model for motion processing in area MT of primates. J. Neurosci. 15, 1195–1214 (1995).

  42. Mumford, D. On the computational architecture of the neocortex. II. The role of cortico-cortical loops. Biol. Cybern. 66, 241–251 (1992).

  43. Rao, R. & Ballard, D. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2, 79–87 (1999).

  44. Reynolds, J., Chelazzi, L. & Desimone, R. Competitive mechanisms subserve attention in macaque areas V2 and V4. J. Neurosci. 19, 1736–1753 (1999).

  45. Mel, B. SEEMORE: combining color, shape, and texture histogramming in a neurally inspired approach to visual object recognition. Neural Comput. 9, 777–804 (1997).

  46. Kobatake, E. & Tanaka, K. Neuronal selectivities to complex object features in the ventral visual pathway of the macaque cerebral cortex. J. Neurophysiol. 71, 856–867 (1994).

  47. Földiák, P. Learning invariance from transformation sequences. Neural Comput. 3, 194–200 (1991).

Author information

Corresponding author

Correspondence to Tomaso Poggio.

Supplementary information

The paper shows that the invariance ranges of view-tuned cells in our model were in good agreement with quantitative experiments using paperclip stimuli. Here we briefly describe preliminary simulations that demonstrate that this performance is not specific to paperclip objects, but rather, also transfers to more natural object classes containing many members of similar shape, such as the class of cars.

To be able to finely control stimulus similarities, we used automatic, three-dimensional, multidimensional morphing software developed by Christian Shelton in our lab1, which allowed us to create a large set of 'intermediate' objects, made by blending characteristics of the different prototype objects spanning the class. This was done by specifying how much of each prototype the object to be created should contain, naturally defining a vector space over the prototype objects. Figure 1 shows several car prototypes and an intermediate car generated as a random combination of these prototypes.
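As a rough illustration of the vector space described above, the sketch below blends prototype feature vectors according to a set of mixing weights. The feature vectors, the helper names `morph` and `point_on_line`, and the normalization of the weights to sum to one are assumptions made for illustration only; the actual morphing software operates on three-dimensional object models and is not reproduced here.

```python
def morph(prototypes, weights):
    """Blend prototype feature vectors using the given mixing weights.

    prototypes: list of equal-length lists of floats (hypothetical shape features)
    weights: one non-negative weight per prototype; normalized here to sum to 1
             (the normalization is an assumption, not taken from the paper)
    """
    if len(prototypes) != len(weights):
        raise ValueError("need exactly one weight per prototype")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    coeffs = [w / total for w in weights]
    dim = len(prototypes[0])
    # Each output coordinate is the weighted average of that coordinate
    # across all prototypes.
    return [sum(c * p[i] for c, p in zip(coeffs, prototypes)) for i in range(dim)]


def point_on_line(proto_a, proto_b, t):
    """Object a fraction t of the way from proto_a to proto_b (0 <= t <= 1)."""
    return morph([proto_a, proto_b], [1.0 - t, t])
```

Under these assumptions, `point_on_line(a, b, 0.0)` returns prototype `a`, `point_on_line(a, b, 1.0)` returns prototype `b`, and intermediate values of `t` give objects on the line connecting the two prototypes in morph space.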

Figure 1

3D morphing system used to investigate the recognition of real-world object classes. In the example shown, different car prototypes were combined to generate the morphed car (top).

The model of view-tuned cells described in the paper, in which each neuron was tightly tuned to a view of a single object, extends in a straightforward fashion to object classes with a continuum of members, because model neurons already show tuning in shape space. A group of neurons, each broadly tuned to a different representative, can thus code for the identity of a randomly presented member of the class through the distributed pattern of activation.
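A minimal sketch of such a distributed code follows, assuming each unit has Gaussian tuning around its preferred stimulus in some feature space. The Gaussian form, the `sigma` width parameter, and the function names are illustrative assumptions rather than details given in this text.

```python
import math


def vtu_response(stimulus, preferred, sigma):
    """Gaussian tuning: 1.0 at the preferred stimulus, decaying with distance."""
    sq_dist = sum((s - p) ** 2 for s, p in zip(stimulus, preferred))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def population_response(stimulus, preferred_list, sigma=1.0):
    """Activation pattern over a group of units, one per representative object.

    A larger sigma gives broader tuning, so more units respond to any
    given class member and the identity is carried by the whole pattern.
    """
    return [vtu_response(stimulus, p, sigma) for p in preferred_list]
```

With broad tuning, no single unit identifies the stimulus, but two different class members generally produce different activation patterns, which is what allows the group to code for identity.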

In preliminary simulations, we investigated recognition (in a two-alternative forced-choice, 2AFC, paradigm) for a representation based on ten view-tuned units (VTUs) tuned to randomly chosen cars, with the rest of the network, that is, from the S1 to the C2 layer, exactly identical to the 'many feature' version of our model described in the paper. In each trial, the sample stimulus was a standard view (225°, see Fig. 1) of an object picked from a prototype-connecting line in morph space. The match stimulus showed the same object from a rotated viewpoint, whereas the distractor stimulus showed an object chosen a variable distance away on the same prototype-connecting line in morph space, from the same viewpoint as the match object. The network was said to correctly recognize the rotated view of the sample object if the activity pattern over the view-tuned units caused by the match object was more similar (under a Euclidean metric) to the activity pattern associated with the sample object than the activity pattern associated with the distractor object.
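The decision rule of this paradigm can be sketched as follows. The sketch assumes one reading of the criterion, namely that a trial is correct when the match pattern lies closer to the sample pattern than the distractor pattern does; the function names and the treatment of ties as errors are likewise assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two activity patterns of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def correct_2afc(sample, match, distractor):
    """True if the match pattern is closer to the sample than the distractor is.

    Ties count as incorrect (an arbitrary choice for this sketch).
    """
    return euclidean(match, sample) < euclidean(distractor, sample)


def recognition_rate(trials):
    """Fraction of correct trials; each trial is (sample, match, distractor)."""
    if not trials:
        raise ValueError("need at least one trial")
    n_correct = sum(1 for s, m, d in trials if correct_2afc(s, m, d))
    return n_correct / len(trials)
```

Here each pattern would be the vector of activations over the ten view-tuned units; chance performance under this rule corresponds to a recognition rate of 0.5.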

Figure 2 shows the recognition performance of the model as a function of distractor similarity and of the difference between the tested viewpoint and the sample viewpoint, demonstrating two points. First, the recognition rate drops off as the match/non-match objects are rotated further from the sample view. Second, the correct-recognition curves tighten with increasing similarity of distractors to the match object. (Both these points seem qualitatively robust for different choices of the number of neurons involved in the group representation and of the broadness of VTU tuning.)

Note that the invariance ranges for the class of car stimuli are comparable to the results obtained with the paperclip stimuli in psychophysics and modeling2-4, demonstrating the generality of the model over different object classes and representations.

Figure 2

Average recognition performance in the model for objects chosen from a continuous object class (2AFC paradigm). The two plots show two different views of the same figure. Performance is shown on the z axis; the x axis shows the viewpoint of the match and non-match objects (the sample object was always shown in the 225° view). The Euclidean distance of sample/match and non-match objects in morph space is shown on the y axis (one unit = , 20 units equal the distance between two prototypes in morph space). The plane shows chance performance for comparison.

References in supplementary information

  1. Shelton, C. Three-dimensional Correspondence. Master's thesis, MIT (1996).

  2. Poggio, T. & Edelman, S. A network that learns to recognize 3D objects. Nature 343, 263–266 (1990).

  3. Bülthoff, H. & Edelman, S. Psychophysical support for a two-dimensional view interpolation theory of object recognition. Proc. Natl. Acad. Sci. USA 89, 60–64 (1992).

  4. Logothetis, N., Pauls, J., Bülthoff, H. & Poggio, T. Shape representation in the inferior temporal cortex of monkeys. Curr. Biol. 4, 401–414 (1994).

About this article

Cite this article

Riesenhuber, M., Poggio, T. Hierarchical models of object recognition in cortex. Nat Neurosci 2, 1019–1025 (1999). https://doi.org/10.1038/14819

  • DOI: https://doi.org/10.1038/14819
