Visual processing in cortex is classically modeled as a hierarchy of increasingly sophisticated representations, naturally extending the model of simple to complex cells of Hubel and Wiesel. Surprisingly, little quantitative modeling has been done to explore the biological feasibility of this class of models to explain aspects of higher-level visual processing such as object recognition. We describe a new hierarchical model consistent with physiological data from inferotemporal cortex that accounts for this complex visual task and makes testable predictions. The model is based on a MAX-like operation applied to inputs to certain cortical neurons that may have a general role in cortical function.
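To make the MAX-like operation concrete, here is a toy one-dimensional sketch. It is not the authors' implementation: the function names, the Gaussian template match used for the "simple" (S) units, and all parameter values are assumptions for illustration. It compares two "complex" (C) units that pool the same identically tuned S units at different positions. One pools with a MAX and the other with a linear SUM.

```python
import numpy as np

def s_responses(image: np.ndarray, template: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Hypothetical S layer: Gaussian match of one template at every valid position."""
    k = len(template)
    patches = np.stack([image[i:i + k] for i in range(len(image) - k + 1)])
    dist2 = ((patches - template) ** 2).sum(axis=1)
    return np.exp(-dist2 / (2 * sigma ** 2))

def c_max(s: np.ndarray) -> float:
    """C unit pooling its S afferents with a MAX."""
    return float(s.max())

def c_sum(s: np.ndarray) -> float:
    """Alternative C unit pooling the same afferents with a linear SUM."""
    return float(s.sum())

template = np.array([1.0, 0.0, 1.0])

img_a = np.zeros(12)
img_a[2:5] = template      # preferred feature at position 2

img_b = np.zeros(12)
img_b[7:10] = template     # same feature, shifted

img_c = img_a.copy()
img_c[8] = 0.5             # feature plus a clutter pixel

for name, img in [("a", img_a), ("b", img_b), ("c", img_c)]:
    s = s_responses(img, template)
    print(name, "MAX:", round(c_max(s), 3), "SUM:", round(c_sum(s), 3))
```

In this toy case the MAX unit outputs 1.0 for the original, the shifted, and the cluttered input, whereas the SUM unit's output changes once clutter is added. A SUM mixes the responses of all afferents. A MAX passes on only the strongest one.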
The paper shows that the invariance ranges of view-tuned cells in our model were in good agreement with quantitative experiments using paperclip stimuli. Here we briefly describe preliminary simulations demonstrating that this performance is not specific to paperclip objects but also transfers to more natural object classes containing many members of similar shape, such as the class of cars.
To be able to finely control stimulus similarities, we used automatic, three-dimensional, multidimensional morphing software developed by Christian Shelton in our lab (ref. 1), which allowed us to create a large set of 'intermediate' objects, made by blending characteristics of the different prototype objects spanning the class. This was done by specifying how much of each prototype the object to be created should contain, naturally defining a vector space over the prototype objects. Figure 1 shows several car prototypes and an intermediate car generated as a random combination of these prototypes.
Figure 1. 3D morphing system used to investigate the recognition of real-world object classes. In the example shown, different car prototypes were combined to generate the morphed car (top).
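The "vector space over the prototype objects" can be illustrated with a minimal sketch. It assumes that each prototype is stored as an array of 3D vertex positions already in point-to-point correspondence and that an object's prototype proportions sum to 1. The actual morphing software (ref. 1) may represent and blend shape and texture differently. The function names and the Dirichlet draw used for a "random combination" are hypothetical choices made only for this example.

```python
import numpy as np

def morph(prototypes: np.ndarray, weights) -> np.ndarray:
    """Weighted blend of prototypes with shape (n_prototypes, n_vertices, 3).

    Assumes the prototypes are already in vertex-to-vertex correspondence.
    """
    w = np.asarray(weights, dtype=float)
    if w.shape != (prototypes.shape[0],):
        raise ValueError("need exactly one weight per prototype")
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must sum to 1")
    # Contract the prototype axis: sum_k w[k] * prototypes[k]
    return np.tensordot(w, prototypes, axes=1)

def random_weights(n: int, rng: np.random.Generator) -> np.ndarray:
    """One assumed way to draw a 'random combination': a flat Dirichlet sample."""
    return rng.dirichlet(np.ones(n))

def line_weights(i: int, j: int, t: float, n: int) -> np.ndarray:
    """Weights for the point at fraction t along the line from prototype i to j."""
    w = np.zeros(n)
    w[i] += 1.0 - t
    w[j] += t
    return w

rng = np.random.default_rng(0)
protos = rng.normal(size=(4, 500, 3))    # four hypothetical prototype meshes
car = morph(protos, random_weights(4, rng))
halfway = morph(protos, line_weights(0, 1, 0.5, 4))
```

The `line_weights` helper produces the "prototype-connecting lines" from which the sample and distractor objects are drawn in the simulations.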
The model of view-tuned cells described in the paper, in which each neuron was tightly tuned to one view of a single object, extends in a straightforward way to object classes with a continuum of members, because model neurons already show tuning in shape space. A group of neurons, each broadly tuned to a different representative of the class, can thus code for the identity of a randomly presented class member through its distributed pattern of activation.
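A minimal sketch of such a distributed code is shown below. It assumes Gaussian tuning of each unit around its preferred feature vector, which here stands in for the model's C2 output. The 256-dimensional random features, the tuning width `sigma`, and the function name are illustrative assumptions, not values from the paper.

```python
import numpy as np

def vtu_responses(x: np.ndarray, preferred: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian tuning of each unit around its preferred feature vector."""
    d2 = ((preferred - x) ** 2).sum(axis=1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(1)
preferred = rng.normal(size=(10, 256))       # ten units; 256-D features are an assumption
x = 0.5 * preferred[0] + 0.5 * preferred[3]  # a stimulus between two represented cars
r = vtu_responses(x, preferred, sigma=8.0)   # broad tuning (sigma chosen arbitrarily)
print(np.round(r, 3))
```

No unit is tuned to `x` itself. The stimulus is still identified by the pattern: units 0 and 3 respond most strongly and about equally, and the remaining units respond weakly.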
In preliminary simulations, we investigated recognition in a two-alternative forced-choice (2AFC) paradigm, using a representation based on ten view-tuned units (VTUs) tuned to randomly chosen cars. The rest of the network, from the S1 to the C2 layer, was identical to the 'many feature' version of our model described in the paper. In each trial, the sample stimulus was a standard view (225°; see Fig. 1) of an object picked from a prototype-connecting line in morph space. The match stimulus showed the same object from a rotated viewpoint. The distractor stimulus showed an object chosen a variable distance away on the same prototype-connecting line in morph space, seen from the same viewpoint as the match object. The network was counted as correctly recognizing the rotated view of the sample object if the VTU activity pattern evoked by the match object was closer (in Euclidean distance) to the pattern evoked by the sample object than the pattern evoked by the distractor object was.
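The per-trial decision rule can be written down directly. A minimal sketch follows. It assumes the VTU activity patterns are already available as vectors and reads the criterion as "the match pattern lies closer to the sample pattern than the distractor pattern does". Scoring ties as errors is an additional assumption, and the three example patterns are made up.

```python
import numpy as np

def trial_correct(r_sample, r_match, r_distractor) -> bool:
    """Correct if the match pattern is nearer the sample pattern than the distractor is."""
    s = np.asarray(r_sample, dtype=float)
    d_match = np.linalg.norm(np.asarray(r_match, dtype=float) - s)
    d_distractor = np.linalg.norm(np.asarray(r_distractor, dtype=float) - s)
    return bool(d_match < d_distractor)  # ties scored as errors (assumption)

def recognition_rate(trials) -> float:
    """Fraction of correct (sample, match, distractor) trials; chance level is 0.5."""
    return float(np.mean([trial_correct(s, m, d) for s, m, d in trials]))

r_s = [1.0, 0.2, 0.0]   # hypothetical VTU patterns for three stimuli
r_m = [0.8, 0.3, 0.1]
r_d = [0.1, 0.9, 0.4]
print(trial_correct(r_s, r_m, r_d))                          # True
print(recognition_rate([(r_s, r_m, r_d), (r_s, r_d, r_m)]))  # 0.5
```

Averaging `trial_correct` over many objects and distractors for each combination of viewpoint and distractor distance gives one performance value per cell of a plot like the one in Figure 2.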
Figure 2 shows the recognition performance of the model as a function of distractor similarity and of viewpoint difference from the sample view. It illustrates two points. First, the recognition rate drops as the match and non-match objects are rotated further from the sample view. Second, the range of rotations over which recognition remains correct narrows as distractors become more similar to the match object. (Both effects appear qualitatively robust to different choices of the number of neurons in the population representation and of the broadness of VTU tuning.)
Note that the invariance ranges for the class of car stimuli are comparable to those obtained with paperclip stimuli in psychophysics and modeling (refs. 2–4), demonstrating the generality of the model across different object classes and representations.
Figure 2. Average recognition performance of the model for objects chosen from a continuous object class (2AFC paradigm). The two plots show two different views of the same figure. The z axis shows performance. The x axis shows the viewpoint of the match and non-match objects; the sample object was always shown in the 225° view. The y axis shows the Euclidean distance in morph space between the sample/match object and the non-match object; 20 units equal the distance between two prototypes. The plane marks chance performance for comparison.
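The distance scale on the y axis can be turned into a position along a prototype-connecting line. A minimal sketch follows. It assumes that such a line is parameterized by a blend fraction t, with t = 0 at one prototype and t = 1 at the other. Under that assumption, since 20 units span the full line, a distractor d units from the sample differs from it by d/20 in t. The function name is hypothetical.

```python
UNITS_PER_PROTOTYPE_DISTANCE = 20  # from the Figure 2 caption

def distractor_position(t_sample: float, distance_units: float, direction: int = 1) -> float:
    """Blend fraction t (0 = one prototype, 1 = the other) of an object lying
    `distance_units` morph-space units from the sample on the same line."""
    if direction not in (-1, 1):
        raise ValueError("direction must be +1 or -1")
    t = t_sample + direction * distance_units / UNITS_PER_PROTOTYPE_DISTANCE
    if not 0.0 <= t <= 1.0:
        raise ValueError("distractor would fall off the prototype-connecting line")
    return t

print(distractor_position(0.3, 4))       # sample at t = 0.3, distractor 4 units away
print(distractor_position(0.5, 10, -1))  # 0.0, i.e. the first prototype
```

With the sample at the midpoint of a line, the largest distractor distance that stays on the line is 10 units.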
References in supplementary information
Shelton, C. Three-dimensional Correspondence. Master's thesis, MIT (1996).
Poggio, T. & Edelman, S. A network that learns to recognize 3D objects. Nature 343, 263-266 (1990).
Bülthoff, H. & Edelman, S. Psychophysical support for a two-dimensional view interpolation theory of object recognition. Proc. Natl. Acad. Sci. USA 89, 60-64 (1992).
Logothetis, N., Pauls, J., Bülthoff, H. & Poggio, T. Shape representation in the inferior temporal cortex of monkeys. Curr. Biol. 4, 401-414 (1994).
Riesenhuber, M., Poggio, T. Hierarchical models of object recognition in cortex. Nat Neurosci 2, 1019–1025 (1999). https://doi.org/10.1038/14819