  • Review Article

Models of object recognition

Abstract

Understanding how biological visual systems recognize objects is one of the ultimate goals in computational neuroscience. From the computational viewpoint of learning, different recognition tasks, such as categorization and identification, are similar, representing different trade-offs between specificity and invariance. Thus, the different tasks do not require different classes of models. We briefly review some recent trends in computational vision and then focus on feedforward, view-based models that are supported by psychophysical and physiological data.
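The trade-off between specificity and invariance can be illustrated with a small sketch. This is not the authors' model: it assumes Gaussian view-tuned units pooled with a max operation, and the feature vectors, the function names (`view_tuned_response`, `object_response`) and the `sigma` values are hypothetical choices made only for illustration. A narrow tuning width makes a unit specific to its stored view but poor at generalizing to a novel view; a wide tuning width generalizes well but starts to confuse different objects.

```python
import math

def view_tuned_response(x, stored_view, sigma):
    """Gaussian unit tuned to one stored view; sigma sets the tuning width."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, stored_view))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def object_response(x, stored_views, sigma):
    """Pool over the view-tuned units of one object by taking the maximum."""
    return max(view_tuned_response(x, v, sigma) for v in stored_views)

# Hypothetical 2-D feature vectors standing in for stored views of two objects.
views_a = [(0.0, 0.0), (1.0, 0.0)]
views_b = [(0.0, 3.0), (1.0, 3.0)]
novel_view_of_a = (0.5, 0.5)  # a view of object A that was never stored

for sigma in (0.3, 1.0, 5.0):
    ra = object_response(novel_view_of_a, views_a, sigma)
    rb = object_response(novel_view_of_a, views_b, sigma)
    print(f"sigma={sigma}: response A={ra:.3f}, response B={rb:.3f}")
```

With the narrowest width, the novel view barely activates object A's units (high specificity, low invariance). With the widest, both objects respond strongly (high invariance, low specificity). Under these assumptions, identification and categorization would differ only in where the tuning width is set, not in the kind of model used.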

Figure 1: Learning module schematics.
Figure 2
Figure 3: A class of models of object recognition.

Acknowledgements

Supported by grants from ONR, DARPA, NSF, ATR, Honda, a Merck/MIT Fellowship in Bioinformatics, and a McDonnell-Pew award (M.R.). T.P. is supported by the Uncas and Helen Whitaker Chair at the Whitaker College, MIT. For comments and suggestions, we are grateful to Heinrich Bülthoff, Peter Dayan, Shimon Edelman, David Freedman, Christof Koch, Earl Miller, David Perrett, Pawan Sinha and Francis Crick (also for the picture in Fig. 3).

Author information

Corresponding author

Correspondence to Tomaso Poggio.

About this article

Cite this article

Riesenhuber, M., Poggio, T. Models of object recognition. Nat Neurosci 3 (Suppl 11), 1199–1204 (2000). https://doi.org/10.1038/81479

  • DOI: https://doi.org/10.1038/81479
