Capturing the objects of vision with neural networks

Peters, Benjamin; Kriegeskorte, Nikolaus

doi:10.1038/s41562-021-01194-6

Review Article
Published: 20 September 2021

Nature Human Behaviour volume 5, pages 1127–1144 (2021)

Abstract

Human visual perception carves a scene at its physical joints, decomposing the world into objects, which are selectively attended, tracked and predicted as we engage our surroundings. Object representations emancipate perception from the sensory input, enabling us to keep in mind that which is out of sight and to use perceptual content as a basis for action and symbolic cognition. Human behavioural studies have documented how object representations emerge through grouping, amodal completion, proto-objects and object files. By contrast, deep neural network models of visual object recognition remain largely tethered to sensory input, despite achieving human-level performance at labelling objects. Here, we review related work in both fields and examine how these fields can help each other. The cognitive literature provides a starting point for the development of new experimental tasks that reveal mechanisms of human object perception and serve as benchmarks driving the development of deep neural network models that will put the object into object recognition.

**Fig. 1: Stages of untethering human visual object perception from the sensorium.**

**Fig. 3: Neural network mechanisms for untethering.**

**Fig. 4: Space of tasks for untethered object perception.**

References

Von Helmholtz, H. Handbuch der Physiologischen Optik (Voss, 1867).
Yuille, A. & Kersten, D. Vision as Bayesian inference: analysis by synthesis? Trends Cogn. Sci. 10, 301–308 (2006).
Article PubMed Google Scholar
Pearl, J. Causality (Cambridge Univ. Press, 2009).
Piaget, J. The Construction of Reality in the Child (Basic Books, 1954).
Adelson, E. H. On seeing stuff: the perception of materials by humans and machines. In Human Vision and Electronic Imaging VI (eds. Rogowitz, B. E. & Pappas, T. N.) vol. 4299 1–12 (SPIE, 2001).
Clowes, M. B. On seeing things. Artif. Int. 2, 79–116 (1971).
Article Google Scholar
Julesz, B. Experiments in the visual perception of texture. Sci. Am. 232, 34–43 (1975).
Article CAS PubMed Google Scholar
Simoncelli, E. P. & Olshausen, B. A. Natural image statistics and neural representation. Ann. Rev. Neurosci. 24, 1193–1216 (2001).
Article CAS PubMed Google Scholar
Rosenholtz, R., Li, Y. & Nakano, L. Measuring visual clutter. J. Vis. 7, 17–17 (2007).
Article PubMed Google Scholar
Hoffman, D. D. & Richards, W. A. Parts of recognition. Cognition 18, 65–96 (1984).
Article CAS PubMed Google Scholar
Michotte, A. et al. Les Complements Amodaux des Structures Perceptives (Institut de psychologie de l’Université de Louvain, 1964).
Rensink, R. A. The dynamic representation of scenes. Visual Cogn. 7, 17–42 (2000).
Article Google Scholar
Gregory, R. L. Perceptions as hypotheses. Phil. Trans. R. Soc. Lond. B Biol. Sci. 290, 181–197 (1980).
Article CAS Google Scholar
Rock, I. Indirect Perception (The MIT Press, 1997).
Clark, A. Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behav. Brain Sci. 36, 181–204 (2013).
Article PubMed Google Scholar
Friston, K. J. A theory of cortical responses. Phil. Trans. R. Soc. B Biol. Sci. 360, 815–836 (2005).
Article Google Scholar
van Steenkiste, S., Greff, K. & Schmidhuber, J. A perspective on objects and systematic generalization in model-based RL. Preprint at http://arxiv.org/abs/1906.01035 (2019).
Greff, K., van Steenkiste, S. & Schmidhuber, J. On the binding problem in artificial neural networks. Preprint at http://arxiv.org/abs/2012.05208 (2020).
Spelke, E. S. Principles of object perception. Cogn. Sci. 14, 29–56 (1990).
Article Google Scholar
Scholl, B. J. Object persistence in philosophy and psychology. Mind Lang. 22, 563–591 (2007).
Article Google Scholar
Sarkka, S. Bayesian Filtering and Smoothing (Cambridge Univ. Press, 2013).
Deneve, S., Duhamel, J.-R. & Pouget, A. Optimal sensorimotor integration in recurrent cortical networks: a neural implementation of Kalman filters. J. Neurosci. 27, 5744–5756 (2007).
Article CAS PubMed PubMed Central Google Scholar
LeCun, Y. et al. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1, 541–551 (1989).
Article Google Scholar
Fukushima, K. Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 36, 193–202 (1980).
Article CAS PubMed Google Scholar
Geirhos, R. et al. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In Proc. 7th International Conference on Learning Representations (OpenReview.net, 2019).
Kansky, K. et al. Schema networks: zero-shot transfer with a generative causal model of intuitive physics. In Proc. 34th International Conference on Machine Learning (eds. Precup, D. & Teh, Y. W.) vol. 70 1809–1818 (PMLR, 2017).
Lake, B. M., Ullman, T. D., Tenenbaum, J. B. & Gershman, S. J. Building machines that learn and think like people. Behav. Brain Sci. 40, e253 (2017).
Article PubMed Google Scholar
Yildirim, I., Wu, J., Kanwisher, N. & Tenenbaum, J. An integrative computational architecture for object-driven cortex. Curr. Opin. Neurobiol. 55, 73–81 (2019).
Article CAS PubMed PubMed Central Google Scholar
Golan, T., Raju, P. C. & Kriegeskorte, N. Controversial stimuli: pitting neural networks against each other as models of human cognition. Proc. Natl Acad. Sci. 117, 29330–29337 (2020).
Article CAS PubMed PubMed Central Google Scholar
Gibson, J. J. The Ecological Approach to Visual Perception: Classic Edition (Houghton Mifflin, 1979).
Knill, D. C. & Pouget, A. The Bayesian brain: the role of uncertainty in neural coding and computation. Trends Neurosci. 27, 712–719 (2004).
Article CAS PubMed Google Scholar
Treisman, A. The binding problem. Curr. Opin. Neurobiol. 6, 171–178 (1996).
Article CAS PubMed Google Scholar
von der Malsburg, C. The Correlation Theory of Brain Function. in Models of Neural Networks: Temporal Aspects of Coding and Information Processing in Biological Systems (eds. Domany, E., van Hemmen, J. L. & Schulten, K.) 95–119 (Springer, 1981).
Duncan, J. Selective attention and the organization of visual information. J. Exp. Psychol. Gen. 113, 501–517 (1984).
Article CAS PubMed Google Scholar
Neisser, U. Cognitive Psychology (Appleton-Century-Crofts, 1967).
Treisman, A. Features and objects in visual processing. Sci. Am. 255, 114–125 (1986).
Article Google Scholar
Baars, B. J. A Cognitive Theory of Consciousness (Cambridge Univ. Press, 1993).
Dehaene, S. & Naccache, L. Towards a cognitive neuroscience of consciousness: basic evidence and a workspace framework. Cognition 79, 1–37 (2001).
Article CAS PubMed Google Scholar
Hubel, D. H. & Wiesel, T. N. Receptive fields and functional architecture in two nonstriate visual areas (18 and 19) of the cat. J. Neurophysiol. 28, 229–289 (1965).
Article CAS PubMed Google Scholar
Riesenhuber, M. & Poggio, T. Hierarchical models of object recognition in cortex. Nat. Neurosci. 2, 1019–1025 (1999).
Article CAS PubMed Google Scholar
Roelfsema, P. R. Cortical algorithms for perceptual grouping. Ann. Rev. Neurosci. 29, 203–227 (2006).
Article CAS PubMed Google Scholar
Field, D. J., Hayes, A. & Hess, R. F. Contour integration by the human visual system: evidence for a local ‘association field’. Vis. Res. 33, 173–193 (1993).
Article CAS PubMed Google Scholar
Geisler, W. S. Visual perception and the statistical properties of natural scenes. Ann. Rev. Psychol. 59, 167–192 (2008).
Article Google Scholar
Bosking, W. H., Zhang, Y., Schofield, B. & Fitzpatrick, D. Orientation selectivity and the arrangement of horizontal connections in tree shrew striate cortex. J. Neurosci. 17, 2112–2127 (1997).
Article CAS PubMed PubMed Central Google Scholar
Koffka, K. Principles of Gestalt Psychology (Harcourt, Brace, 1935).
Rock, I. & Palmer, S. The legacy of gestalt psychology. Sci. Am. 263, 84–91 (1990).
Article CAS PubMed Google Scholar
Wertheimer, M. Untersuchungen zur lehre von der gestalt. Psychol. Forsch. 4, 301–350 (1923).
Article Google Scholar
Nakayama, K. & Shimojo, S. Experiencing and perceiving visual surfaces. Science 257, 1357–1363 (1992).
Article CAS PubMed Google Scholar
Rosenholtz, R., Twarog, N. R., Schinkel-Bielefeld, N. & Wattenberg, M. An intuitive model of perceptual grouping for HCI design. In Proc. SIGCHI Conference on Human Factors in Computing Systems 1331–1340 (ACM, 2009).
Li, Z. A neural model of contour integration in the primary visual cortex. Neural Comput. 10, 903–940 (1998).
Article CAS PubMed Google Scholar
Yen, S.-C. & Finkel, L. H. Extraction of perceptually salient contours by striate cortical networks. Vis. Res. 38, 719–741 (1998).
Article CAS PubMed Google Scholar
Roelfsema, P. R., Lamme, V. A. & Spekreijse, H. Object-based attention in the primary visual cortex of the macaque monkey. Nature 395, 376–381 (1998).
Article CAS PubMed Google Scholar
Nakayama, K. & Silverman, G. H. Serial and parallel processing of visual feature conjunctions. Nature 320, 264–265 (1986).
Article CAS PubMed Google Scholar
Alais, D., Blake, R. & Lee, S.-H. Visual features that vary together over time group together over space. Nat. Neurosci. 1, 160–164 (1998).
Article CAS PubMed Google Scholar
Vecera, S. P. & Farah, M. J. Is visual image segmentation a bottom-up or an interactive process? Percept. Psychophys. 59, 1280–1296 (1997).
Article CAS PubMed Google Scholar
Sekuler, A. & Palmer, S. Perception of partly occluded objects: a microgenetic analysis. J. Exp. Psychol. Gen. 121, 95–111 (1992).
Article Google Scholar
Marr, D., Ullman, S. & Poggio, T. Vision: A Computational Investigation Into the Human Representation and Processing of Visual Information (W. H. Freeman, 1982) .
Michotte, A. & Burke, L. Une nouvelle enigme dans la psychologie de la perception: le’donne amodal’dans l’experience sensorielle. In Proc. XIII Congrés Internationale de Psychologie 179–180 (1951).
Komatsu, H. The neural mechanisms of perceptual filling-in. Nat. Rev. Neurosci. 7, 220–231 (2006).
Article CAS PubMed Google Scholar
Shore, D. I. & Enns, J. T. Shape completion time depends on the size of the occluded region. J. Exp. Psychol. Hum. Percept. Perform. 23, 980–998 (1997).
Article CAS PubMed Google Scholar
He, Z. J. & Nakayama, K. Surfaces versus features in visual search. Nature 359, 231–233 (1992).
Article CAS PubMed Google Scholar
Rensink, R. A. & Enns, J. T. Early completion of occluded objects. Vis. Res. 38, 2489–2505 (1998).
Article CAS PubMed Google Scholar
Kellman, P. J. & Shipley, T. F. A theory of visual interpolation in object perception. Cogn. Psychol. 23, 141–221 (1991).
Article CAS PubMed Google Scholar
Tse, P. U. Volume completion. Cogn. Psychol. 39, 37–68 (1999).
Article CAS PubMed Google Scholar
Buffart, H., Leeuwenberg, E. & Restle, F. Coding theory of visual pattern completion. J. Exp. Psychol. Hum. Percept. Perform. 7, 241–274 (1981).
Article CAS PubMed Google Scholar
Weigelt, S., Singer, W. & Muckli, L. Separate cortical stages in amodal completion revealed by functional magnetic resonance adaptation. BMC Neurosci. 8, 70 (2007).
Article PubMed PubMed Central Google Scholar
Thielen, J., Bosch, S. E., van Leeuwen, T. M., van Gerven, M. A. J. & van Lier, R. Neuroimaging findings on amodal completion: a review. i-Perception 10, 2041669519840047 (2019).
Article PubMed PubMed Central Google Scholar
Mooney, C. M. Age in the development of closure ability in children. Can. J. Psychol. 11, 219–226 (1957).
Article CAS PubMed Google Scholar
Snodgrass, J. G. & Feenan, K. Priming effects in picture fragment completion: support for the perceptual closure hypothesis. J. Exp. Psychol. Gen. 119, 276–296 (1990).
Article CAS PubMed Google Scholar
Treisman, A. & Gelade, G. A feature-integration theory of attention. Cogn. Psychol. 12, 97–136 (1980).
Article CAS PubMed Google Scholar
Pylyshyn, Z. W. Is vision continuous with cognition? The case for cognitive impenetrability of visual perception. Behav. Brain Sci. 22, 341–365 (1999).
Article CAS PubMed Google Scholar
Wolfe, J. M. & Cave, K. R. The psychophysical evidence for a binding problem in human vision. Neuron 24, 11–17 (1999).
Article CAS PubMed Google Scholar
Ullman, S. The interpretation of structure from motion. Proc. R. Soc. Lon. B Biol. Sci. 203, 405–426 (1979).
Article CAS Google Scholar
Flombaum, J. I., Scholl, B. J. & Santos, L. R. Spatiotemporal priority as a fundamental principle of object persistence. In The Origins of Object Knowledge (eds. Hood, B. M. & Santos, L. R.) 135–164 (Oxford University Press, 2009).
Mitroff, S. R. & Alvarez, G. A. Space and time, not surface features, guide object persistence. Psychon. Bull. Rev. 14, 1199–1204 (2007).
Article PubMed Google Scholar
Burke, L. On the tunnel effect. Quart. J. Exp. Psychol. 4, 121–138 (1952).
Article Google Scholar
Flombaum, J. I. & Scholl, B. J. A temporal same-object advantage in the tunnel effect: facilitated change detection for persisting objects. J. Exp. Psychol. Hum. Percept. Perform. 32, 840–853 (2006).
Article PubMed Google Scholar
Hollingworth, A. & Franconeri, S. L. Object correspondence across brief occlusion is established on the basis of both spatiotemporal and surface feature cues. Cognition 113, 150–166 (2009).
Article PubMed PubMed Central Google Scholar
Moore, C. M., Stephens, T. & Hein, E. Features, as well as space and time, guide object persistence. Psychon. Bull. Rev. 17, 731–736 (2010).
Article PubMed PubMed Central Google Scholar
Papenmeier, F., Meyerhoff, H. S., Jahn, G. & Huff, M. Tracking by location and features: object correspondence across spatiotemporal discontinuities during multiple object tracking. J. Exp. Psychol. Hum. Percept. Perform. 40, 159–171 (2014).
Article PubMed Google Scholar
Liberman, A., Zhang, K. & Whitney, D. Serial dependence promotes object stability during occlusion. J. Vis. 16, 16 (2016).
Article PubMed PubMed Central Google Scholar
Fischer, C. et al. Context information supports serial dependence of multiple visual objects across memory episodes. Nat. Commun. 11, 1932 (2020).
Article CAS PubMed PubMed Central Google Scholar
Irwin, D. E. Memory for position and identity across eye movements. J. Exp. Psychol. Learn. Mem. Cogn. 18, 307–317 (1992).
Article Google Scholar
Richard, A. M., Luck, S. J. & Hollingworth, A. Establishing object correspondence across eye movements: flexible use of spatiotemporal and surface feature information. Cognition 109, 66–88 (2008).
Article PubMed PubMed Central Google Scholar
Kahneman, D., Treisman, A. & Gibbs, B. J. The reviewing of object-files: object specific integration of information. Cogn. Psychol. 24, 174–219 (1992).
Article Google Scholar
Pylyshyn, Z. W. The role of location indexes in spatial perception: a sketch of the FINST spatial-index model. Cognition 32, 65–97 (1989).
Article CAS PubMed Google Scholar
Itti, L. & Koch, C. Computational modelling of visual attention. Nat. Rev. Neurosci. 2, 194–203 (2001).
Article CAS PubMed Google Scholar
Cavanagh, P. & Alvarez, G. A. Tracking multiple targets with multifocal attention. Trends Cogn. Sci. 9, 349–354 (2005).
Article PubMed Google Scholar
Bahcall, D. O. & Kowler, E. Attentional interference at small spatial separations. Vis. Res. 39, 71–86 (1999).
Article CAS PubMed Google Scholar
Franconeri, S. L., Alvarez, G. A. & Cavanagh, P. Flexible cognitive resources: competitive content maps for attention and memory. Trends Cogn. Sci. 17, 134–141 (2013).
Article PubMed PubMed Central Google Scholar
Pylyshyn, Z. W. & Storm, R. W. Tracking multiple independent targets: evidence for a parallel tracking mechanism. Spatial Vis. 3, 179–197 (1988).
Article CAS Google Scholar
Intriligator, J. & Cavanagh, P. The spatial resolution of visual attention. Cognit. Psychol. 43, 171–216 (2001).
Article CAS PubMed Google Scholar
Scholl, B. J. & Pylyshyn, Z. W. Tracking multiple items through occlusion: clues to visual objecthood. Cognit. Psychol. 38, 259–290 (1999).
Article CAS PubMed Google Scholar
Yantis, S. Multielement visual tracking: attention and perceptual organization. Cognit. Psychol. 24, 295–340 (1992).
Article CAS PubMed Google Scholar
Vul, E., Alvarez, G., Tenenbaum, J. B. & Black, M. J. Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model. In Proc. Advances in Neural Information Processing Systems 22 (eds Bengio, Y. et al.) 1955–1963 (Curran Associates, 2009).
Alvarez, G. A. & Franconeri, S. L. How many objects can you track? Evidence for a resource-limited attentive tracking mechanism. J. Vis. 7, 14 (2007).
Article PubMed Google Scholar
Flombaum, J. I., Scholl, B. J. & Pylyshyn, Z. W. Attentional resources in visual tracking through occlusion: the high-beams effect. Cognition 107, 904–931 (2008).
Article PubMed Google Scholar
Vecera, S. P. & Farah, M. J. Does visual attention select objects or locations? J. Exp. Psychol. Gen. 123, 146–160 (1994).
Article CAS PubMed Google Scholar
Chen, Z. Object-based attention: a tutorial review. Atten. Percept. Psychophys. 74, 784–802 (2012).
Article PubMed Google Scholar
Egly, R., Driver, J. & Rafal, R. D. Shifting visual attention between objects and locations: evidence from normal and parietal lesion subjects. J. Exp. Psychol. Gen. 123, 161–177 (1994).
Article CAS PubMed Google Scholar
Houtkamp, R., Spekreijse, H. & Roelfsema, P. R. A gradual spread of attention. Percept. Psychophys. 65, 1136–1144 (2003).
Article CAS PubMed Google Scholar
Jeurissen, D., Self, M. W. & Roelfsema, P. R. Serial grouping of 2D-image regions with object-based attention in humans. eLife 5, e14320 (2016).
Article PubMed PubMed Central Google Scholar
Moore, C. M., Yantis, S. & Vaughan, B. Object-based visual selection: evidence from perceptual completion. Psychol. Sci. 9, 104–110 (1998).
Article Google Scholar
Peters, B., Kaiser, J., Rahm, B. & Bledowski, C. Activity in human visual and parietal cortex reveals object-based attention in working memory. J. Neurosci. 35, 3360–3369 (2015).
Article CAS PubMed PubMed Central Google Scholar
Peters, B., Kaiser, J., Rahm, B. & Bledowski, C. Object-based attention prioritizes working memory contents at a theta rhythm. J. Exp. Psychol. Gen. (2020).
Baillargeon, R. Object permanence in 3 1/2- and 4 1/2-month-old infants. Dev. Psychol. 23, 655–664 (1987).
Article Google Scholar
Baillargeon, R., Spelke, E. S. & Wasserman, S. Object permanence in five-month-old infants. Cognition 20, 191–208 (1985).
Article CAS PubMed Google Scholar
Spelke, E. S., Breinlinger, K., Macomber, J. & Jacobson, K. Origins of knowledge. Psychol. Rev. 99, 605–632 (1992).
Article CAS PubMed Google Scholar
Wilcox, T. Object individuation: infants’ use of shape, size, pattern, and color. Cognition 72, 125–166 (1999).
Article CAS PubMed Google Scholar
Rosander, K. & von Hofsten, C. Infants’ emerging ability to represent occluded object motion. Cognition 91, 1–22 (2004).
Article PubMed Google Scholar
Moore, M. K., Borton, R. & Darby, B. L. Visual tracking in young infants: evidence for object identity or object permanence? J. Exp. Child Psychol. 25, 183–198 (1978).
Article CAS PubMed Google Scholar
Freyd, J. J. & Finke, R. A. Representational momentum. J. Exp. Psychol. Learn. Mem. Cognit. 10, 126–132 (1984).
Article Google Scholar
Benguigui, N., Ripoll, H. & Broderick, M. P. Time-to-contact estimation of accelerated stimuli is based on first-order information. J. Exp. Psychol. Hum. Percept. Perform. 29, 1083–1101 (2003).
Article PubMed Google Scholar
Rosenbaum, D. A. Perception and extrapolation of velocity and acceleration. J. Exp. Psychol. Hum. Percept. Perform. 1, 395–403 (1975).
Article CAS PubMed Google Scholar
Franconeri, S. L., Pylyshyn, Z. W. & Scholl, B. J. A simple proximity heuristic allows tracking of multiple objects through occlusion. Atten. Percept. Psychophys. 74, 691–702 (2012).
Article PubMed Google Scholar
Matin, E. Saccadic suppression: a review and an analysis. Psychol. Bull. 81, 899–917 (1974).
Article CAS PubMed Google Scholar
Henderson, J. M. Two representational systems in dynamic visual identification. J. Exp. Psychol. Gen. 123, 410–426 (1994).
Article CAS PubMed Google Scholar
Bahrami, B. Object property encoding and change blindness in multiple object tracking. Visual Cogn. 10, 949–963 (2003).
Article Google Scholar
Pylyshyn, Z. Some puzzling findings in multiple object tracking: I. Tracking without keeping track of object identities. Visual Cogn. 11, 801–822 (2004).
Article Google Scholar
Horowitz, T. S. et al. Tracking unique objects. Percept. Psychophys. 69, 172–184 (2007).
Article PubMed Google Scholar
Fougnie, D. & Marois, R. Distinct capacity limits for attention and working memory: evidence from attentive tracking and visual working memory paradigms. Psychol. Sci. 17, 526–534 (2006).
Article PubMed Google Scholar
Hollingworth, A. & Rasmussen, I. P. Binding objects to locations: the relationship between object files and visual working memory. J. Exp. Psychol. Hum. Percept. Perform. 36, 543–564 (2010).
Article PubMed PubMed Central Google Scholar
Awh, E., Barton, B. & Vogel, E. K. Visual working memory represents a fixed number of items regardless of complexity. Psychol. Sci. 18, 622–628 (2007).
Article PubMed Google Scholar
Cowan, N. The magical number 4 in short-term memory: a reconsideration of mental storage capacity. Behav. Brain Sci. 24, 87–114 (2001).
Article CAS PubMed Google Scholar
Luck, S. J. & Vogel, E. K. The capacity of visual working memory for features and conjunctions. Nature 390, 279–284 (1997).
Article CAS PubMed Google Scholar
Miller, G. A. The magical number seven. Psychol. Rev. 63, 81–97 (1956).
Article CAS PubMed Google Scholar
Bays, P. M., Wu, E. Y. & Husain, M. Storage and binding of object features in visual working memory. Neuropsychologia 49, 1622–1631 (2011).
Article PubMed Google Scholar
Fougnie, D. & Alvarez, G. A. Object features fail independently in visual working memory: evidence for a probabilistic feature-store model. J. Vis. 11, 3 (2011).
Article PubMed Google Scholar
Brady, T. F., Konkle, T. & Alvarez, G. A. A review of visual memory capacity: beyond individual items and toward structured representations. J. Vis. 11, 4 (2011).
Article PubMed Google Scholar
Alvarez, G. A. & Cavanagh, P. The capacity of visual short-term memory is set both by visual information load and by number of objects. Psychol. Sci. 15, 106–111 (2004).
Article CAS PubMed Google Scholar
Bays, P. M. & Husain, M. Dynamic shifts of limited working memory resources in human vision. Science 321, 851–854 (2008).
Article CAS PubMed PubMed Central Google Scholar
Wilken, P. & Ma, W. J. A detection theory account of change detection. J. Vis. 4, 1120–1135 (2004).
Article PubMed Google Scholar
Oberauer, K. & Lin, H.-Y. An interference model of visual working memory. Psychol. Rev. 124, 21–59 (2017).
Article PubMed Google Scholar
Bouchacourt, F. & Buschman, T. J. A flexible model of working memory. Neuron 103, 147–160 (2019).
Article CAS Google Scholar
Baddeley, A. D. & Hitch, G. Working Memory. In Psychology of Learning and Motivation (ed. Bower, G. H.) vol. 8 47–89 (Academic, 1974).
Cowan, N. Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information-processing system. Psychol. Bull. 104, 163–191 (1988).
Article CAS PubMed Google Scholar
Miyake, A. & Shah, P. Models of Working Memory: Mechanisms of Active Maintenance and Executive Control (Cambridge Univ. Press, 1999).
McCulloch, W. S. & Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biol. 5, 115–133 (1943).
Google Scholar
O’Reilly, R. C. & Munakata, Y. Computational Explorations in Cognitive Neuroscience: Understanding the Mind by Simulating the Brain (MIT Press, 2000).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. In Proc. Advances in Neural Information Processing Systems 25 (eds. Pereira, F. et al.) (Curran Associates, 2012).
Rosenblatt, F. Principles of Neurodynamics. Perceptrons and the Theory of Brain Mechanisms Technical Report (Cornell Aeronautical Lab, 1961).
Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986).
Article Google Scholar
Ivakhnenko, A. G. Polynomial theory of complex systems. In Proc. IEEE transactions on Systems, Man, and Cybernetics 364–378 (IEEE, 1971).
Hebb, D. O. The Organization of Behavior: A Neuropsychological Theory (J. Wiley, Chapman & Hall, 1949).
Hopfield, J. J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl Acad. Sci. USA 79, 2554–2558 (1982).
Article CAS PubMed PubMed Central Google Scholar
Zemel, R. S. & Mozer, M. C. Localist attractor networks. Neural Comput. 13, 1045–1064 (2001).
Article CAS PubMed Google Scholar
Schmidhuber, J. Learning to control fast-weight memories: an alternative to dynamic recurrent networks. Neural Comput. 4, 131–139 (1992).
Article Google Scholar
Olshausen, B. A., Anderson, C. H. & Essen, D. V. A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information. J. Neurosci. 13, 4700–4719 (1993).
Article CAS PubMed PubMed Central Google Scholar
Anderson, C. H. & Van Essen, D. C. Shifter circuits: a computational strategy for dynamic aspects of visual processing. Proc. Natl Acad. Sci. USA 84, 6297–6301 (1987).
Article CAS PubMed PubMed Central Google Scholar
Burak, Y., Rokni, U., Meister, M. & Sompolinsky, H. Bayesian model of dynamic image stabilization in the visual system. Proc. Natl Acad. Sci. USA 107, 19525–19530 (2010).
Article CAS PubMed PubMed Central Google Scholar
Salinas, E. & Thier, P. Gain modulation: a major computational principle of the central nervous system. Neuron 27, 15–21 (2000).
Article CAS PubMed Google Scholar
Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput 9, 1735–1780 (1997).
Article CAS PubMed Google Scholar
Reichert, D. P. & Serre, T. Neuronal synchrony in complex-valued deep networks. In Proc. 2nd International Conference on Learning Representations (2014).
Gray, C. M. & Singer, W. Stimulus-specific neuronal oscillations in orientation columns of cat visual cortex. Proc. Natl Acad. Sci. USA 86, 1698–1702 (1989).
Article CAS PubMed PubMed Central Google Scholar
Hummel, J. E. & Biederman, I. Dynamic binding in a neural network for shape recognition. Psychol. Rev. 99, 480–517 (1992).
Article CAS PubMed Google Scholar
Fries, P. Rhythms for cognition: communication through coherence. Neuron 88, 220–235 (2015).
Article CAS PubMed PubMed Central Google Scholar
Rao, R. P. N. & Ballard, D. H. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2, 79–87 (1999).
Article CAS PubMed Google Scholar
Higgins, I. et al. Towards a definition of disentangled representations. Preprint at http://arxiv.org/abs/1812.02230 (2018).
Feldman, J. What is a visual object? Trends Cogn. Sci. 7, 252–256 (2003).
Article PubMed Google Scholar
Pouget, A., Beck, J. M., Ma, W. J. & Latham, P. E. Probabilistic brains: knowns and unknowns. Nat. Neurosci. 16, 1170–1178 (2013).
Article CAS PubMed PubMed Central Google Scholar
Lee, T. S. & Mumford, D. Hierarchical Bayesian inference in the visual cortex. JOSA A 20, 1434–1448 (2003).
Article PubMed Google Scholar
Dayan, P., Hinton, G. E., Neal, R. M. & Zemel, R. S. The Helmholtz machine. Neural Comput. 7, 889–904 (1995).
Article CAS PubMed Google Scholar
Stuhlmüller, A., Taylor, J. & Goodman, N. Learning stochastic inverses. In Proc. Advances in Neural Information Processing Systems 26 (eds Burges, C. J. et al.) 3048–3056 (Curran Associates, 2013).
Bergen, R. S. van. & Kriegeskorte, N. Going in circles is the way forward: the role of recurrence in visual inference. Curr. Opinion Neurobiol. 65, 176–193 (2020).
Article CAS Google Scholar
von der Heydt, R., Friedman, H. S. & Zhou, H. Searching for the neural mechanisms of color filling-in. In Filling-in: From perceptual completion to cortical reorganization (eds. Pessoa, L. & De Weerd, P.) 106–127 (Oxford Univ. Press, 2003).
Kogo, N. & Wagemans, J. The ‘side’ matters: how configurality is reflected in completion. Cogn. Neurosci. 4, 31–45 (2013).
Article PubMed Google Scholar
Craft, E., Schütze, H., Niebur, E. & von der Heydt, R. A neural model of figure-ground organization. J. Neurophysiol. 97, 4310–4326 (2007).
Article PubMed Google Scholar
Grossberg, S. & Mingolla, E. Neural dynamics of form perception: boundary completion, illusory figures, and neon color spreading. Psychol. Rev. 92, 173–211 (1985).
Article CAS PubMed Google Scholar
Mingolla, E., Ross, W. & Grossberg, S. A neural network for enhancing boundaries and surfaces in synthetic aperture radar images. Neural Netw. 12, 499–511 (1999).
Article PubMed Google Scholar
Zhaoping, L. Border ownership from intracortical interactions in visual area V2. Neuron 47, 143–153 (2005).
Article PubMed CAS Google Scholar
Fukushima, K. Neural network model for completing occluded contours. Neural Netw. 23, 528–540 (2010).
Article PubMed Google Scholar
Tu, Z. & Zhu, S.-C. Image segmentation by data-driven Markov chain Monte Carlo. IEEE Trans. Pattern Anal. 24, 657–673 (2002).
Article Google Scholar
Fukushima, K. Restoring partly occluded patterns: a neural network model. Neural Netw. 18, 33–43 (2005).
Article PubMed Google Scholar
Lücke, J., Turner, R., Sahani, M. & Henniges, M. Occlusive components analysis. In Proc. Advances in Neural Information Processing Systems 22 (eds Bengio, Y. et al.) 1069–1077 (Curran Associates, 2009).
Johnson, J. S. & Olshausen, B. A. The recognition of partially visible natural objects in the presence and absence of their occluders. Vis. Res. 45, 3262–3276 (2005).
Article PubMed Google Scholar
Koch, C. & Ullman, S. Shifts in Selective Visual Attention: Towards the Underlying Neural Circuitry. In Matters of Intelligence: Conceptual Structures in Cognitive Neuroscience (ed. Vaina, L. M.) 115–141 (Springer, 1987).
Tsotsos, J. K. et al. Modeling visual-attention via selective tuning. Artif. Int. 78, 507–545 (1995).
Article Google Scholar
Walther, D. & Koch, C. Modeling attention to salient proto-objects. Neural Netw. 19, 1395–1407 (2006).
Article PubMed Google Scholar
Kazanovich, Y. & Borisyuk, R. An oscillatory neural model of multiple object tracking. Neural Comput. 18, 1413–1440 (2006).
Article PubMed Google Scholar
Libby, A. & Buschman, T. J. Rotational dynamics reduce interference between sensory and memory representations. Nat. Neurosci. https://doi.org/10.1038/s41593-021-00821-9 (2021).
Barak, O. & Tsodyks, M. Working models of working memory. Curr. Opin. Neurobiol. 25, 20–24 (2014).
Article CAS PubMed Google Scholar
Durstewitz, D., Seamans, J. K. & Sejnowski, T. J. Neurocomputational models of working memory. Nat. Neurosci. 3, 1184–1191 (2000).
Article CAS PubMed Google Scholar
Compte, A. Synaptic mechanisms and network dynamics underlying spatial working memory in a cortical network model. Cereb. Cortex 10, 910–923 (2000).
Wang, X.-J. Synaptic reverberations underlying mnemonic persistent activity. Trends Neurosci. 24, 455–463 (2001).
Wimmer, K., Nykamp, D. Q., Constantinidis, C. & Compte, A. Bump attractor dynamics in prefrontal cortex explains behavioral precision in spatial working memory. Nat. Neurosci. 17, 431–439 (2014).
Zenke, F., Agnes, E. J. & Gerstner, W. Diverse synaptic plasticity mechanisms orchestrated to form and retrieve memories in spiking neural networks. Nat. Commun. 6, 6922 (2015).
Mareschal, D., Plunkett, K. & Harris, P. A computational and neuropsychological account of object-oriented behaviours in infancy. Dev. Sci. 2, 306–317 (1999).
Munakata, Y., McClelland, J. L., Johnson, M. H. & Siegler, R. S. Rethinking infant knowledge: toward an adaptive process account of successes and failures in object permanence tasks. Psychol. Rev. 104, 686–713 (1997).
Mi, Y., Katkov, M. & Tsodyks, M. Synaptic correlates of working memory capacity. Neuron 93, 323–330 (2017).
Mongillo, G., Barak, O. & Tsodyks, M. Synaptic theory of working memory. Science 319, 1543–1546 (2008).
Masse, N. Y., Yang, G. R., Song, H. F., Wang, X.-J. & Freedman, D. J. Circuit mechanisms for the maintenance and manipulation of information in working memory. Nat. Neurosci. 22, 1159–1167 (2019).
Chatham, C. H. & Badre, D. Multiple gates on working memory. Curr. Opin. Behav. Sci. 1, 23–31 (2015).
Frank, M. J., Loughry, B. & O’Reilly, R. C. Interactions between frontal cortex and basal ganglia in working memory: a computational model. Cogn. Affect. Behav. Neurosci. 1, 137–160 (2001).
Gruber, A. J., Dayan, P., Gutkin, B. S. & Solla, S. A. Dopamine modulation in the basal ganglia locks the gate to working memory. J. Comput. Neurosci. 20, 153–166 (2006).
O’Reilly, R. C. Biologically based computational models of high-level cognition. Science 314, 91–94 (2006).
Ciresan, D., Meier, U. & Schmidhuber, J. Multi-column deep neural networks for image classification. In Proc. 2012 IEEE Conference on Computer Vision and Pattern Recognition 3642–3649 (IEEE, 2012).
Schmidhuber, J. Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015).
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
Zhou, B., Bau, D., Oliva, A. & Torralba, A. Interpreting deep visual representations via network dissection. IEEE Trans. Pattern Anal. Mach. Intell. 41, 2131–2145 (2019).
Richards, B. A. et al. A deep learning framework for neuroscience. Nat. Neurosci. 22, 1761–1770 (2019).
Körding, K. P. & König, P. Supervised and unsupervised learning with two sites of synaptic integration. J. Comput. Neurosci. 11, 207–215 (2001).
Guerguiev, J., Lillicrap, T. P. & Richards, B. A. Towards deep learning with segregated dendrites. eLife 6, e22901 (2017).
Scellier, B. & Bengio, Y. Equilibrium propagation: bridging the gap between energy-based models and backpropagation. Front. Comput. Neurosci. 11, 24 (2017).
Roelfsema, P. E. & van Ooyen, A. Attention-gated reinforcement learning of internal representations for classification. Neural Comput. 17, 2176–2214 (2005).
Crick, F. The recent excitement about neural networks. Nature 337, 129–132 (1989).
Lillicrap, T. P., Santoro, A., Marris, L., Akerman, C. J. & Hinton, G. Backpropagation and the brain. Nat. Rev. Neurosci. 21, 335–346 (2020).
Szegedy, C., Ioffe, S., Vanhoucke, V. & Alemi, A. A. Inception-v4, inception-ResNet and the impact of residual connections on learning. In Proc. of the Thirty-First AAAI Conference on Artificial Intelligence 4278–4284 (2017).
Khaligh-Razavi, S. M. & Kriegeskorte, N. Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Comput. Biol. 10, e1003915 (2014).
Güçlü, U. & van Gerven, M. A. J. Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. J. Neurosci. 35, 10005–10014 (2015).
Yamins, D. L. K. et al. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc. Natl Acad. Sci. USA 111, 8619–8624 (2014).
Kriegeskorte, N. Deep neural networks: a new framework for modeling biological vision and brain information processing. Ann. Rev. Vis. Sci. 1, 417–446 (2015).
Yamins, D. L. K. & DiCarlo, J. J. Using goal-driven deep learning models to understand sensory cortex. Nat. Neurosci. 19, 356–365 (2016).
Baker, N., Lu, H., Erlikhman, G. & Kellman, P. J. Deep convolutional networks do not classify based on global object shape. PLoS Comput. Biol. 14, e1006613 (2018).
Brendel, W. & Bethge, M. Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet. In Proc. 7th International Conference on Learning Representations (OpenReview.net, 2019).
He, K., Gkioxari, G., Dollar, P. & Girshick, R. Mask R-CNN. In Proc. IEEE International Conference on Computer Vision 2017 2980–2988 (IEEE, 2017).
Pinheiro, P. O., Collobert, R. & Dollar, P. Learning to segment object candidates. In Proc. Advances in Neural Information Processing Systems 28 (eds Cortes, C. et al.) 1990–1998 (Curran Associates, 2015).
Luo, W. et al. Multiple object tracking: a literature review. Artif. Intell. 293, 103448 (2021).
Girshick, R., Donahue, J., Darrell, T. & Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 580–587 (IEEE, 2014).
Bisley, J. W. & Goldberg, M. E. Attention, intention, and priority in the parietal lobe. Ann. Rev. Neurosci. 33, 1–21 (2010).
Burgess, C. P. et al. MONet: unsupervised scene decomposition and representation. Preprint at http://arxiv.org/abs/1901.11390 (2019).
Locatello, F. et al. Object-centric learning with slot attention. In Proc. Advances in Neural Information Processing Systems 33 (eds. Larochelle, H. et al.) 11525–11538 (Curran Associates, 2020).
Eslami, S. M. A. et al. Attend, infer, repeat: fast scene understanding with generative models. In Proc. Advances in Neural Information Processing Systems 29 (eds. Lee, D. et al.) (Curran Associates, 2016).
Wu, J., Lu, E., Kohli, P., Freeman, B. & Tenenbaum, J. Learning to see physics via visual de-animation. In Proc. Advances in Neural Information Processing Systems 30 (eds. Guyon, I. et al.) (Curran Associates, 2017).
Spoerer, C. J., McClure, P. & Kriegeskorte, N. Recurrent convolutional neural networks: a better model of biological object recognition. Front. Psychol. 8, 1551 (2017).
Kubilius, J. et al. Brain-like object recognition with high-performing shallow recurrent ANNs. In Proc. Advances in Neural Information Processing Systems 32 (eds. Wallach, H. et al.) (Curran Associates, 2019).
Kietzmann, T. C. et al. Recurrence is required to capture the representational dynamics of the human visual system. Proc. Natl Acad. Sci. USA 116, 21854–21863 (2019).
Spoerer, C. J., Kietzmann, T. C., Mehrer, J., Charest, I. & Kriegeskorte, N. Recurrent neural networks can explain flexible trading of speed and accuracy in biological vision. PLoS Comput. Biol. 16, e1008215 (2020).
O’Reilly, R. C., Wyatte, D., Herd, S., Mingus, B. & Jilk, D. J. Recurrent processing during object recognition. Front. Psychol. 4, 124 (2013).
Wyatte, D., Jilk, D. J. & O’Reilly, R. C. Early recurrent feedback facilitates visual object recognition under challenging conditions. Front. Psychol. 5, 674 (2014).
Linsley, D., Kim, J. & Serre, T. Sample-efficient image segmentation through recurrence. Preprint at https://arxiv.org/abs/1811.11356v3 (2018).
Engelcke, M., Kosiorek, A. R., Jones, O. P. & Posner, I. GENESIS: generative scene inference and sampling with object-centric latent representations. In Proc. 8th International Conference on Learning Representations (OpenReview.net, 2020).
Steenkiste, S. van, Chang, M., Greff, K. & Schmidhuber, J. Relational neural expectation maximization: unsupervised discovery of objects and their interactions. In Proc. 6th International Conference on Learning Representations (OpenReview.net, 2018).
Greff, K. et al. Multi-object representation learning with iterative variational inference. In Proc. 36th International Conference on Machine Learning (eds Chaudhuri, K. & Salakhutdinov, R.) 2424–2433 (PMLR, 2019).
Swan, G. & Wyble, B. The binding pool: a model of shared neural resources for distinct items in visual working memory. Atten. Percept. Psychophys. 76, 2136–2157 (2014).
Schneegans, S. & Bays, P. M. Neural architecture for feature binding in visual working memory. J. Neurosci. 37, 3913–3925 (2017).
Matthey, L., Bays, P. M. & Dayan, P. A probabilistic palimpsest model of visual short-term memory. PLoS Comput. Biol. 11, e1004003 (2015).
Sabour, S., Frosst, N. & Hinton, G. E. Dynamic routing between capsules. In Proc. Advances in Neural Information Processing Systems 30 (eds. Guyon, I. et al.) (Curran Associates, 2017).
Xu, Z. et al. Unsupervised discovery of parts, structure, and dynamics. In Proc. 7th International Conference on Learning Representations (OpenReview.net, 2019).
Kosiorek, A., Sabour, S., Teh, Y. W. & Hinton, G. E. Stacked capsule autoencoders. In Proc. Advances in Neural Information Processing Systems 32 (eds. Wallach, H. et al.) (Curran Associates, 2019).
Pelli, D. G. & Tillman, K. A. The uncrowded window of object recognition. Nat. Neurosci. 11, 1129–1135 (2008).
Sayim, B., Westheimer, G. & Herzog, M. Gestalt factors modulate basic spatial vision. Psychol. Sci. 21, 641–644 (2010).
Doerig, A., Schmittwilken, L., Sayim, B., Manassi, M. & Herzog, M. H. Capsule networks as recurrent models of grouping and segmentation. PLoS Comput. Biol. 16, 1–19 (2020).
Battaglia, P. W. et al. Relational inductive biases, deep learning, and graph networks. Preprint at http://arxiv.org/abs/1806.01261 (2018).
Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M. & Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 20, 61–80 (2009).
Hsieh, J.-T., Liu, B., Huang, D.-A., Fei-Fei, L. F. & Niebles, J. C. Learning to decompose and disentangle representations for video prediction. In Proc. Advances in Neural Information Processing Systems 31 (eds. Bengio, S. et al.) 515–524 (Curran Associates, 2018).
Whittington, J. C. R. et al. The Tolman-Eichenbaum machine: unifying space and relational memory through generalization in the hippocampal formation. Cell 183, 1249–1263 (2020).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, 2018).
Botvinick, M. et al. Reinforcement learning, fast and slow. Trends Cogn. Sci. 23, 408–422 (2019).
LeCun, Y. The power and limits of deep learning. Res. Technol. Manage. 61, 22–27 (2018).
Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. In Proc. 2nd International Conference on Learning Representations (eds. Bengio, Y. & LeCun, Y.) (OpenReview.net, 2014).
Rezende, D. J., Mohamed, S. & Wierstra, D. Stochastic backpropagation and approximate inference in deep generative models. In Proc. 31st International Conference on Machine Learning 32 (eds. Xing, E. P. & Jebara, T.) 1278–1286 (PMLR, 2014).
Goodfellow, I. et al. Generative adversarial nets. In Proc. Advances in Neural Information Processing Systems 27 (eds. Ghahramani, Z. et al.) (Curran Associates, 2014).
Weis, M. A. et al. Unmasking the inductive biases of unsupervised object representations for video sequences. Preprint at http://arxiv.org/abs/2006.07034 (2020).
Veerapaneni, R. et al. Entity abstraction in visual model-based reinforcement learning. In Proc. Conference on Robot Learning 100 (eds. Kaelbling, L. P. et al.) 1439–1456 (PMLR, 2020).
Watters, N., Tenenbaum, J. & Jazayeri, M. Modular object-oriented games: a task framework for reinforcement learning, psychology, and neuroscience. Preprint at http://arxiv.org/abs/2102.12616 (2021).
Leibo, J. Z. et al. Psychlab: a psychology laboratory for deep reinforcement learning agents. Preprint at http://arxiv.org/abs/1801.08116 (2018).
Deng, J. et al. ImageNet: a large-scale hierarchical image database. In Proc. 2009 IEEE Conference on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).
Geiger, A., Lenz, P. & Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2012).
Sullivan, J., Mei, M., Perfors, A., Wojcik, E. & Frank, M. C. SAYCam: a large, longitudinal audiovisual dataset recorded from the infant’s perspective. Open Mind 5, 20–29 (2021).
Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P. & Dolan, R. J. Model-based influences on humans’ choices and striatal prediction errors. Neuron 69, 1204–1215 (2011).
Green, D. M. & Swets, J. A. Signal Detection Theory and Psychophysics (Wiley, 1966).
Rust, N. C. & Movshon, J. A. In praise of artifice. Nat. Neurosci. 8, 1647–1650 (2005).
Wu, M. C.-K., David, S. V. & Gallant, J. L. Complete functional characterization of sensory neurons by system identification. Ann. Rev. Neurosci. 29, 477–505 (2006).
Ullman, S. Visual routines. Cognition 18, 97–159 (1984).
Jolicoeur, P., Ullman, S. & Mackay, M. Curve tracing: a possible basic operation in the perception of spatial relations. Mem. Cogn. 14, 129–140 (1986).
Ballard, D. H., Hayhoe, M. M., Pook, P. K. & Rao, R. P. N. Deictic codes for the embodiment of cognition. Behav. Brain Sci. 20, 723–742 (1997).
Geirhos, R. et al. Generalisation in humans and deep neural networks. In Proc. Advances in Neural Information Processing Systems 31 (eds. Bengio, S. et al.) (Curran Associates, 2018).
Barbu, A. et al. ObjectNet: a large-scale bias-controlled dataset for pushing the limits of object recognition models. In Proc. Advances in Neural Information Processing Systems 32 (eds. Wallach, H. et al.) (Curran Associates, 2019).
Blaser, E., Pylyshyn, Z. W. & Holcombe, A. O. Tracking an object through feature space. Nature 408, 196–199 (2000).
Johansson, G. Visual perception of biological motion and a model for its analysis. Percept. Psychophys. 14, 201–211 (1973).
Schrimpf, M. et al. Integrative benchmarking to advance neurally mechanistic models of human intelligence. Neuron 108, 413–423 (2020).
Judd, T., Durand, F. & Torralba, A. A Benchmark of Computational Models of Saliency to Predict Human Fixations Technical Report (MIT, 2012).
Kümmerer, M., Wallis, T. S. A., Gatys, L. A. & Bethge, M. Understanding low- and high-level contributions to fixation prediction. In Proc. 2017 IEEE International Conference on Computer Vision 4799–4808 (IEEE, 2017).
Ma, W. J. & Peters, B. A neural network walks into a lab: towards using deep nets as models for human behavior. Preprint at http://arxiv.org/abs/2005.02181 (2020).
Peterson, J., Battleday, R., Griffiths, T. & Russakovsky, O. Human uncertainty makes classification more robust. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 9616–9625 (IEEE, 2019).
Bakhtin, A., van der Maaten, L., Johnson, J., Gustafson, L. & Girshick, R. PHYRE: a new benchmark for physical reasoning. In Proc. Advances in Neural Information Processing Systems 32 (eds. Wallach, H. et al.) (Curran Associates, 2019).
Yi, K. et al. CLEVRER: collision events for video representation and reasoning. In Proc. 8th International Conference on Learning Representations (OpenReview.net, 2020).
Riochet, R. et al. IntPhys: a framework and benchmark for visual intuitive physics reasoning. Preprint at http://arxiv.org/abs/1803.07616 (2018).
Baradel, F., Neverova, N., Mille, J., Mori, G. & Wolf, C. CoPhy: counterfactual learning of physical dynamics. In Proc. 8th International Conference on Learning Representations (OpenReview.net, 2020).
Girdhar, R. & Ramanan, D. CATER: A diagnostic dataset for compositional actions & temporal reasoning. In Proc. 8th International Conference on Learning Representations (OpenReview.net, 2020).
Allen, K. R., Smith, K. A. & Tenenbaum, J. B. Rapid trial-and-error learning with simulation supports flexible tool use and physical reasoning. Proc. Natl Acad. Sci. USA 117, 29302–29310 (2020).
Beyret, B. et al. The Animal-AI environment: training and testing animal-like artificial cognition. Preprint at http://arxiv.org/abs/1909.07483 (2019).
Kanizsa, G. Margini quasi-percettivi in campi con stimolazione omogenea. Riv. Psicol. 49, 7–30 (1955).
Kanizsa, G. Amodale ergänzung und ‘erwartungsfehler’ des gestaltpsychologen. Psychol. Forsch. 33, 325–344 (1970).
Eslami, S. M. A. et al. Neural scene representation and rendering. Science 360, 1204–1210 (2018).
Beattie, C. et al. DeepMind Lab. Preprint at http://arxiv.org/abs/1612.03801 (2016).
Kolve, E. et al. AI2-THOR: an interactive 3D environment for visual AI. Preprint at https://arxiv.org/abs/1712.05474 (2017).
Lin, T.-Y. et al. Microsoft COCO: common objects in context. In Computer Vision—ECCV 2014, Lecture Notes in Computer Science (eds Fleet, D. et al.) 740–755 (Springer, 2014).
Milan, A., Leal-Taixe, L., Reid, I., Roth, S. & Schindler, K. MOT16: a benchmark for multi-object tracking. Preprint at http://arxiv.org/abs/1603.00831 (2016).
Mahler, J. et al. Learning ambidextrous robot grasping policies. Sci. Robot. 4, eaau4984 (2019).
Pitkow, X. Exact feature probabilities in images with occlusion. J. Vis. 10, 42 (2010).
O’Reilly, R. C., Busby, R. S. & Soto, R. Three forms of binding and their neural substrates: Alternatives to temporal synchrony. In The unity of consciousness: Binding, integration, and dissociation (ed. Cleeremans, A.) 168–190 (Oxford Univ. Press, 2003).
Hummel, J. E. et al. A solution to the binding problem for compositional connectionism. In AAAI Fall Symposium - Technical Report (eds. Levy, S. D. & Gayler, R.) vol. FS-04-03 31–34 (AAAI Press, 2004).
Hinton, G. E., McClelland, J. L. & Rumelhart, D. E. Distributed Representations. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Foundations (eds. Rumelhart, D. E. & McClelland, J. L.) 77–109 (MIT Press, 1987).
Treisman, A. Solutions to the binding problem: progress through controversy and convergence. Neuron 24, 105–125 (1999).
Ballard, D. H., Hinton, G. E. & Sejnowski, T. J. Parallel visual computation. Nature 306, 21–26 (1983).
Smolensky, P. Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artif. Intell. 46, 159–216 (1990).
Feldman, J. A. Dynamic connections in neural networks. Biol. Cybernet. 46, 27–39 (1982).
von der Malsburg, C. Am I thinking assemblies? Brain Theory 161–176 (1986).
Reynolds, J. H. & Desimone, R. The role of neural mechanisms of attention in solving the binding problem. Neuron 24, 19–29 (1999).
Shadlen, M. N. & Movshon, J. A. Synchrony unbound: a critical evaluation of the temporal binding hypothesis. Neuron 24, 67–77 (1999).

Acknowledgements

B.P. has received funding from the EU Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement no. 841578.

Author information

Authors and Affiliations

Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
Benjamin Peters & Nikolaus Kriegeskorte
Department of Psychology, Columbia University, New York, NY, USA
Nikolaus Kriegeskorte
Department of Neuroscience, Columbia University, New York, NY, USA
Nikolaus Kriegeskorte
Department of Electrical Engineering, Columbia University, New York, NY, USA
Nikolaus Kriegeskorte

Authors

Benjamin Peters
Nikolaus Kriegeskorte

Corresponding authors

Correspondence to Benjamin Peters or Nikolaus Kriegeskorte.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Peters, B., Kriegeskorte, N. Capturing the objects of vision with neural networks. Nat Hum Behav 5, 1127–1144 (2021). https://doi.org/10.1038/s41562-021-01194-6

Received: 21 September 2019
Accepted: 06 August 2021
Published: 20 September 2021
Issue Date: September 2021
DOI: https://doi.org/10.1038/s41562-021-01194-6
