Computational neuroscience

Intimate attention

If we are to perceive a visual figure, we need to direct our attention towards it. The more we discover about this ability, the more impressive it seems.

Looking at Edgar Rubin's famous image of a white vase on a dark background (or is it two dark faces on a white background?), we can alter our perception by directing our attention to either the vase or the faces (Fig. 1). The neural basis for this ability to perceive at will some parts of a visual scene as 'figure' and others as 'background' is a hotly debated area of research. On page 196 of this issue [1], Blaser and colleagues show that, by directing our attention in this way, we can distinguish even between two intimately associated 'figures' with almost identical characteristics. Their results have profound implications for how figures are represented in the brain's visual cortex.

Figure 1: Is it a vase or two faces?

In Rubin's classic illustration, attention can select either object as the 'figure', thereby altering perception.

The context of this work [1] is a debate over whether 'attention' is free to select arbitrary visual locations and attributes (any location, colour or shape of interest to the observer) as the figure, or whether it must select 'visual objects' [2,3]. Here, a visual object is defined as a cluster of locations and attributes that are linked by the Gestalt rules of visual 'completion'. What these rules mean for the visual cortex is that neuronal responses to a given visual stimulus often depend on the context, that is, on other nearby stimuli [4]. There are strong arguments that the neuronal architecture of the visual cortex must limit the selective capabilities of attention [5–8]. This is because there do not seem to be enough feedback connections to the early visual cortex to allow attention to affect arbitrary combinations of neurons.

To study how an observer divides a visual scene into 'figure' and 'background', Blaser et al. [1] build on their earlier work and use 'attentional tracking' [9,10]. This stratagem, which pushes attentional capabilities to the limit, involves a dynamic visual display in which two or more possible figures change beyond recognition several times. If observers can maintain attention on one particular figure (that is, 'track' it successfully while it changes), they can say which part of the final display is descended from the initial figure.

Blaser et al. have now constructed 'mutable objects' with three independent attributes — orientation, width of stripes, and colour saturation — that change rapidly and continually, each following an independent schedule (Fig. 2). The objects sometimes turn clockwise and sometimes anticlockwise. Stripe width sometimes increases and sometimes decreases. And the degree of colour saturation likewise sometimes increases (towards stripes of red on black) and sometimes decreases (towards stripes of light grey on black).
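The idea of three attributes drifting on independent schedules can be made concrete with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' stimulus code: the function names `drift` and `mutable_object`, the units, the step sizes, the value ranges and the reversal probability are all hypothetical choices made for illustration.

```python
import random


def drift(n_steps, start, step, lo, hi, wrap, p_reverse, rng):
    """Return one attribute's value on each frame.

    The value moves by `step` per frame and reverses direction with
    probability `p_reverse` (an assumed schedule). A bounded attribute
    bounces off [lo, hi]; a wrapped attribute (orientation) is taken
    modulo (hi - lo).
    """
    value = start
    direction = rng.choice((-1, 1))
    values = []
    for _ in range(n_steps):
        values.append(value)
        if rng.random() < p_reverse:
            direction = -direction
        value += direction * step
        if wrap:
            value = lo + (value - lo) % (hi - lo)
        elif value < lo or value > hi:
            direction = -direction
            value = max(lo, min(hi, value))
    return values


def mutable_object(n_steps, seed):
    """One object: three attributes, each drifting independently."""
    rng = random.Random(seed)
    return {
        # Turns clockwise or anticlockwise; stripes repeat every 180 degrees.
        "orientation_deg": drift(n_steps, rng.uniform(0.0, 180.0),
                                 2.0, 0.0, 180.0, True, 0.05, rng),
        # Stripe width in arbitrary units (assumed range).
        "stripe_width": drift(n_steps, rng.uniform(0.2, 1.0),
                              0.02, 0.2, 1.0, False, 0.05, rng),
        # 0 = light grey on black, 1 = red on black.
        "saturation": drift(n_steps, rng.uniform(0.0, 1.0),
                            0.02, 0.0, 1.0, False, 0.05, rng),
    }


# Two objects to be superimposed, each with its own independent schedules.
objects = [mutable_object(600, seed=1), mutable_object(600, seed=2)]
```

Because every attribute reverses direction at its own random moments, knowing the current orientation of an object says nothing about its stripe width or saturation, and over time each object can come to occupy values that the other held earlier.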

Figure 2: What does attention select as 'figure' when two objects have almost identical characteristics?

To address this question, Blaser et al. [1] superimposed two objects, each with three attributes: orientation, stripe width, and degree of colour saturation. In the experiment, all attributes changed rapidly and continuously over time. In the snapshot shown here, object 1 has narrow stripes that are pink on black and orientated at +45°, whereas object 2 has broad stripes that are red on black and orientated at −45°. The objects were superimposed (3) and observers were asked to maintain attention on one of the objects as the attributes changed. Even after several seconds, observers were able to state which object in the changed display had descended from the initial object. Further results suggest that attention may select all three attributes of one object as 'figure', but not a single attribute or a mixture of attributes from both objects.

The authors superimposed two of these mutable objects (Fig. 2). Observers were asked to focus attention on one object, specified by its orientation, and to maintain attention on this object for ten seconds, during which time both objects changed. The impression of the observers was that, most of the time, the two objects seemed to remain perceptually distinct. At the end of the ten-second period, observers were able to report reliably which object — again specified by orientation — had descended from the initial object. In other words, observers were able to select one object as the figure, and to relegate the other to the background.

This ability to distinguish between objects found at the same location, and assuming the same attributes at different times, reveals hitherto unsuspected capabilities of visual attention. But what does attention select as the figure — the tracked object as a whole, comprising its orientation, stripe width and colour saturation? Or does it merely select the task-relevant attribute — in this case, orientation?

To answer this question, Blaser et al. used a modified display of two mutable objects and asked observers to monitor either one or two attributes of the tracked object. During the tracking period, all attributes of both objects showed simultaneous, discontinuous jumps, which observers had to report. Observers were able to monitor any two attributes of the same object about as well as they monitored either attribute alone. The implication is that all attributes of the tracked object are perceived as 'figure'. (If only task-relevant attributes were perceived as figure, performance would be expected to deteriorate when more attributes are monitored.) In a control experiment, observers were asked to monitor one attribute from each object. Here, performance was far worse, showing the difficulty of tracking both objects at once.

In short, attention seems to select all the attributes of one object — even those not immediately relevant to the task in hand — as 'figure'. But it does not seem able to select one attribute of one object, or a mixture of attributes from both objects. This means that attention selects not individual attributes or locations, but rather visual objects as a whole (that is, a set of locations and attributes linked by Gestalt rules). This result is all the more striking because — unlike in some earlier studies — the deck was not stacked in favour of whole-object selection. It would clearly have been advantageous to select a single attribute or a mixture of attributes from each object had it been possible.

How might objects be selected in the visual cortex? And does the apparent restriction to selecting objects — rather than arbitrary sets of attributes — reflect the limitations of neuronal hardware? A conceptually simple scenario is that attention enhances the representation of one object's attributes while attenuating those of the other. But how can attention single out the attributes of just one object? Attributes of the other are encoded in closely overlapping neuronal populations, and the anatomy of the visual cortex seems to support only relatively coarse and unspecific attentional feedback.

Perhaps one (task-relevant) attribute is selected at first, with feedback spreading to other (non-task-relevant) attributes of the same object through the neural equivalent of Gestalt rules. Alternatively, it is possible that the initial selection is based on the location. Although superimposed, the two objects would have been large enough to allow attention initially to select a small part of one object, and then spread to other parts, again through Gestalt rules. To see this point, bear in mind that the objects stimulated, in visual cortical area V1 alone, neurons in a cortical region of some 100 mm² and several dozen neuronal 'hypercolumns'.

It remains to be seen exactly how attention can distinguish between objects represented by populations of neurons that are so intimately entwined. But at the very least, the striking capabilities of visual attention revealed by Blaser et al. [1] give us new reasons to think hard about how objects are represented in the visual cortex.


  1. Blaser, E., Pylyshyn, Z. W. & Holcombe, A. O. Nature 408, 196–199 (2000).
  2. Roelfsema, P. R. et al. Nature 395, 376–381 (1998).
  3. O'Craven, E. M. et al. Nature 401, 584–587 (1999).
  4. Kapadia, M. K. et al. J. Neurophysiol. 84, 2048–2062 (2000).
  5. Olshausen, B. A. et al. J. Comp. Neurosci. 2, 45–62 (1995).
  6. Deneve, S. et al. Nature Neurosci. 2, 740–745 (1999).
  7. Yeshurun, Y. & Carrasco, M. Nature 396, 72–75 (1998).
  8. Lee, D. K. et al. Nature Neurosci. 2, 375–381 (1999).
  9. Pylyshyn, Z. W. & Storm, R. W. Spatial Vis. 3, 179–197 (1988).
  10. Cavanagh, P. Science 11, 1563–1565 (1992).

Author information

Correspondence to Jochen Braun.
