You look up from your desk and instantly recognize the person who has just walked into your office. Unlike the mathematical proof you were busy constructing, recognizing your colleague does not seem to require any mental effort at all. But the apparent ease of this task belies its computational difficulty, and the ability of the primate brain to recognize complex objects, such as faces, in complex environments, such as offices, is a pinnacle of evolutionary achievement. Investigating the brain mechanisms underlying object recognition is also very difficult. However, as they describe on page 692 of this issue, Afraz and colleagues1 have successfully given us new traction on the problem.

Although we still have only a rudimentary understanding of how the brain accomplishes visual object recognition, the work of many groups has uncovered a hierarchically ordered series of brain areas likely to support this remarkable feat2. One of the most intriguing observations is that neurons at the highest levels of this series respond strongly when their 'preferred' object is in view, often regardless of its exact position, size, contrast, pose and even background clutter. Creating this neuronal response property is the computational crux of object recognition, as even small populations of these neurons can categorize objects with remarkable speed3. We know that this neuronal activity is correlated with object perception by the brain's owner4, and even with his or her imagination of objects5. Yet there has been no direct evidence that these neurons actually cause subjects to perceive objects. Afraz and colleagues1 now provide such evidence.

In particular, the authors show that artificial activation of groups of high-level visual neurons that prefer one class of object — faces — causes subjects to tend to perceive faces, even if a face is not present. Working with rhesus monkeys, a well-studied animal model of human vision, the authors focused on the putative highest visual brain area in this species — the inferior temporal cortex (IT) — where neurons respond highly selectively to visually presented faces and other complex objects as outlined above. The researchers used fine microelectrodes to sample the activity of neurons at many locations across IT, and positioned the microelectrode at locations where many neurons responded preferentially to faces — that is, responded with high activity only when images of faces were shown to the animal. To ask if IT neuronal activity somehow causes perception, the authors injected tiny amounts of electrical current through the microelectrode tip; this method is called microstimulation, and is known to activate neurons within several hundred micrometres6.

The prevailing, but previously unsubstantiated, view is that IT neurons that respond preferentially to an object are somehow responsible for allowing the subject to perceive and report the presence of that object. So artificial activation of face-preferring neurons by microstimulation might, in theory, lead to face perception.

But how does one ask a monkey what it perceives? To do this, Afraz et al.1 drew on well-established behavioural methods7. They first trained each monkey to look left if a face was presented and to look right if a non-face object was presented. Once the animals had mastered this face-detection task, the authors made the task more challenging by degrading the object images with varying amounts of visual noise (see Fig. 1a of their paper1 on page 692). Averaged over many behavioural trials, this procedure provides a sensitive perceptual assay of the monkey's tendency to report 'seeing' a face rather than a non-face.

Using these methods, the authors showed that, in behavioural trials in which microstimulation was applied to face-preferring neurons, monkeys had a reliably greater tendency to report seeing a face, relative to trials in which no microstimulation was applied. For example, even when the visual stimulus contained no object at all, but only noise, microstimulation increased the chance of the animal 'seeing' a face in that noise. Importantly, these effects were both temporally and spatially specific — they were stronger when microstimulation was applied during a brief time interval in which IT neurons normally respond to visual stimuli, and they did not occur when microstimulation was applied at non-face-preferring IT locations.
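The logic of this assay can be illustrated with a toy simulation. The sketch below is not the authors' analysis. It assumes a logistic psychometric function relating a signed visual signal (hypothetical units: +100 for a clear face, -100 for a clear non-face, 0 for pure noise) to the probability of a 'face' report, and it models microstimulation as a fixed shift along that signal axis. The function names, the slope and the size of the shift are all hypothetical values chosen for illustration.

```python
import math
import random


def p_report_face(signal, stim_shift=0.0, slope=0.05):
    """Probability of a 'face' report under an assumed logistic psychometric function.

    signal:     signed visual signal (+100 clear face, -100 clear non-face, 0 pure noise).
    stim_shift: hypothetical bias, in the same units, attributed to microstimulation.
    slope:      hypothetical steepness of the psychometric function.
    """
    return 1.0 / (1.0 + math.exp(-slope * (signal + stim_shift)))


def simulate_choices(signal, n_trials, stim_shift=0.0, rng=None):
    """Simulate n_trials binary reports and return the fraction reported as 'face'."""
    if rng is None:
        rng = random.Random(0)
    p = p_report_face(signal, stim_shift)
    n_face = sum(1 for _ in range(n_trials) if rng.random() < p)
    return n_face / n_trials


if __name__ == "__main__":
    rng = random.Random(42)
    # Compare simulated control and 'stimulated' trials at several signal levels.
    for signal in (-100, -50, 0, 50, 100):
        control = simulate_choices(signal, 2000, stim_shift=0.0, rng=rng)
        stimulated = simulate_choices(signal, 2000, stim_shift=20.0, rng=rng)
        print(f"signal {signal:+4d}: control {control:.2f}, stimulated {stimulated:.2f}")
```

In this toy model, the shift has its largest effect near zero signal, where the curve is steepest, which corresponds qualitatively to the noise-only case described above. The model says nothing about the actual size of the effect reported in the paper.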

Together, these exciting results show that targeted microstimulation of IT induces face perception — probably through the activation of groups of face-selective IT neurons. Because neuronal activity is almost universally believed to underlie conscious perception, it is at one level not surprising that artificial neuronal activation leads to perceptual changes, and this has been shown in other visual areas7. However, Afraz and colleagues' study directly implicates IT neurons as causal in face perception, and also suggests that the millisecond-by-millisecond detail of their activity is not as important as the fact that they are active. Moreover, because the strong microstimulation used activates perhaps thousands of nearby neurons6, the study provides firm support for the idea that nearby IT neurons have similar functions8. Indeed, it is noteworthy that the study used faces, as they are an ethologically important object class, and face-selective neurons may be clustered together to an unusually strong degree9,10. It remains to be seen if artificially induced perceptual shifts can be obtained for other object classes.

Beyond directly implicating IT in object recognition, such work might eventually allow the creation of visual prosthetics to interact with the brain's high-level object memories. For example, one can imagine even very low-density electrode devices that could activate combinations of IT neuronal clusters to produce awareness of objects (such as a face, car, dog, table, cup, and so on) that are in front of a blind user. However, for deep understanding, the key to progress will be to move from the phenomenology of IT responses to underlying computational mechanisms. In particular, we already know that relatively simple, biologically plausible circuits can 'read out' object identity from IT neurons over behaviourally relevant timescales3. But the central problem is understanding how the brain constructs those IT responses in the first place11,12. That is, how do you recognize that person who just walked into your office?