Can we decode a person's brain activity to determine what that person is perceiving? Interest in this question has recently surged as a result of the success and popularity of applying multivariate classification techniques to functional magnetic resonance imaging (fMRI) data1. Standard fMRI analyses average activity across all voxels in a given region of interest and then correlate this activity with stimulus or task conditions. In contrast, classification techniques harness the entire pattern of activity observed across multiple voxels to predict which stimulus or task condition the subject is in. Classification techniques are limited, however, because they can only distinguish among a handful of predetermined states; for example, whether the subject saw a face or a house. Is it possible to overcome this limitation and obtain more detailed information about a person's mental state?
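The contrast between region-averaged analysis and pattern classification can be illustrated with a minimal sketch on simulated data. Everything here is an assumption made for illustration: the voxel count, the noise level, the alternating voxel preferences, and the use of a nearest-centroid classifier, which stands in for whatever classifier a given study actually used.

```python
import numpy as np

rng = np.random.default_rng(2)
n_vox, n_trials = 50, 200

# Hypothetical region: half of the voxels respond more to condition 1 and
# half respond more to condition 0, so the region-wide mean is uninformative.
pattern = np.where(np.arange(n_vox) % 2 == 0, 0.5, -0.5)
labels = rng.integers(0, 2, size=n_trials)      # 0 = "face", 1 = "house"
signs = np.where(labels == 1, 1.0, -1.0)
data = signs[:, None] * pattern + rng.normal(size=(n_trials, n_vox))

train, test = slice(0, 100), slice(100, 200)

def nearest_centroid_accuracy(features, labels, train, test):
    """Assign each test trial to the class with the closer training centroid."""
    c0 = features[train][labels[train] == 0].mean(axis=0)
    c1 = features[train][labels[train] == 1].mean(axis=0)
    d0 = np.linalg.norm(features[test] - c0, axis=1)
    d1 = np.linalg.norm(features[test] - c1, axis=1)
    pred = (d1 < d0).astype(int)
    return (pred == labels[test]).mean()

# Standard analysis: collapse the region to a single averaged value per trial.
roi_mean = data.mean(axis=1, keepdims=True)
acc_average = nearest_centroid_accuracy(roi_mean, labels, train, test)

# Pattern classification: keep every voxel as a separate feature.
acc_pattern = nearest_centroid_accuracy(data, labels, train, test)

print(f"region average: {acc_average:.2f}, multivoxel pattern: {acc_pattern:.2f}")
```

In this simulation the averaged signal should sit near chance while the multivoxel pattern separates the two conditions, because the information lies in which voxels respond, not in how much the region responds overall.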

Recent fMRI studies2,3 have advanced beyond classification by using brain activity measurements to identify, out of a set of potential images, the specific image that the subject saw. One study4 even showed that it is possible to reconstruct the actual image that was seen, rather than simply choosing the image from a known set. However, the resolution and accuracy of the reconstructions in this early study were somewhat low. A new study by Miyawaki et al.5 uses sophisticated decoding techniques to achieve high-quality image reconstructions (Fig. 1).

Figure 1: Schematic of visual image reconstruction performed by Miyawaki et al.5

Flickering checkerboard patterns arranged on a 10 × 10 grid were shown to each subject while fMRI signals were recorded from early visual cortex. The recorded signals were then used to reconstruct the images that the subjects had seen.

Miyawaki et al.5 began their experiment by constructing contrast-defined images4; these images consisted of a 10 × 10 grid, in which each element was either gray (zero contrast) or filled with a flickering checkerboard pattern (full contrast). The authors presented a large number of these contrast-defined images to each subject while simultaneously recording fMRI signals from early visual areas (V1, V2 and V3). Next, they developed a reconstruction model and fit it to their data. In the first stage of the model, the authors used linear combinations of voxel responses to predict the amount of contrast in local regions of the stimulus. This technique works well because individual voxels in early visual areas reliably signal the amount of contrast in their spatial receptive fields2,4,6. In the second stage, they combined the predicted contrasts for the various stimulus regions into a single image that represents the estimated pattern of contrast the subject saw. Finally, the authors tested their reconstruction model using separate data that was reserved for this purpose. Reconstruction accuracy was quantified by correlating reconstructed images with the actual images seen by each subject.
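The two-stage procedure described above can be sketched on simulated data. This is not the authors' implementation: ridge regression is swapped in as a simple linear decoder, the local regions are reduced to single grid elements, and the voxel count, noise level and simulated forward model are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

grid = 10                       # 10 x 10 stimulus grid, as in the study
n_pixels = grid * grid
n_voxels = 300                  # hypothetical number of voxels
n_train, n_test = 1000, 50      # hypothetical trial counts

# Contrast-defined images: 0 = gray element, 1 = flickering checkerboard.
X_train = rng.integers(0, 2, size=(n_train, n_pixels)).astype(float)
X_test = rng.integers(0, 2, size=(n_test, n_pixels)).astype(float)

# Hypothetical forward model, used only to simulate noisy voxel responses.
W_true = rng.normal(size=(n_pixels, n_voxels))

def simulate(X):
    return X @ W_true + rng.normal(scale=2.0, size=(X.shape[0], n_voxels))

R_train, R_test = simulate(X_train), simulate(X_test)

def fit_ridge(R, X, lam=10.0):
    """Linear map from voxel responses (plus a constant) to local contrast.
    For simplicity the constant term is penalized along with the weights."""
    R1 = np.hstack([R, np.ones((R.shape[0], 1))])
    A = R1.T @ R1 + lam * np.eye(R1.shape[1])
    return np.linalg.solve(A, R1.T @ X)

def predict(R, B):
    R1 = np.hstack([R, np.ones((R.shape[0], 1))])
    return R1 @ B

# Stage 1: predict the contrast of each stimulus element from the voxels.
B = fit_ridge(R_train, X_train)
predicted_contrast = predict(R_test, B)

# Stage 2: assemble the predicted contrasts into one image per test trial.
recon = np.clip(predicted_contrast, 0.0, 1.0).reshape(n_test, grid, grid)

# Evaluation on held-out data: correlate each reconstruction with its image.
def image_correlation(a, b):
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

scores = np.array([
    image_correlation(recon[i], X_test[i].reshape(grid, grid))
    for i in range(n_test)
])
mean_corr = scores.mean()
print(f"mean correlation on held-out images: {mean_corr:.2f}")
```

The decoder is fit only on the training trials and scored only on the held-out trials, mirroring the separation between model fitting and testing described in the text.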

An interesting aspect of the reconstruction model used by Miyawaki et al.5 is that images are represented in terms of many overlapping regions that occur at different scales. The use of overlapping regions complicates reconstruction because it requires the estimation of the relative contributions of the various regions to the final reconstructed image. On the other hand, the use of multiple scales enhances reconstruction performance because voxels convey contrast information at different scales. For example, peripheral voxels have larger spatial receptive fields than foveal voxels2,6 and therefore convey relatively more information about the contrast of large stimulus regions. By decoding contrast information at different scales, the reconstruction model maximizes the amount of information that is extracted from each voxel.
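The idea of representing an image with overlapping regions at several scales can be sketched as follows. The specific region sizes (1 × 1, 1 × 2, 2 × 1 and 2 × 2 grid elements) and the uniform combination weights are assumptions made for illustration; the text above states only that the regions overlap and span multiple scales, and that their relative contributions must be estimated.

```python
import numpy as np

GRID = 10
SCALES = [(1, 1), (1, 2), (2, 1), (2, 2)]   # assumed (height, width) of regions

def region_masks(grid=GRID, scales=SCALES):
    """One binary mask per region, at every position and every scale."""
    masks = []
    for h, w in scales:
        for r in range(grid - h + 1):
            for c in range(grid - w + 1):
                m = np.zeros((grid, grid))
                m[r:r + h, c:c + w] = 1.0
                masks.append(m)
    return np.stack(masks)                  # shape: (n_regions, grid, grid)

def region_means(image, masks):
    """Mean contrast of the image inside each region."""
    return (masks * image).sum(axis=(1, 2)) / masks.sum(axis=(1, 2))

def combine(region_contrast, masks, weights=None):
    """Each pixel becomes a weighted average of the regions that cover it.
    Uniform weights are a placeholder; in practice they would be estimated."""
    if weights is None:
        weights = np.ones(len(masks))
    num = np.tensordot(weights * region_contrast, masks, axes=1)
    den = np.tensordot(weights, masks, axes=1)
    return num / den

masks = region_masks()

# Demo: a small bright rectangle on a gray background. Here the region
# contrasts are computed from the true image; in a decoding setting they
# would instead be predicted from voxel responses.
image = np.zeros((GRID, GRID))
image[3:7, 4:6] = 1.0
recon = combine(region_means(image, masks), masks)
```

With uniform weights the result is a smoothed copy of the input, which illustrates why the relative contributions of the regions matter: down-weighting coarse regions sharpens the reconstruction, and up-weighting them makes it more robust to noise in individual fine-scale estimates.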

The work of Miyawaki et al.5 is the latest development in a growing series of visual decoding studies. Although reconstruction is qualitatively different from identification and classification, all decoding studies are similar in that they establish a systematic mapping between visual stimuli and brain activity (Fig. 2). In some studies5,7,8,9,10, the mapping runs from brain activity to the stimulus, and decoding is achieved by simply evaluating the mapping. In other studies2,3,4, the mapping runs from the stimulus to brain activity, and decoding is achieved via an inversion procedure. Which of these strategies yields better performance depends on the situation, and in some cases one strategy may be easier to implement than the other. Studies also vary with respect to the type of stimulus representation used. For example, some studies7,8 represent stimuli in terms of labels associated with different object categories, whereas Miyawaki et al.5 represent stimuli in terms of local contrast.
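The two directions of mapping can be contrasted in a minimal sketch on simulated data. The linear models, the ordinary least-squares fits, the feature and voxel counts, and the use of correlation-based identification as the inversion procedure are all illustrative assumptions, not the methods of any particular study cited here.

```python
import numpy as np

rng = np.random.default_rng(1)
n_train, n_feat, n_vox, n_cand = 500, 20, 80, 30

# Hypothetical stimulus features and simulated voxel responses.
S_train = rng.normal(size=(n_train, n_feat))
W_true = rng.normal(size=(n_feat, n_vox))
R_train = S_train @ W_true + rng.normal(size=(n_train, n_vox))

# A set of candidate stimuli, one of which was actually "seen".
candidates = rng.normal(size=(n_cand, n_feat))
true_idx = 7
observed = candidates[true_idx] @ W_true + rng.normal(size=n_vox)

# Strategy A: learn brain activity -> stimulus, then simply evaluate the map.
D_hat, *_ = np.linalg.lstsq(R_train, S_train, rcond=None)
decoded = observed @ D_hat

# Strategy B: learn stimulus -> brain activity, then invert it. Here the
# inversion is a search: predict the response to every candidate and pick
# the candidate whose predicted response best matches the observed one.
W_hat, *_ = np.linalg.lstsq(S_train, R_train, rcond=None)

def identify(observed, candidates, W):
    predicted = candidates @ W
    p = predicted - predicted.mean(axis=1, keepdims=True)
    o = observed - observed.mean()
    corr = (p @ o) / (np.linalg.norm(p, axis=1) * np.linalg.norm(o))
    return int(np.argmax(corr)), corr

idx, corr = identify(observed, candidates, W_hat)
decoded_corr = np.corrcoef(decoded, candidates[true_idx])[0, 1]
print(f"identified candidate {idx}; direct decoding correlation {decoded_corr:.2f}")
```

Strategy A yields an estimate of the stimulus features directly, whereas Strategy B as written can only select among the candidates it is given; turning it into a reconstruction method would require a richer inversion step.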

Figure 2: All visual decoding studies2,3,4,5,7,8,9,10 establish a systematic mapping between visual stimuli and brain activity.

Some studies learn a direct mapping from brain activity to the stimulus (arrows from right to left) and then perform decoding by simply evaluating the mapping. Other studies first learn a mapping from the stimulus to brain activity (arrows from left to right) and then perform decoding via an inversion procedure. Studies also differ in the type of stimulus representation that they use. Some studies operate on the raw stimulus (that is, pixel luminance values), whereas others operate on various transformations of the stimulus. These transformations include calculating the contrast in local regions of the stimulus, using labels that represent different object categories or grating orientations, and representing the stimulus in terms of a semantic basis set. Finally, studies differ in whether they classify, identify or reconstruct images. Note that ref. 4 evaluated two decoding methods, but this figure depicts only their 'inverse reconstruction' method.

Although the reconstructions achieved by Miyawaki et al.5 are impressive, they are not perfect. One potential way to improve reconstruction accuracy would be to harness information conveyed by voxel responses in higher visual areas, such as V4. This is a challenging endeavor given that we have only a rudimentary understanding of how visual areas beyond V1 represent stimuli. Other ways to improve reconstruction accuracy include increasing the spatial resolution of fMRI signals and reducing measurement noise. Both of these strategies would effectively increase the amount of information available from early visual areas and would not require any change in the decoding method. Moreover, several MRI techniques for increasing spatial resolution and signal-to-noise ratio are already available, such as the use of ultra-high magnetic fields11 and parallel imaging.

Another avenue for future research is to increase the resolution of reconstructed images. However, the importance of resolution should not be overemphasized, as images with a resolution of just 32 × 32 can already convey a vast amount of information12. A more useful extension would be to expand the type of reconstructions that are possible. The simple artificial images used by Miyawaki et al.5 are defined entirely by differences in local contrast, but this is not the case for the natural images that we see in daily life. At this time, no published studies have achieved reconstruction of natural images from fMRI measurements of brain activity. Direct application of the reconstruction model of Miyawaki et al.5 to activity evoked by a natural image would not produce satisfactory results, as reducing a natural image to a 10 × 10 pattern of contrast produces an image that bears little resemblance to the original natural image. If the resolution of the Miyawaki et al.5 model were enhanced, the reconstructed images would probably capture many of the edges in natural images, but other important image features (such as surfaces) would be missed. Accurate reconstruction of natural images from fMRI measurements will likely require new decoding techniques and a better understanding of the statistical structure of natural images.

Brain decoding has many potential applications. For example, a recent study has shown that it is possible to control a prosthetic device using neural activity in motor cortex13. One enticing possibility suggested by the results of Miyawaki et al.5 is to use fMRI measurements to reconstruct the contents of visual imagery or perhaps even dreams. Whether current decoding techniques can be successfully extended to these subjective perceptual states depends on whether the neural processes that mediate these states are similar to those involved in normal perception14. Current evidence suggests that this is more likely to be the case in higher visual areas15, but some success in reconstructing imagined stimuli from activity in lower visual areas has already been demonstrated4. Thus, reconstruction of the subjective contents of human perception may soon be a reality.