Why We See What We Do: An Empirical Theory of Vision

Dale Purves & R. Beau Lotto
Sinauer Associates: 2002. 263 pp. $42.95, £27.99
Seeing is believing: in the Cornsweet effect, the difference in brightness at the border where the two central blocks meet creates the false impression that the rest of the lower block must be lighter than the upper block.

The standard model of vision, which this book ambitiously sets out to replace, begins with the retinal image being transduced into electrical signals by rods and cones. It is then analysed by banks of low-level sensors, such as retinal ganglion cells. To some extent these impose their own structures on the image, causing visual illusions such as Mach bands — the bright or dark regions perceived at the interface between bright and dark areas. The information transformed in this way is then used by higher-level mechanisms to infer the most likely structure of the outside world that caused the retinal images. This reconstruction can never be more than a best guess, however, because it is based on incomplete information from the two-dimensional image.

George Berkeley, in his Essay Towards a New Theory of Vision in 1709, thought the retinal image so unreliable that the true state of affairs had to be established by touch. The nineteenth-century physicist and physiologist Hermann von Helmholtz relented somewhat, and thought that the visual system could to some extent educate itself; he described vision as a process in which we infer the nature of the outside world from its images on the retina.

Fast forward to modern computational vision, as popularized by David Marr in his book Vision (W. H. Freeman, 1982), and we find that vision has been redefined. It is now described as an ill-posed mathematical problem (akin to deriving the number and speed of boats in a busy harbour from the ripples that they cause), but one that can generally be solved satisfactorily by making use of constraints derived from the most likely structure of the world.

Purves and Lotto will have none of it, except for the ambiguous nature of the retinal image, which is their starting proposition. Their 'empirical theory of vision' is that when we have a particular optical image on the retina, we experience a whole set of mental images with which it has been associated in the past, by either contiguity or similarity. We then have a perception that is some sort of combination of these images, either a mixture, a sum or a 'centroid' (it is not always clear which). At least, this is what I think they are saying. Perhaps the example of Mach bands will make the theory clear.

Mach bands arise at the two ends of a linear luminance ramp, which we can think of as a fuzzy shadow between a bright and a dark area. On the bright side, observers see a thin white band; on the dark side they see a dark band. The Austrian physicist Ernst Mach deduced from this that some early filtering mechanism must be extracting the second derivative of the luminance profile, a guess later confirmed for retinal ganglion cells under appropriate conditions of illumination. Purves and Lotto show that such bands can also occur in real luminance profiles, such as the profile across the rounded edge of a cube lit from above and shadowed underneath. This is an interesting observation, but it does not explain why we see the bands when there are none in the image.
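Mach's inference can be made concrete with a toy calculation. The sketch below assumes, for illustration only, that perceived brightness is the luminance minus a scaled discrete second derivative; the ramp profile, the gain GAIN and all function names are hypothetical choices, not anything proposed by Purves and Lotto or measured in ganglion cells.

```python
"""Toy illustration of Mach's second-derivative account of Mach bands.

Assumptions (not from the review): a 1-D trapezoidal luminance profile,
a discrete second difference standing in for the early filter, and an
arbitrary gain GAIN. Perceived brightness is modelled as L - GAIN * L''.
"""

GAIN = 2.0  # hypothetical strength of the second-derivative term


def ramp_profile(n=60, start=20, end=40, low=10.0, high=90.0):
    """Dark plateau up to `start`, linear ramp, bright plateau from `end`."""
    slope = (high - low) / (end - start)
    profile = []
    for i in range(n):
        if i <= start:
            profile.append(low)
        elif i >= end:
            profile.append(high)
        else:
            profile.append(low + (i - start) * slope)
    return profile


def second_difference(values):
    """Discrete second derivative, clamping indices at the array ends."""
    n = len(values)
    out = []
    for i in range(n):
        left = values[max(i - 1, 0)]
        right = values[min(i + 1, n - 1)]
        out.append(left - 2 * values[i] + right)
    return out


def perceived_brightness(luminance, gain=GAIN):
    """Subtract the scaled second derivative; only the ramp's corners change."""
    d2 = second_difference(luminance)
    return [lum - gain * d for lum, d in zip(luminance, d2)]


if __name__ == "__main__":
    lum = ramp_profile()
    seen = perceived_brightness(lum)
    # Dip below the dark plateau where the ramp starts (dark band) and
    # overshoot above the bright plateau where it ends (bright band).
    print(min(seen), seen.index(min(seen)))
    print(max(seen), seen.index(max(seen)))
```

Running it should report a dip to 2.0 at index 20, where the ramp starts, and an overshoot to 98.0 at index 40, where it ends: a dark band and a bright band that are absent from the input profile, while the plateaus and the ramp itself are left unchanged.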

The crucial paragraph (and the pivotal section of the book) says, in part, this: “If the visual system evolved to see luminance relationships wholly on the basis of past experience ... then the perceptual response elicited by the Mach stimulus will necessarily incorporate into the ensuing percept the relative contribution of the sources to similar and even more distantly related stimuli in the past, including, in proportion to their frequency of occurrence, the highlights and lowlights in the luminance gradients associated with curved surfaces.”

I read this as saying that because some linear luminance gradients contain real Mach bands, all linear luminance gradients are seen with some Mach bands. Because they are all similar? Because they are associated in temporal contiguity? The answer is not clear to me. What is sure, however, is that this is indeed a radical new theory. It has nothing to do with inferring the most likely nature of the outside object, as the bands are incidental features of illumination.

This impression is reinforced by analysis of the Chubb illusion, in which observers see a medium-contrast patch of texture apparently reduced in contrast when it is surrounded by texture with higher contrast. There is, as it happens, a plausible physiological explanation of this effect using lateral gain-control pathways in the visual cortex, but — as in the case of Mach bands, which are present in the output of retinal ganglion cells — Purves and Lotto have other ideas. The real explanation, they say, is that the image that evokes the Chubb illusion is similar to one that evokes the impression of translucence. They show that the Chubb illusion is abolished if the image is altered to make translucence a less likely interpretation. Our perception must therefore incorporate a little bit of translucence, making the low-contrast region of the image appear to have even lower contrast. This is the reverse of what we would expect from a reconstructionist theory such as that proposed by Helmholtz, according to which we would correct for the loss of transmission caused by translucence and see the patch more as it really is — as having higher contrast.
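The gain-control account mentioned above can likewise be sketched. This toy assumes divisive normalization, one common way of modelling lateral gain control: the response to the central patch is divided by a pool that includes the surround's contrast, and apparent contrast is taken to be the contrast that would give the same response with a blank surround. SIGMA, SURROUND_WEIGHT and the function names are illustrative assumptions, not parameters from the Chubb studies.

```python
"""Toy divisive-normalization (lateral gain control) model of the Chubb effect.

Assumptions, not taken from the review: contrasts are scalars in [0, 1];
the response to a patch is divided by a pool that includes surround
contrast; SIGMA and SURROUND_WEIGHT are arbitrary illustrative constants.
"""

SIGMA = 0.5            # hypothetical semi-saturation constant
SURROUND_WEIGHT = 0.5  # hypothetical strength of lateral (surround) inhibition


def normalized_response(center, surround, sigma=SIGMA, w=SURROUND_WEIGHT):
    """Response to the central patch, divided by local plus surround activity."""
    return center / (sigma + center + w * surround)


def apparent_contrast(center, surround):
    """Contrast that would give the same response with a blank surround.

    With surround = 0 the response is r = c / (SIGMA + c); solving for c
    gives c = SIGMA * r / (1 - r). Here r is always below 1 because the
    denominator of normalized_response exceeds its numerator.
    """
    r = normalized_response(center, surround)
    return SIGMA * r / (1.0 - r)


if __name__ == "__main__":
    # A 0.4-contrast patch seen against surrounds of increasing contrast.
    for surround in (0.0, 0.4, 0.8):
        print(surround, round(apparent_contrast(0.4, surround), 3))
```

In this toy the apparent contrast of the patch falls as the surround contrast rises, which is the direction of the Chubb illusion; the model says nothing about translucence, which is where Purves and Lotto locate the explanation.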

The sections of the book on brightness are the most interesting and challenging, with stunning illustrations and thought-provoking demonstrations. I found the other sections less compelling, particularly the bit about geometrical illusions such as the Müller–Lyer and Poggendorff effects. For example, the authors ignore Glass's suggestion that the latter effect is enhanced by optical blur, along with virtually all other relevant psychophysical work. In general, the book is much stronger on polemic than on hypothesis testing.

After reading the book, I wasn't sure that I had understood the theory, so perhaps I have not done it justice. The summary at the end talks of retinal stimuli triggering 'reflex responses' that have been learned over time. There have been several behavioural theories of perception: is this another? Concerning reflexes, it is interesting to think of the celebrated McCollough effect, in which prolonged inspection of tilted, coloured lines causes achromatic lines of the same (but no other) tilt to take on the hue complementary to the 'conditioning' colour. According to the 'empirical theory', shouldn't they have the same colour? Shouldn't the 'waterfall after-effect' go in the same direction as the biasing waterfall? Perhaps the theory can be refuted after all.