The auditory system may be less well understood than the visual system, but it is no less remarkable. Vibrations of the eardrums must be analyzed to yield not only pitch but also speech and music, as well as sound location and its changes over time. Three papers in this issue examine how the human brain analyzes sound, illustrating the complexity of the tasks needed to build up our rich perception of the auditory world.

All auditory processing requires the integration of signals over time. For real-world sounds, this means analyzing multiple time scales, from the milliseconds of pitch to the seconds, even minutes, that define speech and music. Griffiths et al. (on page 422) use PET imaging to ask which brain areas are involved in processing pitch and melody. The popular account of pitch perception is that vibration at a given frequency activates hair cells in a restricted region of the cochlea. This cochlear frequency map is projected (via the brainstem, inferior colliculus and thalamus) to the auditory cortex; thus frequency can be represented through the early stages of the auditory system by a simple place code.
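The place code can be made concrete with Greenwood's classic frequency-position function, f(x) = A(10^(ax) - k), where x is the fractional distance from the apex to the base of the basilar membrane. The short Python sketch below is our own illustration, not drawn from any of the papers discussed here; the parameters are commonly cited human values, and the function is inverted to find the cochlear position that responds best to a given frequency.

```python
import math

# Greenwood's frequency-position map for the human cochlea:
#   f(x) = A * (10**(a * x) - k),  where x is the fractional distance
#   from the apex (x = 0) to the base (x = 1) of the basilar membrane.
A, a, k = 165.4, 2.1, 0.88   # commonly cited human parameters

def place_from_frequency(f_hz: float) -> float:
    """Invert the map: fractional cochlear position whose hair cells
    respond best to the given frequency."""
    return math.log10(f_hz / A + k) / a

for f in (125, 500, 2000, 8000):
    print(f"{f:5d} Hz -> cochlear position x = {place_from_frequency(f):.2f}")
```

Each octave step lands at a roughly constant spacing along the membrane, which is what makes a simple place code plausible for ordinary tones.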

In reality, however, things are much more complex. Pitch perception can also arise from temporal information in a noisy stimulus, even though there is no energy peak at the corresponding frequency, and the simple place-coding model cannot account for the perception of pitch from such stimuli. Instead, temporal regularities in the stimulus must be conveyed to the brain through the timing of action potentials, in other words by a temporal code rather than a place code. Griffiths et al. show that this signal has been decoded by the time it reaches the primary auditory cortex. They then take advantage of these noisy stimuli to identify brain regions that are specifically activated by melody. The beauty of this design is that by varying the amount of temporal information in the stimulus, they can cause the melody to emerge gradually from the background noise. In this way, they identify cortical regions whose activity correlates with the emergence of melody, but not with notes per se. These areas are distinct from the primary auditory cortex, and the authors speculate that they may also be involved in processing speech.
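A minimal sketch of this idea (ours, not the authors' stimulus code) uses delay-and-add noise, a classic stimulus that carries pitch in its temporal regularity, and an autocorrelation readout as a stand-in for a temporal code. The delay and iteration count below are arbitrary choices for illustration.

```python
import numpy as np

fs = 44100                    # sample rate (Hz)
d = fs // 200                 # delay in samples -> expected pitch ~200 Hz
rng = np.random.default_rng(0)

# Delay-and-add ("iterated rippled") noise: each pass adds a delayed
# copy of the signal, strengthening its temporal regularity at lag d
# without introducing a single sharp spectral peak.
x = rng.standard_normal(fs // 2)          # 0.5 s of white noise
for _ in range(8):
    delayed = np.zeros_like(x)
    delayed[d:] = x[:-d]
    x = x + delayed
x /= np.abs(x).max()

# A temporal-code readout: the FFT-based autocorrelation peaks at lag d,
# the period of the pitch a listener would hear.
spec = np.fft.rfft(x, 2 * len(x))
ac = np.fft.irfft(np.abs(spec) ** 2)[: len(x)]
lag = 1 + int(np.argmax(ac[1 : 2 * d]))   # skip the trivial zero-lag peak
print(f"autocorrelation peak at lag {lag} -> pitch ~ {fs / lag:.0f} Hz")
```

The point of the toy is that the pitch is recoverable from timing regularities alone; no narrow spectral peak marks the 200 Hz that a listener would report.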

The ability to discriminate speech sounds develops at a very young age. On page 351, Cheour and colleagues describe what is apparently the earliest known neural correlate of this process. Behavioral tests show that early language exposure affects a child's ability to discriminate phonemes; young babies can discriminate a wide range of sounds, but they gradually lose the ability to make discriminations that are not important within their native language. (One example is the inability of many Japanese speakers to distinguish the sounds /l/ and /r/.) This has been described as the 'perceptual magnet effect': sounds sufficiently similar to a prototypic vowel or consonant are 'captured' and all perceived as examples of that sound, so that differences between them (which might be meaningful in another language) go undetected.
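The logic of the magnet effect can be caricatured in a few lines of Python; the one-dimensional 'vowel space', the prototype locations and the capture radius below are entirely made up for illustration.

```python
# Hypothetical 1-D "vowel space": a stimulus is a point, and the
# listener's language supplies prototype locations for its categories.
prototypes = {"i": 2.0, "e": 4.5}   # made-up coordinates
CAPTURE_RADIUS = 1.0                # made-up magnet strength

def perceive(stimulus: float):
    """Stimuli within the capture radius of the nearest prototype are
    heard as that prototype; differences among them are lost."""
    vowel = min(prototypes, key=lambda v: abs(stimulus - prototypes[v]))
    if abs(stimulus - prototypes[vowel]) < CAPTURE_RADIUS:
        return vowel
    return None   # outside any magnet: heard, and discriminable, as itself

for s in (1.7, 2.3, 3.3):
    print(s, "->", perceive(s))
```

Here 1.7 and 2.3 are both captured by /i/ and become indistinguishable, while 3.3, falling outside every magnet, remains discriminable; a listener whose language placed a prototype near 3.3 would partition the same space differently.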

Cheour and colleagues study the emergence of this effect in the first year of life, using scalp electrodes to record the responses of Finnish and Estonian children to sounds that are either common to both languages or unique to Estonian. They take advantage of a phenomenon called mismatch negativity (MMN): when a series of repeated sounds is interrupted by an unexpected 'oddball', the deviant sound elicits a distinctive electrical signal, believed to originate in or near the primary auditory cortex. The size of the MMN response presumably reflects the degree of mismatch between the neural representations of the two sounds. At six months, Finnish children show MMN signals that correlate with the acoustical differences between the vowels. At one year, however, they show a stronger response to an oddball that is a vowel in their native language than to the acoustically more dissimilar vowel that is unique to Estonian. By contrast, Estonian one-year-olds, who have been exposed to both vowels, show responses that correlate with the acoustical differences.
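The analysis behind an MMN experiment reduces to a difference wave: the average response to deviants minus the average response to standards. The sketch below runs on purely synthetic data (the noise level, deviant probability and MMN-like bump are invented) simply to show how that readout is formed; it is not the authors' analysis pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)
fs, n_epochs, epoch_len = 250, 400, 150     # Hz, trials, samples (600 ms)
p_deviant = 0.15                             # oddball probability

# Synthetic epochs: noise plus, for deviants only, a small negative
# deflection around 200 ms after stimulus onset (an MMN-like bump).
t = np.arange(epoch_len) / fs
mmn_bump = -2.0 * np.exp(-((t - 0.2) ** 2) / (2 * 0.03 ** 2))  # microvolts
is_deviant = rng.random(n_epochs) < p_deviant
epochs = rng.standard_normal((n_epochs, epoch_len)) * 3.0
epochs[is_deviant] += mmn_bump

# The MMN is read out as the deviant-minus-standard difference wave.
diff_wave = epochs[is_deviant].mean(0) - epochs[~is_deviant].mean(0)
peak_ms = t[np.argmin(diff_wave)] * 1000
print(f"difference wave peaks at {peak_ms:.0f} ms (negative = MMN-like)")
```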

Like the visual system, the auditory system analyzes not only 'what' but also 'where'. Humans can localize sounds to within a few degrees, not only in the horizontal but also in the vertical (elevation) dimension. Horizontal location can be extracted from differences in loudness and timing between the two ears, but these binaural cues are ambiguous about elevation, because sources at many different elevations produce essentially the same interaural differences. Perceiving elevation instead depends on the folded shape of the outer ear, which imposes an elaborate transformation on the sound spectrum that varies with elevation, as illustrated by Van Opstal and colleagues (page 417) in their Fig. 1. To localize the source of a sound, the brain must compare the actual (transformed) sound against a stored template.
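The azimuth half of this story can be quantified with Woodworth's textbook spherical-head approximation for the interaural time difference; the head radius below is an assumed average, and the formula is a standard simplification rather than anything used by Van Opstal and colleagues.

```python
import math

HEAD_RADIUS = 0.0875     # m, an assumed average head radius
SPEED_OF_SOUND = 343.0   # m/s

def itd(azimuth_deg: float) -> float:
    """Interaural time difference (s) for a distant source, using
    Woodworth's spherical-head model: ITD = (r/c) * (theta + sin(theta))."""
    th = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (th + math.sin(th))

for az in (0, 15, 45, 90):
    print(f"azimuth {az:2d} deg -> ITD = {itd(az) * 1e6:5.0f} microseconds")

# Every point on a cone around the interaural axis yields the same ITD
# (and a similar loudness difference), so binaural cues by themselves
# cannot specify elevation; that is why spectral cues are needed.
```

The maximum, roughly 660 microseconds at 90 degrees, matches the classic human value; the comment at the end restates why this cue says nothing about elevation.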

Ears, though, come in many different varieties; how does the brain know the shape of its owner's ears? Presumably the transformation function must be learned, and the authors test this by reshaping their subjects' outer ears with plastic molds. Although the molds initially disrupt perception of sound elevation, subjects learn to localize accurately with their new ears within a few weeks. Horizontal localization remains normal throughout, indicating that despite the apparent seamlessness of auditory space, the two axes are computed by independent neural mechanisms. Surprisingly, subjects perform normally as soon as the molds are removed. In other words, unlike in other forms of sensory adaptation (for instance, visual adaptation to prism glasses), the original map is preserved alongside the newly acquired one.
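One way to picture the learned transformation is as a set of stored spectral templates, one per elevation, against which an incoming ear-filtered spectrum is matched; the paper's striking result would then correspond to the brain holding two template sets at once, old ears and new. The sketch below is a hypothetical toy (random stand-in templates, correlation matching), not the authors' model.

```python
import numpy as np

rng = np.random.default_rng(2)
n_bands = 32
elevations = np.arange(-30, 61, 15)   # deg

# Hypothetical learned templates: one spectral shape per elevation.
# Real templates would come from the ear's direction-dependent filtering;
# random vectors stand in for them here.
templates = {int(e): rng.standard_normal(n_bands) for e in elevations}

def estimate_elevation(spectrum: np.ndarray) -> int:
    """Pick the elevation whose stored template best matches the
    incoming (ear-filtered) spectrum, by correlation."""
    def corr(a, b):
        return float(np.corrcoef(a, b)[0, 1])
    return max(templates, key=lambda e: corr(spectrum, templates[e]))

# A sound filtered by the ear's 30-degree transfer function, plus noise:
incoming = templates[30] + 0.3 * rng.standard_normal(n_bands)
print("estimated elevation:", estimate_elevation(incoming), "deg")
```

On this view, wearing the molds forces a second dictionary of templates to be learned, and removing them simply switches matching back to the first, which is consistent with the immediate recovery the authors observe.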

Whereas the visual system of humans resembles that of other primates both physiologically and psychophysically, it is uncertain to what extent this is true for the auditory system. Language is a uniquely human ability, but human sound localization and pitch perception seem to share many features with those of other animals. The challenge for the field will be to find appropriate ways to relate human and animal studies, which will be essential if we are to fully understand our ability to hear.