Published online 29 April 1999 | Nature | doi:10.1038/news990429-2


Backwards Bohemian Rhapsody

My favourite party trick is to play my friends a tape of Queen’s 1975 hit Bohemian Rhapsody, especially the part in the middle in which singer Freddie Mercury launches into his own 30-second Italian opera. The trick is that I play it backwards. Everyone is fooled, but only for about two seconds, after which everyone collapses in the laughter of recognition. For even when played backwards, Freddie Mercury’s mock-operatic chorus of “Oooh, Lellilag! Oooh, Lellilag!” is instantly recognizable as “Galileo! Galileo!”

The recognition of speech is more than a matter of decoding the strings of sounds that make up words. The context of speech, meaning its intonation and the cadences of the words, is just as important. Every comedian knows that it’s not the joke that matters, but the way it is told; any tourist in a strange land knows that gestures and expressions convey a wealth of meaning; and anyone with an infant or a dog knows that meaning can be conveyed as much by the tone of your voice as by what you actually say. If Rover is told sternly enough not to dig up the dahlias, Rover will refrain from doing so (or at least look very guilty).

The importance of context in speech perception is displayed by an amazing set of results published in the 29 April issue of Nature, in which two researchers from California show that people find spoken sentences intelligible even if every part of the sentence is played backwards. Moreover, people can learn to perceive time-reversed word fragments as intelligible words. The secret is that even though each individual part of the sentence is played backwards, the structure of the sentence is preserved. Listeners get the meaning not only from the words, but also from the overall modulation of the sentence.

The researchers, Kourosh Saberi of Caltech, Pasadena, and David R. Perrott of California State University, Los Angeles, recorded a sentence, chopped it into segments 50 milliseconds long, and played each segment backwards. However, they did not change the order of the segments. So although the sentence as a whole ran forwards, it was time-reversed within every segment; yet listeners made perfect sense of it.
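For readers who want to try the trick on their own recordings, the manipulation can be sketched in a few lines. This is a minimal illustration, not the researchers’ own code: it assumes the recording is a plain list of samples at a known sample rate, and the function name `locally_time_reverse` is hypothetical. The article does not say how a final, shorter segment was handled; this sketch simply reverses whatever is left over.

```python
def locally_time_reverse(samples, sample_rate, segment_ms=50):
    """Reverse each fixed-length segment while keeping the segments in order.

    Hypothetical helper for illustration only.
    samples: sequence of audio samples (ints or floats)
    sample_rate: samples per second
    segment_ms: segment duration in milliseconds (50 ms in the first test)
    """
    # Number of samples in one segment, e.g. 2205 at 44.1 kHz for 50 ms.
    seg_len = int(round(sample_rate * segment_ms / 1000))
    if seg_len <= 0:
        raise ValueError("segment too short for this sample rate")

    out = []
    for start in range(0, len(samples), seg_len):
        segment = list(samples[start:start + seg_len])
        # Time-reverse this segment only; the segment order is unchanged.
        # Assumption: a shorter final segment is reversed as-is.
        out.extend(reversed(segment))
    return out


# Toy example: at 100 samples per second, a 50-ms segment is 5 samples.
print(locally_time_reverse(list(range(12)), sample_rate=100, segment_ms=50))
# [4, 3, 2, 1, 0, 9, 8, 7, 6, 5, 11, 10]
```

Passing `segment_ms=100` gives the longer segments of the follow-up test. Because every segment keeps its place, applying the function twice restores the original recording.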

Fifty milliseconds (ms) is a rather short interval, so the researchers doubled the length of each segment to 100 ms and repeated the trick. Surprisingly, intelligibility dropped by just a few per cent. Even when the odd-numbered 100-ms segments were shuffled 200 ms out of sequence, intelligibility dropped by just 15 per cent.

Remarkably, when people were asked to listen repeatedly to sentences made of 100-ms time-reversed segments, they reported that previously unintelligible words became clear. “This type of ‘learning’ is not simply due to an improvement in identification”, say the researchers, “as the subjects say they can now hear actual words”. This phenomenon is similar to the way listeners grow used to a speaker with a thick accent: after a while, even the densest dialect becomes intelligible without trouble.

The work shows that the brain works on several different levels when decoding speech. Such redundancy is an advantage. Speech is, after all, extremely complex: ordinary spoken sentences comprise a rapid-fire barrage of around twenty distinct sonic particles (or ‘phonemes’) each second, so we need to work with other cues if we happen to miss something. In the well-known ‘cocktail-party effect’, we can perceive someone’s speech quite clearly even against background noise, and if the sentence is interrupted, perhaps by a cough, our brain fills in the gaps. It seems that cues derived from the context could be the source of such interpretations. Now, everyone join in after me: “Oooh, Lellilag! Oooh, Lellilag!”