Main

The different vowel sounds in normal speech are produced by adjusting the position of the tongue, jaw and lips so that the vocal tract resonates at certain specific frequencies (Ri). The vocal folds vibrate at the pitch frequency, f0, which is usually lower than the Ri. Those harmonics of f0 that fall near the Ri produce associated peaks, or formants (Fi), in the speech spectrum (ref. 1). To oversimplify a little, in Western languages vowels occupy characteristic positions on a 'map' of (F2,F1) or (R2,R1).
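The relation between harmonics and resonances can be illustrated with a minimal sketch. The function name and the resonance value of 700 Hz are assumptions chosen for illustration only; they are not measurements from this study.

```python
def nearest_harmonic(f0, resonance):
    """Return (n, n * f0): the harmonic of f0 that lies closest to a resonance.

    Both arguments are frequencies in Hz. The harmonic number n is at least 1,
    because the fundamental is the lowest component of the voice signal.
    """
    n = max(1, round(resonance / f0))
    return n, n * f0


# Hypothetical resonance at 700 Hz (illustrative value, not from the study).
ILLUSTRATIVE_R1_HZ = 700

for f0 in (220, 440, 880):
    n, freq = nearest_harmonic(f0, ILLUSTRATIVE_R1_HZ)
    offset = abs(freq - ILLUSTRATIVE_R1_HZ)
    print(f"f0 = {f0} Hz: harmonic {n} at {freq} Hz, {offset} Hz from the resonance")
```

As f0 rises, the harmonics become more widely spaced, so the nearest one can lie further from a fixed resonance.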

Singers who are trained in the Western classical tradition need to make themselves heard in large auditoria, sometimes with loud accompaniment. In the high soprano range, f0 enters the range at which human hearing sensitivity is greatest, so tuning R1 to match f0 is a possibility that could produce a sound with a loudness and timbre that vary less with pitch, and which is louder for constant effort (refs 2, 3). As vowel identifiability is inevitably compromised when f0 greatly exceeds R1, the further loss of comprehensibility caused by such tuning is tolerable. Indeed, sopranos are often taught to lower the jaw or to 'smile' as they ascend a scale; both of these actions increase mouth opening, which in turn increases R1 (refs 4, 5).

We measured the resonances of the tract directly by using a previously described technique (refs 6, 7). Briefly, an acoustic current source (diameter, 10 mm), synthesized from frequencies spaced at 5.38 Hz over the range 0.2–4.5 kHz, and a microphone (diameter, 8 mm) are placed just below the singer's mouth, in contact with the lower lip. The acoustic current is adjusted so that the pressure spectrum, Pclosed, is independent of frequency when the singer's mouth is closed. The ratio of the spectrum measured with the mouth open to that with the mouth closed shows peaks that indicate the resonances of the tract (ref. 6) as well as the harmonics present in the voice signal (Fig. 1a). The shift in resonance frequency due to the reduction in area for radiation caused by the presence of the apparatus lay within the resolution of these experiments (±11 Hz).
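The ratio step of this measurement can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, the spectra are synthetic, and the narrow voice harmonics that are superimposed on the real data are ignored. Only the frequency spacing and range are taken from the text.

```python
SPACING_HZ = 5.38    # spacing of the probe frequencies (from the text)
F_MIN_HZ = 200.0     # lower edge of the probe range (from the text)
F_MAX_HZ = 4500.0    # upper edge of the probe range (from the text)


def probe_frequencies():
    """Frequencies of the probe signal, evenly spaced across the range."""
    n = int((F_MAX_HZ - F_MIN_HZ) / SPACING_HZ) + 1
    return [F_MIN_HZ + k * SPACING_HZ for k in range(n)]


def pressure_ratio(p_open, p_closed):
    """Point-by-point ratio Popen/Pclosed of two pressure spectra."""
    return [po / pc for po, pc in zip(p_open, p_closed)]


def local_maxima(freqs, ratio):
    """Frequencies at which the ratio has a local maximum."""
    return [
        freqs[i]
        for i in range(1, len(ratio) - 1)
        if ratio[i - 1] < ratio[i] >= ratio[i + 1]
    ]


# Synthetic example: a flat closed-mouth spectrum and an open-mouth spectrum
# with one broad peak at a hypothetical resonance of 700 Hz.
freqs = probe_frequencies()
p_closed = [1.0] * len(freqs)
p_open = [1.0 / (1.0 + ((f - 700.0) / 50.0) ** 2) for f in freqs]
peaks = local_maxima(freqs, pressure_ratio(p_open, p_closed))
```

In this idealized case the single maximum falls on the probe frequency nearest to the assumed resonance.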

Figure 1: Simultaneous measurements of the resonance frequencies of the vocal tract and the harmonics present in the voice signal.

a, The ratio of the spectrum measured with the mouth open to that with the mouth closed (Popen/Pclosed) when the vowel in the word 'hard' is sung at A4. Several harmonics of the voice signal with fundamental frequency f0 = 440 Hz can be seen. The maxima in the broadband signal corresponding to the resonances R1, R2, R3 and R4 are indicated by arrows. b, The lowest resonance frequency, R1, as a function of the pitch frequency f0. The pitch is indicated in musical terms by the rotated treble clef at the top of the figure.

Eight sopranos (four professional, four advanced students), with an average of nine years' classical training, sang notes sustained for 4 s without vibrato in an ascending diatonic scale indicated by a glockenspiel before each note. They sang one of four vowels presented in writing in the form 'h<vowel>d' ('hard', 'hoard', 'who'd', 'heard'). They sang piano (softly) to avoid saturating the microphone and to improve the signal-to-noise ratio.

When f0 was less than the value of R1 for normal speech, the resonance for each vowel was roughly constant (Fig. 1b). This is consistent with expectations, as (R1,R2) characterizes vowels. However, when f0 exceeded this value of R1, the tuning line R1 = f0 was followed. This trend continued to 1 kHz for the vowels that do not use lip rounding ('hard', 'heard'), but for the vowels that use lip rounding ('hoard', 'who'd'), the data fell below the tuning line near 1 kHz. This may be because, with the lips rounded, it is uncomfortable or anatomically impossible to raise R1 to 1 kHz.
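The observed behaviour can be summarized in a deliberately crude model, sketched below. The function and parameter names are hypothetical, and the numerical values in the example are illustrative rather than measured. In particular, the hard ceiling is an assumption that only approximates the way the lip-rounded vowels fall below the tuning line near 1 kHz.

```python
def tuned_r1(f0, r1_speech, r1_ceiling=None):
    """Idealized first resonance (Hz) for a note sung at pitch f0 (Hz).

    Below the speech value of R1 the resonance stays constant; above it,
    the resonance follows the tuning line R1 = f0. An optional ceiling
    (an assumption of this sketch) limits how far R1 can be raised.
    """
    r1 = max(r1_speech, f0)
    if r1_ceiling is not None:
        r1 = min(r1, r1_ceiling)
    return r1


# Illustrative values only: a vowel with a speech R1 of 500 Hz.
low_pitch = tuned_r1(300, r1_speech=500)    # below speech R1: stays at 500
high_pitch = tuned_r1(800, r1_speech=500)   # above speech R1: follows f0
capped = tuned_r1(1000, r1_speech=400, r1_ceiling=900)  # limited by ceiling
```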

This large shift in R1 at high pitch means not only that vowels are shifted significantly on the vowel plane, but also that they will overlap considerably and converge to separations that are smaller than the characteristic length, λ, at which they become confused with one another (ref. 6). This helps to explain the well-known difficulty in identifying words sung in the high range by sopranos (refs 8, 9), and may be one of the reasons why opera houses often use surtitles even for operas sung in the native language of their audience.