Spontaneous variability predicts compensative motor response in vocal pitch control

Tachibana, Ryosuke O.; Xu, Mingdi; Hashimoto, Ryu-ichiro; Homae, Fumitaka; Okanoya, Kazuo

doi:10.1038/s41598-022-22453-0


Article
Open access
Published: 22 October 2022


Ryosuke O. Tachibana^1,2,
Mingdi Xu^3,4,
Ryu-ichiro Hashimoto^4,5,
Fumitaka Homae^4,5 &
Kazuo Okanoya^1,2,6,7

Scientific Reports volume 12, Article number: 17740 (2022)

Abstract

Our motor system uses sensory feedback to maintain desired performance. From this view, motor fluctuation is not simply ‘noise’ inevitably arising in the nervous system but may play a role in generating variations to explore better outcomes via sensory feedback. The vocalization system offers a good model for studying such sensory-motor interactions since we regulate vocalization by hearing our own voice. This behavior is typically observed as compensatory responses in vocalized pitch, or fundamental frequency (f_o), when artificial f_o shifts are induced in the auditory feedback. However, the relationship between adaptive regulation and motor exploration in vocalization has remained unclear. Here we investigated behavioral variability in spontaneous vocal f_o and compensatory responses against f_o shifts in the feedback, and demonstrated that larger spontaneous fluctuation correlates with greater compensation in vocal f_o. This correlation was found in slow components (≤ 5 Hz) of the spontaneous fluctuation but not in fast components (between 6 and 30 Hz), and the slow component was amplified during the compensatory responses. Furthermore, the compensatory ratio was reduced when large f_o shifts were applied to the auditory feedback, as if reflecting the range of motor exploration. All these findings consistently suggest a functional role of motor variability in the exploration of better vocal outcomes.

Introduction

Precise control of vocal pitch, or fundamental frequency (f_o), is essential for human communication since the vocal f_o is a dominant cue for prosodies in speaking or melodies in singing. A key aspect of vocal control is hearing one’s own voice, or the auditory feedback. Talkers regulate their own vocal f_o by canceling out subtle f_o deviations induced in the auditory feedback^1,2,3,4. For example, shifting up vocal f_o in the auditory feedback elicits a response shifting down f_o in the vocalization. Such compensatory vocal response does not always cancel out the shift completely, but rather remains around half or less of the induced shift with large individual differences^5,6,7,8. Investigating mechanisms underlying the compensatory responses for vocal f_o regulation provides opportunities to understand the adaptive audio-vocal system, which plays a critical role in our vocal control.

Recent studies in animal vocalizations, particularly in birdsongs, have suggested that variability in vocal features contributes to vocal adjustment against errors induced in the auditory feedback^9,10,11,12,13. Adult songbirds typically vocalize stereotypic songs that contain almost identical acoustical patterns across renditions while exhibiting slight but non-negligible variations in their acoustical features, such as f_o. These variations have been reported to contribute to maintaining the song quality^13,14,15. In particular, f_o shifts in the auditory feedback elicit compensative responses of vocal f_o in birds’ song syllables¹⁶. The amount of this compensation becomes larger when the distributions of original and shifted f_o variations overlap more^9,12, linking wider variability with greater vocal adaptation. It has also been shown that temporal patterns of f_o fluctuation within a brief sound element have a role in keeping and improving the song quality^15,17. Intriguingly, the vocal variability in birdsongs is not simply due to intrinsic noise in the peripheral motor system; a certain amount of it is actively generated by a dedicated circuit that is necessary for song learning^17,18,19,20. These findings in songbirds’ vocalization have supported the idea that motor variations contribute to adaptive control by generating motor exploration^11,21,22. Such mechanisms for songbirds’ vocal control could be shared with humans²³, especially when considering the behavioral and neural parallels between these two species in vocalization development^24,25,26,27,28.

In contrast, relationships between variability and adaptability in human vocal control have not been well understood. Variability in the human vocal f_o appears to consist of several components reflecting different sources or mechanisms. These components have been classified according to their dominant frequencies in the modulation spectrum, that is, the amplitude spectrum of the f_o trajectory as a function of the rate at which f_o changes (modulation frequency). For example, a quasi-periodic f_o fluctuation during singing (or vibrato) has been reported to show a peak around 4–7 Hz on the modulation spectrum, with greater stability in trained singers^29,30,31. In contrast, non-periodic components at relatively higher modulation frequencies of 10–20 Hz, or fine fluctuation^32,33,34, have been reported to be involved in the perception of voice quality both in speaking³² and singing³³. Such aperiodic fast fluctuation is likely due to the physiological instability of peripheral vocal organs³⁵, and hence is barely, if at all, controllable by the central nervous system. These reports raise the question of whether and to what extent these different components of variability contribute to vocal regulation.

Here, we assessed associations between vocal compensatory responses against auditory feedback modifications and spontaneous variabilities of different components in vocal f_o trajectories. We tested the idea that the spontaneous variation of motor output plays a role in widening the range of exploration to pursue better performance (i.e., the motor exploration hypothesis). This hypothesis predicts that people who exhibit larger spontaneous variability in vocal f_o will show greater compensation against f_o shifts induced in the auditory feedback. In our experiment, the vocal f_o in the auditory feedback was modified while the participant was vocalizing, and the ratio of compensation in the vocalized f_o was measured for each participant. We quantified individual vocal variability that was spontaneously generated in vocalizations with unmodified feedback after separating the variability components into different modulation frequency bands. Correlation analyses between the variability and the compensation ratio across participants revealed a greater correlation for slowly fluctuating components than fast fluctuations that are likely to be less controllable in the central nervous system. Further analysis showed that the compensatory response shares the same frequency range with that of the slow component in the spontaneous fluctuation. These results are consistent with our hypothesis that spontaneous variability subserves motor explorations to enhance compensatory response against perturbations in the auditory feedback.

Results

Variety of the compensation ratio across participants

In the experiment, participants were asked to continuously produce an isolated vowel for 2 s twice while listening to auditory feedback via headphones, and only the feedback of the second vocalization was modified (Fig. 1A; see “Methods” for details). For /a/ trials, we found a clear tendency of compensation (cancelling out) in vocalized f_o against the artificially induced f_o shifts in auditory feedback (Fig. 1B). The amount of compensation was almost proportional to the amount of f_o shift across the seven shift conditions (0, ± 25, ± 50, or ± 100 cents), as already shown in our previous study³⁶. We calculated the compensation ratio for each participant, which was defined as the sign-inverted slope of a line fitted to compensation amounts as a function of introduced f_o shifts (Fig. 1C). The obtained compensation ratio varied across participants, ranging from − 0.13 to 0.82 (0.39 ± 0.21 [mean ± SD]; Fig. 1D). Note that we first describe results obtained from /a/ trials and then assess their generality with /u/ trials (see “Influence of perception and other factors” subsection).

Variability in slow component of spontaneous fluctuations correlated with the compensation ratio

To assess to what extent the motor variability is related to the adjustment, we performed correlation analyses between the compensation ratio and several types of f_o variability. Note that we only included participants who showed compensatory responses (i.e., a positive compensation ratio), which resulted in excluding two out of forty participants from further analysis. To quantify vocal variability that was spontaneously generated without external perturbations, we calculated the standard deviation (SD) of the original f_o trajectory of the first vocalization (no f_o shift presented) in each trial. The mean of all SDs was defined as the variability of whole frequency components (“whole”) for each participant. This variability ranged from 8.55 to 23.87 (14.19 ± 3.72) cents. We found that the whole variability significantly correlated with the compensation ratio (Fig. 2A; Pearson’s correlation coefficient r = 0.37, sample size n = 38, p = 0.021). We then divided the whole variability into slow and fast fluctuating components according to the modulation spectrum of the spontaneous f_o fluctuation, which was calculated by the 1/2-octave-band filter-bank method. The obtained modulation spectrum (Fig. 2B) showed two apparent peaks at modulation frequencies of 2–3 Hz and 6–10 Hz, suggesting two different variability components. None of the participants exhibited a sharp peak around 4–7 Hz corresponding to the presence of the vibrato component^29,30,31. Thus, we defined slowly and rapidly changing components, termed “slow” and “fast” components, as having modulation frequency ranges of ≤ 5 Hz and 6–30 Hz, respectively (Fig. 2C). Obtained variabilities of slow and fast components ranged from 7.99 to 22.52 (13.07 ± 3.72) and from 2.04 to 6.93 (3.50 ± 3.72) cents, respectively.

The correlation analysis between these variabilities and the compensation ratio revealed that the slow component showed a significant correlation (Fig. 2D; r = 0.38, n = 38, p = 0.019), whereas the fast component did not (Fig. 2E; r = 0.20, n = 38, p = 0.231). Moreover, to confirm the relative impact of each modulation frequency band on the compensation, we calculated the correlation coefficients between compensation ratios and variability values in each of the subbands that were derived from the modulation spectrum analysis. This analysis showed consistent results (Fig. 2F) that the slow component (less than 4 Hz in modulation frequency) exhibited a greater correlation with the compensation ratio, but the fast one (higher than 5 Hz) did not.

Increase of slow component in compensatory response

To assess which frequency component in the f_o trajectory the participants used to compensate for the f_o shifts in auditory feedback, we compared variabilities in the second vocalizations (with f_o shifts) with those in the first ones (no shifts). In the ± 100-cent shift conditions, we found significantly larger variability in the second vocalization than in the first one for the slow component (Fig. 3A; paired t-test, t(37) = 9.36, p < 0.001) but not for the fast component (Fig. 3B; paired t-test, t(37) = 0.19, p = 0.851). The variability difference of the second from the first vocalization increased with the f_o shift amount for the slow component (Fig. 3C) while remaining constant around zero for the fast one (Fig. 3D). These results indicate that the compensatory f_o changes occupy the same modulation-frequency range as the slow component of spontaneously generated vocal variability (i.e., without f_o shifts in auditory feedback). Further, we calculated the 2nd-1st variability difference in each subband derived by the modulation filter bank to confirm the modulation frequency of the compensatory f_o movement in response to auditory feedback modifications. The result (Fig. 3E) showed that the slow modulation component, which was correlated with the compensation ratio in the spontaneous fluctuation (see Fig. 2F), exhibited extra variability in the compensatory vocal responses. This coincidence supports the idea that spontaneous variability in the slow components plays a critical role in the compensation.

Compensation decreased with large f_o shift

Based on the motor exploration hypothesis, we predict that the ratio of compensation to induced shift becomes small when the shift is large, for the following reason. The spontaneous variability would work as the motor exploration. If a target f_o is within the exploration range, then the participant can find the target and adjust his/her voice toward the target. Given a certain amount of spontaneous variability, the originally intended f_o will be outside of the motor exploration range when the f_o shift is large (Fig. 4A). This can reduce opportunities to find the correct (intended) f_o during vocalization, and hence decrease the compensation ratio for such large shifts. We tested this possibility by calculating the percent amount of compensation for each of the three shift magnitudes after pooling data for positive and negative shifts by inverting the sign (Fig. 4B). We found a statistically significant effect of the shift condition factor (one-way repeated-measures ANOVA, F(2,74) = 3.97, p = 0.023). The post hoc analysis showed a significant difference between 50- and 100-cent shifts (Tukey–Kramer test; p = 0.002) and a marginal difference between 25- and 100-cent shifts (p = 0.058), but not between 25- and 50-cent shifts (p = 0.988). While the compensation in 100-cent shifts was less than in others, its correlation with the variability of the slow component was still significant (Fig. 4C; r = 0.37, n = 38, p = 0.022). These results are consistent with the motor exploration hypothesis in vocal control.
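The pooling of positive and negative shifts described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `percent_compensation` and the input layout (a mapping from shift in cents to mean vocal response in cents) are assumptions, as is the choice to average the two signed conditions of each magnitude with equal weight.

```python
def percent_compensation(mean_response_by_shift):
    """Pool +/- shifts and express compensation as a percentage of the shift.

    mean_response_by_shift: dict mapping the induced f_o shift (cents) to the
    mean vocal response (cents). A response opposite in sign to the shift
    counts as positive compensation. The zero-shift control is skipped.
    """
    pooled = {}
    for shift, response in mean_response_by_shift.items():
        if shift == 0:
            continue  # no shift to compensate for
        # Sign inversion: -response / shift is positive when the voice moves
        # against the shift, for both upward and downward shifts.
        pct = 100.0 * -response / shift
        pooled.setdefault(abs(shift), []).append(pct)
    return {magnitude: sum(v) / len(v) for magnitude, v in pooled.items()}
```

For example, a response of −12.5 cents to a +25-cent shift and +12.5 cents to a −25-cent shift both count as 50% compensation for the 25-cent magnitude.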

Influence of perception and other factors

We also assessed other factors that potentially affected the compensation process, such as the perceptual ability to detect a subtle difference in vocal pitch. For this aim, we estimated participants’ ability to detect the f_o shifts induced in recorded own voices using a dataset from the listening tests performed in our previous study³⁶. In this test, participants were asked to answer whether any pitch modification occurred in the second vocalization compared with the first one in each trial (Fig. 5A). We estimated the detection threshold and accuracy for noticing the presence of f_o modification by fitting a sigmoid curve on the detection rate dataset (Fig. 5B; see “Methods” for details). Obtained detection thresholds and accuracies ranged from 26.91 to 108.25 (54.71 ± 16.69) cents and from 0.87 to 38.30 (14.13 ± 11.48) cents, respectively. We then tested correlations between these perceptual properties and the compensation ratio. The result showed that the compensation ratio did not significantly correlate with either the detection threshold (Fig. 5C; r = − 0.26, n = 38, p = 0.110) or accuracy (Fig. 5D; r = 0.07, n = 38, p = 0.694), suggesting that perceptual ability did not contribute to compensation in this case.

Vocalizing different vowels produced different amounts of compensatory response (Fig. 6A). The compensation ratio for /u/-vocalized trials was significantly smaller than that for /a/ trials (Fig. 6B; difference: − 0.16 ± 0.18; paired t-test: t(39) = 5.77, p < 0.001). Though reduced in degree, the compensation ratio in /u/ vocalizations exhibited a significant correlation with the spontaneous variability in the slow component (Fig. 6C; r = 0.42, n = 37, p = 0.010) but not with that in the fast component (Fig. 6D; r = 0.22, n = 37, p = 0.200). Note that three out of forty participants who showed negative values in the compensation ratio for /u/ vocalizations were excluded from the correlation analysis. This consistent result among different vowels further supports the finding that a larger slow component predicts greater compensation.

The reduced impact of the f_o shift in the /u/ vocalizations might be caused by the lower loudness of their auditory feedback relative to /a/ trials, owing to narrower mouth openings. The amplitude level of recorded voices was significantly lower in /u/ than in /a/ trials (Fig. 6E; difference: − 6.5 ± 2.8 dB; paired t-test: t(39) = 14.57, p < 0.001), suggesting that the relative loudness of the auditory feedback (air-conducted sound) compared to the bone-conducted feedback was lower in /u/ than in /a/ vocalization. We therefore tested whether the amplitude of vocalization (hence, the loudness level of auditory feedback) affected the compensation ratio. However, the relative amplitude level was not significantly correlated with the compensation ratio (Fig. 6F, G; /a/: r = 0.06, n = 38, p = 0.743; /u/: r = 0.26, n = 37, p = 0.115).

Lastly, we performed a stepwise multiple regression analysis to find the most effective model to explain the variation of the compensation ratio amongst six explanatory variables: variability in slow and fast components, detection threshold and accuracy, voice amplitude, and talker gender. The best statistical model contained only the variability in slow component as an explanatory variable (model: adjusted R² = 0.12, df = 36, SSE = 0.168; slow component factor: t = 2.46, p = 0.019), indicating that the slow component is the main contributor for predicting the compensation ratio.

Discussion

Recent debates on tight links between motor variability and adaptive regulation have centered on the motor exploration hypothesis, with empirical evidence from songbirds’ vocalization^9,12,13,15,16,37 and from some other motor actions of humans²¹ or rodents³⁸. Here, we provide further evidence for this debate in human vocalizations by demonstrating that the spontaneous f_o variability is positively correlated with the ratio of compensatory response against f_o shift perturbations induced in the auditory feedback (Fig. 2A). This indicates that individual participants have different intrinsic levels of motor variability, and this individual difference predicts how much that person compensates for the perturbation. Our result is consistent with a previous study that used sudden f_o shifts in the auditory feedback in the middle of vocalization⁸, suggesting the robustness of this finding despite methodological differences. Further analyses showed that the slowly fluctuating components had a greater impact on the compensatory response than the fast components (Fig. 2D, E). In addition, the compensation ratio for the largest shift conditions (± 100 cent) showed a significant decrease compared with other shift conditions (Fig. 4B), while still exhibiting significant correlation with the spontaneous variability of the slow component (Fig. 4C). These findings are consistent with the motor exploration hypothesis, which suggests that spontaneous motor variability promotes motor exploration and contributes to compensative regulation, even in vocalization processes.

Our results further indicated that the slow components of spontaneous variability contributed more to the compensation than the fast one (Fig. 2), and the main component of the compensatory response shared the same frequency range with the slow component (Fig. 3). The fast fluctuation in vocal f_o has been recognized as “microtremor” which is an involuntary fluctuation caused by physical/physiological instability³⁵, suggesting that this component mainly consists of uncontrollable noise sources generated in the peripheral system. Such peripherally derived variability may not be well suited for adjustment-related motor exploration because of its uncontrollable nature²². In contrast, our results indicate that the slow component is controllable in the central nervous system because participants increased the amplitude of f_o fluctuation in the range of the slow component for compensatory responses. Thus, our results indicate that the slow component in spontaneous variability plays a central role in vocal compensation by generating motor exploration.

The present results fit well with the idea that variability in motor production contributes to learning by extending such exploration^21,22,39,40, and provide further evidence supporting the generality of this hypothesis in vocal control. An alternative explanation for the variability-compensation relationship could be based on the perceptual ability to detect f_o changes. A previous study of vocal f_o control reported that children with less sensitive pitch discrimination abilities showed larger compensations in response to suddenly induced f_o shifts⁴¹, suggesting a possible impact of auditory ability on the compensation ratio. However, our correlation analysis between perception and compensatory response (Fig. 5) did not support this idea, since the two were not significantly correlated. Thus, we rule out the influence of auditory abilities and instead interpret the spontaneous variability as the main factor explaining the individual difference in the compensation ratio. Such dissociation between auditory perception and vocal production has been observed in a substantial population of people who sing poorly in pitch but have no problem in their hearing ability for pitch discrimination⁴².

The compensatory response data were obtained from the time window of 0.8–1.2 s after the vocal onset. Previous studies have dissociated the compensation responses into early (100–150 ms) and late (≥ 300 ms) components according to their response consistency and instruction dependency, and have associated them with “brainstem” and “cortical” pathways, respectively^3,7. According to this dichotomy, our results obtained from the late response (0.8–1.2 s) could be associated with the cortical process. This view is consistent with findings in animal vocalization studies, which have demonstrated that interactions between the basal ganglia and cortex-homolog area play the main role in generating motor exploration and compensation for birds' song maintenance^14,43,44.

More generally, our study suggests a shared strategy in vocal adjustment mechanisms among songbirds and humans. It should be noted that previous songbird studies have focused on variability and adjustment in a trial-by-trial manner, wherein researchers assessed updating changes in vocal acoustics of every song rendition^9,10,12,13,14. On the other hand, several studies have shown the importance of within-trial variability, or f_o fluctuations in one vocal element, for vocal adaptations^15,17. Our study measured the variability as the fluctuation in each prolonged vowel production and the adjustment as the compensatory response observed within each trial in human vocalization; the relationship between trial-by-trial variability and adaptive learning over trials should be tested in future studies. Many studies have shown potential parallels between these two species in vocal learning behaviors and their neural circuitries^24,25,27,28. Our results add further evidence of such parallels at the level of not only behavioral analogs but also computation for vocal adjustment.

Methods

Dataset

The dataset used in this study was originally obtained in our previous study³⁶. The present study analyzed this in different ways to elucidate the relationship between the spontaneous variability and compensation behavior in vocal control. In contrast, our previous study had focused on the influences of perceptual awareness on vocal responses against various modifications to acoustical features in the auditory feedback. The different vowel data (/u/-vocalized trials) were newly analyzed in the present study. The data were obtained from forty university students (20 females; 18–26 years old) without any experience in formal music training. All the experimental procedures were approved by the Human Subjects Ethics Committee of Tokyo Metropolitan University. All participants signed informed consent forms, and all experiments were performed following relevant guidelines and regulations.

The experimental procedure was identical to that described in our previous study³⁶. In brief, participants were asked to produce isolated vowels /a/ or /u/ according to the letter displayed on a computer screen while hearing auditory feedback via headphones. The auditory feedback was modified by a voice processor (Voice Worksplus, TC Helicon Vocal Technologies), and fed back to participants with masking pink noise. Participants vocalized the same vowel twice for 2 s each time with a 1 s intermission in each trial, and only the feedback of the second vocalization was modified (Fig. 1A). There were 13 conditions in total for the second vocalization: 6 for pitch shifts, 6 for timbre shifts, and 1 for no shift as a control condition. In the pitch-shifted conditions, the voice spectrum was linearly expanded by ± 25, ± 50, or ± 100 cents (100 cents = one semitone), resulting in a shift of the fundamental frequency (f_o). The timbre-shifted conditions expanded only the spectral envelope by ± 3, ± 6, or ± 12 percent without changing f_o. There were 10 trials for each of the 13 conditions for each vowel. The order of the 260 trials was pseudo-randomized. Note that we focused only on vocal responses in the pitch-shifted conditions; the timbre-shifted conditions were excluded from further analyses in this paper because they exhibited almost no compensative response for f_o (as reported in our previous paper³⁶). We first analyzed the dataset for /a/-vowel trials and then assessed the generality with /u/-vowel trials, since the compensatory responses for /a/ trials were clearer than those for /u/ trials (see Fig. 6A, B).

After the vocalization sessions, we performed a listening session (denoted as “subjective test” in our previous paper³⁶) to test whether the participants noticed the sound modifications applied to their voices. We replayed each participant’s voices that were recorded in two representative trials during the vocalization experiment. The participant was asked if they could perceive a change in pitch and/or timbre in the second voice compared with the first one. The present study used these responses to assess the participant’s perceptual ability to detect the presence of f_o shifts in the fed-back voice.

Preprocessing

The f_o of vocal sound was calculated by Praat 6.1⁴⁵. The f_o calculation was performed by an adapted auto-correlation method implemented in the Praat (“To Pitch (ac)”), with 10-ms step, 40-ms window, and frequency boundaries between 75 and 600 Hz. The extracted f_o traces were converted into cent values in a logarithmic scale and obtained as follows: 1200 log₂ (f_o/f_base), where f_base is a base frequency (we arbitrarily used 55 Hz for the base though this does not change the final results).
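The cent conversion above can be written out directly. The sketch below uses the 55-Hz base stated in the text; the function names are illustrative and not taken from the original analysis code.

```python
import math

F_BASE_HZ = 55.0  # base frequency stated in the text


def hz_to_cents(f_hz, f_base=F_BASE_HZ):
    """Convert a frequency in Hz to cents relative to f_base: 1200 * log2(f / f_base)."""
    return 1200.0 * math.log2(f_hz / f_base)


def cents_between(f1_hz, f2_hz, f_base=F_BASE_HZ):
    """Interval from f1 to f2 in cents, computed via the common base."""
    return hz_to_cents(f2_hz, f_base) - hz_to_cents(f1_hz, f_base)
```

Because every later measure is a difference or a deviation of cent values, the base frequency cancels out, which is why its choice does not change the final results: `cents_between(220.0, 233.08)` gives the same interval (close to 100 cents, one semitone) for any base.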

We preprocessed the obtained dataset in two steps: alignment and refinement, as described below. We first aligned the data by time points of vocal onsets. In this process, the vocal onset and offset were detected from the amplitude envelopes (described below) with a threshold of the background level + 30 dB. The background level was estimated from silent parts of recordings for each participant. Then, we refined the aligned data by detaching or repairing unstable/misdetected data points as follows. Fragmented data points were connected by filling brief temporal gaps (≤ 40 ms) and removing short fragments (≤ 50 ms). Unrealistic frequency jumps that were larger than ± 100 cents at the beginning part of vocalization were searched for backward from 200 ms after the onset, and were removed. Similarly, unrealistic jumps in the ending parts were removed by searching forward from 300 ms before the offset with the same threshold (± 100 cents). After these removals of unstable onset parts, we re-defined onset times as the beginning point of stable vocalization since those unstable data reflected harsh or aperiodic glottal pulsation in which participants could not sense f_o shifts in the feedback. Additionally, we repaired the unrealistic jumps in the middle part of vocalization between 210 and 1500 ms from the vocal onset (filled with the value obtained immediately before the jump).

Compensation ratio

To quantify compensatory responses against artificial f_o shifts in the auditory feedback, we first removed participant-specific frequency changes that were unrelated to the response to f_o shifts. For this, a common trend in all trajectories for each participant was removed by subtracting the grand mean of all trials. Moreover, we set the beginning part of each vocalization as zero by subtracting the mean value within a range of 50–150 ms in each trial to measure the responses to f_o shifts only. We defined this subtraction baseline period by visual inspection of outcomes of the grand averaging, and excluded the first 50 ms because of its instability. Then, we calculated the mean value of the late part (800–1200 ms) of data, in which the trajectories fluctuated less and were relatively stable (indicated using a black bar in Fig. 1B). We defined the compensation ratio to quantify how much the participant compensated by lowering or heightening their vocal pitch in the direction against the induced f_o shifts. This ratio was calculated as a sign-inverted slope of a line (linear regression) fitted to the mean amounts of vocal compensations as a function of f_o shifts (Fig. 1C). This measure was used to capture general tendency of the compensatory response for each participant.
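The baseline subtraction, late-window averaging, and sign-inverted slope can be sketched as below. This is a simplified illustration under stated assumptions: the removal of the participant-specific common trend (grand mean of all trials) is omitted, window boundaries are treated as inclusive, and all function names are hypothetical.

```python
def mean_in_window(times_ms, values, start_ms, end_ms):
    """Mean of the samples whose time stamps fall within [start_ms, end_ms]."""
    picked = [v for t, v in zip(times_ms, values) if start_ms <= t <= end_ms]
    return sum(picked) / len(picked)


def late_response(times_ms, trajectory_cents):
    """Late-window (800-1200 ms) mean after zeroing the 50-150 ms baseline."""
    baseline = mean_in_window(times_ms, trajectory_cents, 50, 150)
    late = mean_in_window(times_ms, trajectory_cents, 800, 1200)
    return late - baseline


def compensation_ratio(shifts_cents, responses_cents):
    """Sign-inverted least-squares slope of response as a function of shift."""
    n = len(shifts_cents)
    mean_x = sum(shifts_cents) / n
    mean_y = sum(responses_cents) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(shifts_cents, responses_cents))
    sxx = sum((x - mean_x) ** 2 for x in shifts_cents)
    return -sxy / sxx
```

Under this definition a ratio of 1 would mean the voice fully cancels the shift, 0 means no response, and a negative value means the voice followed the shift instead of opposing it.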

Variability assessment

To quantify the motor variability in vocalization, we calculated the standard deviation (SD) of the f_o within a period between 100 and 1200 ms after the voice onset. For this calculation, we collected f_o trajectory data of the first vocalization of each trial, in which no f_o shift was applied. We excluded data from trials that followed immediately after the f_o-shifted trials to avoid contaminations of possible aftereffects. The computed SDs were averaged for each participant to obtain a variability index for the original f_o trajectories (“whole”). We extracted the slow and fast components by a low-pass filter with 5-Hz cutoff, and a band-pass filter with 6- and 30-Hz cutoff frequencies (second-order Butterworth filter), respectively. Then, we computed the mean SD of the filtered signals to obtain the variability index for a slowly fluctuating component (“slow”) or a fast fluctuating one (“fast”). These two frequency bands were defined by visual inspection of the frequency spectrum of f_o trajectories (or modulation spectrum) which is analyzed in the following subsection (Fig. 2B). Before filtering, each trajectory was zero-centered by subtracting the mean value to remove the constant component, and missing data points were filled with zero. We used the zero-phase digital filtering implemented in MATLAB software (‘filtfilt’ function).
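A rough standard-library sketch of the slow/fast split is given below. It is not a reproduction of MATLAB's `filtfilt`: it runs a second-order Butterworth section forward and then backward without the edge padding that `filtfilt` applies, and it approximates the 6–30 Hz band-pass by cascading a 6-Hz high-pass with a 30-Hz low-pass, which is a substitution made here for simplicity. The 100-Hz sampling rate follows from the 10-ms f_o step; missing samples are represented as `None`, the input is assumed to be already restricted to the analysis window, and all names are hypothetical.

```python
import math
import statistics


def butter2_coeffs(cutoff_hz, fs_hz, highpass=False):
    """Second-order Butterworth coefficients via the bilinear transform."""
    k = math.tan(math.pi * cutoff_hz / fs_hz)
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    if highpass:
        b = (norm, -2.0 * norm, norm)
    else:
        b = (k * k * norm, 2.0 * k * k * norm, k * k * norm)
    a = (2.0 * (k * k - 1.0) * norm, (1.0 - math.sqrt(2.0) * k + k * k) * norm)
    return b, a


def biquad(x, b, a):
    """Apply one second-order section, starting from a zero state."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def zero_phase(x, b, a):
    """Forward-backward filtering (zero phase, no edge padding)."""
    forward = biquad(x, b, a)
    return biquad(forward[::-1], b, a)[::-1]


def center_and_fill(trajectory):
    """Subtract the mean of valid samples, then fill missing samples with zero."""
    valid = [v for v in trajectory if v is not None]
    mean = sum(valid) / len(valid)
    return [0.0 if v is None else v - mean for v in trajectory]


def variability(trajectory_cents, fs_hz=100.0):
    """SD (cents) of the whole, slow (<= 5 Hz) and fast (approx. 6-30 Hz) parts."""
    x = center_and_fill(trajectory_cents)
    slow = zero_phase(x, *butter2_coeffs(5.0, fs_hz))
    fast = zero_phase(x, *butter2_coeffs(6.0, fs_hz, highpass=True))
    fast = zero_phase(fast, *butter2_coeffs(30.0, fs_hz))
    return {
        "whole": statistics.stdev(x),
        "slow": statistics.stdev(slow),
        "fast": statistics.stdev(fast),
    }
```

The per-participant index would then be the mean of these per-trial SDs. For a test trajectory made of a 2-Hz and a 15-Hz sinusoid, the slow SD tracks the 2-Hz part and the fast SD tracks the 15-Hz part.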

Modulation spectrum analysis

To assess a relative amplitude across different modulation frequencies, we calculated the modulation spectrum by a half-octave-band filter bank. We first up-sampled each f_o trajectory to a doubled rate (200 Hz), and then centered the f_o trajectory by subtracting its mean value, and filled the missing data points with zero. We defined the filter bank as a set of multiple band-pass filters that has 1/2-octave bandwidths with center frequencies equally spaced at 1/4-octave step from 0.4 to 50 Hz (second-order Butterworth filter). The amplitude of each subband was calculated as the root-mean-square value of the filtered trajectory.
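The layout of such a filter bank can be illustrated by listing its center frequencies and band edges. The sketch below assumes that centers start exactly at 0.4 Hz and advance in quarter-octave steps, and that each half-octave band is placed symmetrically (on a log scale) around its center; on that grid 50 Hz is not hit exactly, so the last center at or below 50 Hz is about 43 Hz. How the authors handled the upper endpoint is not stated, and the function names are illustrative.

```python
import math


def filter_bank_centers(f_min=0.4, f_max=50.0, step_octaves=0.25):
    """Center frequencies spaced step_octaves apart, from f_min up to f_max."""
    centers = []
    k = 0
    while True:
        fc = f_min * 2.0 ** (k * step_octaves)
        if fc > f_max:
            break
        centers.append(fc)
        k += 1
    return centers


def band_edges(fc, bandwidth_octaves=0.5):
    """Lower and upper cutoffs of a band centered (log scale) on fc."""
    half = bandwidth_octaves / 2.0
    return fc * 2.0 ** (-half), fc * 2.0 ** half


def rms(values):
    """Root-mean-square amplitude of a filtered subband trajectory."""
    return math.sqrt(sum(v * v for v in values) / len(values))
```

With quarter-octave spacing and half-octave widths, neighboring bands overlap by half their width, which smooths the resulting modulation spectrum.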

Voice amplitude calculation

The amplitude envelope of each vocalization was calculated in MATLAB as the root-mean-square value of the A-weighted waveform within a 40-ms Hanning window at 10-ms time steps. The envelope was converted to a logarithmic scale (dB) using the formula 20 log₁₀(x). We then averaged the log-converted amplitude over the period from 100 to 1200 ms, which includes the very beginning of the compensatory response and the plateau of the vocalization. Finally, relative values among participants were obtained by subtracting the overall average across all participants from each participant’s value.
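The envelope computation can be sketched in plain Python as below. Several details are assumptions: the input waveform is taken to be already A-weighted, the Hanning window is applied as a weight on the squared samples (the text does not say how the window enters the RMS), and the function names `envelope_db` and `relative_levels` are hypothetical.

```python
import math


def envelope_db(wave, fs, win_ms=40.0, step_ms=10.0):
    """Hanning-weighted RMS envelope in dB, one value per 10-ms step."""
    n = int(round(win_ms * fs / 1000.0))
    hop = int(round(step_ms * fs / 1000.0))
    hann = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]
    weight_sum = sum(hann)
    env = []
    for start in range(0, len(wave) - n + 1, hop):
        frame = wave[start:start + n]
        mean_sq = sum(h * x * x for h, x in zip(hann, frame)) / weight_sum
        # 20 log10(x) applied to the RMS value; silent frames map to -inf.
        env.append(20.0 * math.log10(math.sqrt(mean_sq)) if mean_sq > 0 else float("-inf"))
    return env


def relative_levels(mean_db_per_participant):
    """Subtract the overall average so that values are relative across participants."""
    overall = sum(mean_db_per_participant) / len(mean_db_per_participant)
    return [v - overall for v in mean_db_per_participant]


# Synthetic check: a unit-amplitude 200-Hz tone sampled at 8 kHz for 1 s.
fs_demo = 8000
tone = [math.sin(2.0 * math.pi * 200.0 * i / fs_demo) for i in range(fs_demo)]
env = envelope_db(tone, fs_demo)
```

For the unit-amplitude tone the envelope sits close to 20 log₁₀(1/√2), about −3 dB, as expected for a sinusoid.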

Pitch-shift detection ability

We quantified each participant’s perceptual ability to detect shifts in their modified voice using the data from the listening test performed after the vocalization sessions. To increase the resolution, we pooled trials across f_o shift directions (minus or plus) and the two vowels (/a/ and /u/), which yielded 8 repetitions (2 directions × 2 vowels × 2 trials) for each absolute amount of f_o shift. The detection rate as a function of absolute f_o shift was fitted with a sigmoid function, for which we used the cumulative distribution function of the normal distribution. The detection threshold and accuracy were then defined as the absolute shift value at the 50% detection rate and the shallowness of the fitted sigmoid, corresponding to the mean and standard deviation of the cumulative normal distribution, respectively (Fig. 5B).
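A minimal sketch of such a fit is given below. The fitting procedure used in the study is not specified here, so the sketch swaps in a simple least-squares grid search over the mean and standard deviation of a cumulative normal. The shift values (assumed to be in cents), the detection counts, and the function name `fit_cumulative_normal` are all hypothetical.

```python
from statistics import NormalDist


def fit_cumulative_normal(shifts, rates, mu_grid, sigma_grid):
    """Least-squares grid search for the mean and SD of a cumulative normal."""
    best_err, best_mu, best_sigma = float("inf"), None, None
    for mu in mu_grid:
        for sigma in sigma_grid:
            curve = NormalDist(mu, sigma)
            err = sum((curve.cdf(s) - r) ** 2 for s, r in zip(shifts, rates))
            if err < best_err:
                best_err, best_mu, best_sigma = err, mu, sigma
    return best_mu, best_sigma


# Hypothetical data: absolute shifts (cents) and detections out of 8 pooled trials.
shifts = [10, 25, 50, 100, 200]
detected = [0, 1, 4, 7, 8]
rates = [d / 8 for d in detected]

# Mean = detection threshold (50% point); SD = accuracy (shallowness of the curve).
threshold, accuracy = fit_cumulative_normal(
    shifts, rates, mu_grid=range(1, 201), sigma_grid=range(1, 101)
)
```

A smaller fitted SD corresponds to a steeper sigmoid, that is, a sharper transition from "not detected" to "detected" around the threshold.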

Statistical test

We tested the significance of correlation coefficients at a significance level of α = 0.05. A post hoc power analysis indicated that the power (1−β) was 0.644 for the sample size n = 38 when the hypothesized correlation coefficient was ρ = 0.37 (the smallest statistically significant coefficient reported in this paper). We also performed paired t-tests for differences in variability indices between the first and second vocalizations (see Fig. 4) and between the two vowels (see Fig. 6), at a significance level of α = 0.05. A one-way repeated-measures ANOVA was performed to assess the decrease in the amount of compensation with increasing f_o shift (see Fig. 6). To examine the significance of pairwise differences among conditions, we used the Tukey–Kramer post hoc test. Lastly, to assess the relative impact of all candidate factors on the compensation ratio, we performed a stepwise multiple regression analysis with the variability indices of the slow and fast components, the detection threshold and accuracy, vocal amplitude, and talker gender as regressors. This analysis was performed with the MATLAB function ‘stepwiselm’.
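The order of magnitude of the reported power can be checked with a standard Fisher z approximation, sketched below. The software and exact method behind the reported value of 0.644 are not stated, so this is an approximation under the assumption of a two-sided test, and the function name `correlation_power` is hypothetical.

```python
import math
from statistics import NormalDist


def correlation_power(rho, n, alpha=0.05):
    """Approximate two-sided power for testing rho = 0 via Fisher's z transform."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the z statistic when the true correlation is rho.
    shift = math.atanh(rho) * math.sqrt(n - 3)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


power = correlation_power(rho=0.37, n=38)
```

With ρ = 0.37 and n = 38 this approximation gives a power of about 0.63, close to the reported 0.644; a small difference is expected because exact and approximate methods do not coincide.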

Data availability

Dataset and analysis scripts are available in a public repository (Open Science Framework; DOI: https://doi.org/10.17605/OSF.IO/CXBAU; URL: https://osf.io/cxbau/).

References

Elman, J. L. Effects of frequency-shifted feedback on the pitch of vocal productions. J. Acoust. Soc. Am. 70, 45 (1981).
Kawahara, H. Interactions between speech production and perception under auditory feedback perturbations on fundamental frequencies. J. Acoust. Soc. Jpn. E 15, 201–202 (1994).
Burnett, T. A., Freedland, M. B., Larson, C. R. & Hain, T. C. Voice F0 responses to manipulations in pitch feedback. J. Acoust. Soc. Am. 103, 3153–3161 (1998).
Larson, C. R., Burnett, T. A., Kiran, S. & Hain, T. C. Effects of pitch-shift velocity on voice Fo responses. J. Acoust. Soc. Am. 107, 559–564 (2000).
Liu, H. & Larson, C. R. Effects of perturbation magnitude and voice F0 level on the pitch-shift reflex. J. Acoust. Soc. Am. 122, 3671–3677 (2007).
Liu, P., Chen, Z., Larson, C. R., Huang, D. & Liu, H. Auditory feedback control of voice fundamental frequency in school children. J. Acoust. Soc. Am. 128, 1306–1312 (2010).
Hain, T. C. et al. Instructing subjects to make a voluntary response reveals the presence of two components to the audio-vocal reflex. Exp. Brain Res. 130, 133–141 (2000).
Scheerer, N. E. & Jones, J. A. The relationship between vocal accuracy and variability to the level of compensation to altered auditory feedback. Neurosci. Lett. 529, 128–132 (2012).
Kuebrich, B. D. & Sober, S. J. Variations on a theme: Songbirds, variability, and sensorimotor error correction. Neuroscience 296, 48–54 (2015).
Tachibana, R. O., Takahasi, M., Hessler, N. A. & Okanoya, K. Maturation-dependent control of vocal temporal plasticity in a songbird. Dev. Neurobiol. 77, 995–1006 (2017).
Woolley, S. C. & Kao, M. H. Variability in action: Contributions of a songbird cortical-basal ganglia circuit to vocal motor learning and control. Neuroscience 296, 39–47 (2015).
Sober, S. J. & Brainard, M. S. Vocal learning is constrained by the statistics of sensorimotor experience. Proc. Natl. Acad. Sci. USA 109, 21099–21103 (2012).
Tumer, E. C. & Brainard, M. S. Performance variability enables adaptive plasticity of ‘crystallized’ adult birdsong. Nature 450, 1240–1244 (2007).
Kao, M. H., Doupe, A. J. & Brainard, M. S. Contributions of an avian basal ganglia-forebrain circuit to real-time modulation of song. Nature 433, 638–643 (2005).
Charlesworth, J. D., Tumer, E. C., Warren, T. L. & Brainard, M. S. Learning the microstructure of successful behavior. Nat. Neurosci. 14, 373–380 (2011).
Sober, S. J. & Brainard, M. S. Adult birdsong is actively maintained by error correction. Nat. Neurosci. 12, 927–931 (2009).
Kojima, S., Kao, M. H., Doupe, A. J. & Brainard, M. S. The Avian Basal Ganglia are a source of rapid behavioral variation that enables vocal motor exploration. J. Neurosci. Off. J. Soc. Neurosci. 38, 9635–9647 (2018).
Olveczky, B. P. & Gardner, T. J. A bird’s eye view of neural circuit formation. Curr. Opin. Neurobiol. 21, 124–131 (2011).
Kao, M. H. & Brainard, M. S. Lesions of an avian basal ganglia circuit prevent context-dependent changes to song variability. J. Neurophysiol. 96, 1441–1455 (2006).
Hampton, C. M., Sakata, J. T. & Brainard, M. S. An avian basal ganglia-forebrain circuit contributes differentially to syllable versus sequence variability of adult Bengalese finch song. J. Neurophysiol. 101, 3235–3245 (2009).
Wu, H. G., Miyamoto, Y. R., Gonzalez-Castro, L. N., Ölveczky, B. P. & Smith, M. A. Temporal structure of motor variability is dynamically regulated and predicts motor learning ability. Nat. Neurosci. 17, 312–321 (2014).
Dhawale, A. K., Smith, M. A. & Ölveczky, B. P. The role of variability in motor learning. Annu. Rev. Neurosci. 40, 479–498 (2017).
Hahnloser, R. H. R. & Narula, G. A Bayesian account of vocal adaptation to pitch-shifted auditory feedback. PLoS ONE 12, e0169795 (2017).
Doupe, A. J. & Kuhl, P. K. Birdsong and human speech: Common themes and mechanisms. Annu. Rev. Neurosci. 22, 567–631 (1999).
Kuhl, P. K. Early language acquisition: Cracking the speech code. Nat. Rev. Neurosci. 5, 831–843 (2004).
Prather, J., Okanoya, K. & Bolhuis, J. J. Brains for birds and babies: Neural parallels between birdsong and speech acquisition. Neurosci. Biobehav. Rev. https://doi.org/10.1016/j.neubiorev.2016.12.035 (2017).
Lipkind, D. et al. Stepwise acquisition of vocal combinatorial capacity in songbirds and human infants. Nature 498, 104–108 (2013).
Tchernichovski, O. & Marcus, G. Vocal learning beyond imitation: Mechanisms of adaptive vocal development in songbirds and human infants. Curr. Opin. Neurobiol. 28, 42–47 (2014).
Sundberg, J. The Science of the Singing Voice (Northern Illinois University Press, 1987).
Shipp, T., Sundberg, J. & Doherty, E. T. The effect of delayed auditory feedback on vocal vibrato. J. Voice 2, 195–199 (1988).
Howes, P., Callaghan, J., Davis, P., Kenny, D. & Thorpe, W. The relationship between measured vibrato characteristics and perception in Western operatic singing. J. Voice 18, 216–230 (2004).
Akagi, M., Iwaki, M. & Minakawa, T. Fundamental frequency fluctuation in continuous vowel utterance and its perception. In 5th International Conference on Spoken Language Processing, vol. 4 1519–1522 (1998).
Akagi, M. & Kitakaze, H. Perception of synthesized singing voices with fine fluctuations in their fundamental frequency contours. In 6th International Conference on Spoken Language Processing, vol. 3 458–461 (2000).
Saitou, T., Unoki, M. & Akagi, M. Development of an F0 control model based on F0 dynamic characteristics for singing-voice synthesis. Speech Commun. 46, 405–417 (2005).
Schoentgen, J. Modulation frequency and modulation level owing to vocal microtremor. J. Acoust. Soc. Am. 112, 690–700 (2002).
Xu, M. et al. Unconscious and distinctive control of vocal pitch and timbre during altered auditory feedback. Front. Psychol. https://doi.org/10.3389/fpsyg.2020.01224 (2020).
Andalman, A. S. & Fee, M. S. A basal ganglia-forebrain circuit in the songbird biases motor output to avoid vocal errors. Proc. Natl. Acad. Sci. USA 106, 12518–12523 (2009).
Dhawale, A. K., Miyamoto, Y. R., Smith, M. A. & Ölveczky, B. P. Adaptive regulation of motor variability. Curr. Biol. 29, 3551-3562.e7 (2019).
Renart, A. & Machens, C. K. Variability in neural activity and behavior. Curr. Opin. Neurobiol. 25C, 211–220 (2014).
Faisal, A. A., Selen, L. P. J. & Wolpert, D. M. Noise in the nervous system. Nat. Rev. Neurosci. 9, 292–303 (2008).
Heller Murray, E. S. & Stepp, C. E. Relationships between vocal pitch perception and production: A developmental perspective. Sci. Rep. 10, 3912 (2020).
Pfordresher, P. Q. & Brown, S. Poor-pitch singing in the absence of ‘Tone Deafness’. Music Percept. 25, 95–115 (2007).
Brainard, M. S. & Doupe, A. J. Interruption of a basal ganglia–forebrain circuit prevents plasticity of learned vocalizations. Nature 404, 762–766 (2000).
Tachibana, R. O., Lee, D., Kai, K. & Kojima, S. Performance-dependent consolidation of learned vocal changes in adult songbirds. J. Neurosci. 42, 1974–1986 (2022).
Boersma, P. & Weenink, D. Praat: Doing phonetics by computer [Computer program]. Version 6.1.51, retrieved from https://www.praat.org.

Acknowledgements

We thank Drs. Satoshi Kojima, Kouta Kanno, and Kentaro Ono for valuable comments on an earlier version of this manuscript. This study was supported by a Grant-in-Aid for Scientific Research on Innovative Areas, MEXT, Japan (#23118003; Adolescent Mind & Self-Regulation) to R.H. and F.H.; a Grant-in-Aid for Scientific Research on Innovative Areas, MEXT, Japan (#4903; Evolinguistics) to K.O.; JST Moonshot Grant No. JPMJMS2292-3-04 to F.H.; MEXT/JSPS KAKENHI Grant Nos. 16H06525 to F.H. and 16H06395 and 16H06396 to R.H.; and a JSPS Postdoctoral Fellowship, Japan (#269362) to R.O.T.

Author information

Authors and Affiliations

Center for Evolutionary Cognitive Sciences, The University of Tokyo, Tokyo, Japan
Ryosuke O. Tachibana & Kazuo Okanoya
Department of Life Sciences, Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan
Ryosuke O. Tachibana & Kazuo Okanoya
Global Research Institute, Keio University, Tokyo, Japan
Mingdi Xu
Department of Language Sciences, Graduate School of Humanities, Tokyo Metropolitan University, Tokyo, Japan
Mingdi Xu, Ryu-ichiro Hashimoto & Fumitaka Homae
Research Center for Language, Brain and Genetics, Tokyo Metropolitan University, Tokyo, Japan
Ryu-ichiro Hashimoto & Fumitaka Homae
RIKEN Center for Brain Science, Saitama, Japan
Kazuo Okanoya
Advanced Comprehensive Research Organization, Teikyo University, Tokyo, Japan
Kazuo Okanoya

Authors

Ryosuke O. Tachibana
Mingdi Xu
Ryu-ichiro Hashimoto
Fumitaka Homae
Kazuo Okanoya

Contributions

R.O.T., M.X., R.H., and F.H. designed the study; R.O.T. conducted analyses, prepared all figures, and wrote the main manuscript text; M.X. collected data; all authors reviewed the manuscript.

Corresponding author

Correspondence to Ryosuke O. Tachibana.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Tachibana, R.O., Xu, M., Hashimoto, Ri. et al. Spontaneous variability predicts compensative motor response in vocal pitch control. Sci Rep 12, 17740 (2022). https://doi.org/10.1038/s41598-022-22453-0

Received: 12 October 2021
Accepted: 14 October 2022
Published: 22 October 2022
DOI: https://doi.org/10.1038/s41598-022-22453-0
