Main

Some of the most complex and expressive behaviors, such as speaking or singing, depend on rapid and precise motor sequences that are learned with reference to internal guides and without reinforcement by external reward or punishment1,2. The neural circuit mechanisms that underlie the internally guided learning of rapid and precise motor sequences are not well understood, but external reinforcement can drive the learning of relatively slow and simple behaviors by modulating the activity of VTA neurons that release dopamine in the basal ganglia (BG)3,4,5,6,7,8,9,10. The rarity of well-documented forms of internally guided learning in nonhuman animals has complicated the analysis of its underlying neural mechanisms. Vocal learning in songbirds shares many parallels with human speech learning11,12, including a developmental sensitive period when juvenile songbirds copy the song of an adult tutor, a process that is internally guided13,14,15,16. Moreover, adult songbirds can modify the fundamental frequency of individual song syllables when subjected to external reinforcement with singing-triggered noise, a process referred to as pitch learning17,18,19. Consequently, studies in songbirds provide a unique opportunity for testing whether a common circuit mechanism involving dopamine-dependent signaling from the VTA to the BG is important to both internally guided and externally reinforced forms of motor learning.

The songbird brain is distinguished by a neural network for singing and song learning20,21, including a specialized basal ganglia region (Area X) that is important to juvenile and adult forms of vocal plasticity22,23,24 (Fig. 1a). Similarly to mammalian BG, Area X is densely innervated by neurons in the VTA and the substantia nigra pars compacta (SNc) that are positive for tyrosine hydroxylase (TH)23, the synthetic enzyme for dopamine (the songbird VTA and SNc form a continuous group of cells that do not differ in their projection targets or innervation25; here we refer to both VTA and SNc cells that project to Area X collectively as VTAX neurons). Notably, recent studies show that VTAX neurons in adult zebra finches subjected to singing-contingent noise function to encode reward prediction error26, a well-described property of mammalian VTA neurons that serves as an essential component in reinforcement learning7,8,9,10. These parallels support a model in which dopamine release from VTAX neurons is crucial to adult pitch learning, an idea that has gained support from a recent study showing that 6-hydroxydopamine (6-OHDA) lesions of TH+ terminals in Area X interfere with this form of song plasticity27.

Fig. 1: Genetically ablating VTAX neurons in adult birds reduces pitch learning.

a, Schematic of neural circuit for song, highlighting the VTA and BG (Area X). b, Top: Cre-dependent caspase construct. Bottom left: unilateral VTAX ablation with VTA histology (an exemplar brain section from 1 of 2 birds subjected to unilateral ablation is shown). Bottom right: VTAX cell counts for control (gray) and bilateral VTAX ablation (red) birds (unpaired two-tailed t test: control: 4,423 ± 143 cells, n = 10 birds; experimental: 3,669 ± 156 cells, n = 10 birds; **P = 0.003, t18 = –3.38). Horizontal lines of box plots represent the first quartile, median and third quartile; whiskers of box plot represent the minimum and maximum. Scale bar, 500 µm. All values are shown as mean ± s.e.m. throughout. c, Top: example sonograms during pitch learning. White boxes indicate the targeted syllable. Bottom left: pitch of targeted syllable before (baseline, B1) pitch learning, during the first day of pitch learning (white noise (WN) day 1, WN1) and during the second day of pitch learning (WN2). Black dots, escapes; red dots, ‘hits’. Bottom right: frequency contours and mean of target syllable before (B1; n = 50 syllables) and 2 d after white noise (WN2; n = 50 syllables). Scale bars, 20 ms. d, Experimental design for adult pitch learning with VTAX ablation. e, Pitch distribution of a target syllable before WN (black), after WN early (gray) and after WN late in the viral expression window (red), normalized to the pitch at baseline. f, Percent change in pitch of target syllables (paired two-tailed t test; early: 6.11 ± 0.84%; late: 4.18 ± 0.96%, n = 6 syllables from 6 birds, *P = 0.042, t5 = 2.710). g, Percent of pitch recovered 3 d after discontinuing WN (paired two-tailed t test; early: 75.30 ± 8.67%; late: 83.89 ± 10.24%; n = 6, P = 0.577, t5 = –0.596). h, Coefficient of variation of target syllable pitch (paired two-tailed t test; early: r = 0.022 ± 0.003; late: r = 0.016 ± 0.004; n = 6, P = 0.099; t5 = 2.02). *P < 0.05, **P < 0.01, ***P < 0.001; n.s., nonsignificant. 
All values shown as mean ± s.e.m.

Despite these important advances in understanding the singing-related properties of VTAX neurons and the role of TH+ terminals in Area X in adult pitch learning, several critical steps are necessary to better understand the cellular effectors and circuit mechanisms underlying internally guided and externally reinforced forms of vocal learning. One step is to confirm that VTAX neurons and the dopamine receptors they activate in Area X are necessary to adult pitch learning, because the 6-OHDA treatment as applied in Area X may damage both dopaminergic and noradrenergic fibers and in any event does not identify the parent cell group of the affected fibers. Another critical step is to establish whether trial-by-trial variations in VTAX terminal activity, as might arise in response to singing-contingent noise, are sufficient to drive adult pitch learning, as expected in a reinforcement learning framework. Finally, beyond testing the necessity and sufficiency of VTAX neurons and dopamine signaling in Area X in adult pitch learning, whether these same cells and signaling pathways are important to juvenile song copying remains unexplored.

Results

VTAX neurons are necessary for externally reinforced learning

We used an intersectional genetic method to selectively ablate VTAX neurons (Fig. 1b), allowing us to test their role in an adult form of vocal learning in which white noise is used to drive changes in the fundamental frequency (i.e., pitch) of a target syllable (‘pitch learning’17, Fig. 1c). We confirmed previous results23 demonstrating that nearly 95% of VTAX cells are likely to release dopamine by injecting retrograde tracer into Area X and finding extensive overlap between retrogradely labeled neurons in the VTA and TH+ cells (345 of 367 retrogradely labeled VTAX cells were also TH+; n = 3 example sections through VTA from 3 birds). To ablate VTAX neurons, we injected young adult (n = 6; 95 ± 5 days posthatching (dph); mean ± s.e.m. unless otherwise noted) male zebra finches with an adeno-associated virus (AAV) encoding Cre-dependent caspase (AAV2/1.EF1α.FLEX-Casp3-2A-TEV)28 in the VTA and injected a retrogradely traveling virally encoded Cre in Area X (AAV2/9.CMV.HI.GFP-Cre.SV40; Fig. 1b,d). Several days (5 ± 1 d) after the viral injections, before high levels of viral expression, we targeted a syllable in each bird’s motif with pitch-contingent noise. Briefly, we measured the baseline variation in the target syllable’s pitch and set a threshold within this distribution such that pitch variants falling below the threshold triggered a brief, intense noise burst (Fig. 1c; threshold was set at the 70th percentile of the pitch distribution, i.e., syllables that fell below the 70th percentile of the pitch distribution triggered white noise playback). During this early period following viral injections, birds rapidly shifted the pitch of the target syllable to ‘escape’ noise playback. Following 4 d of noise exposure, during which the threshold was adjusted upwards each day to drive continued pitch learning away from the baseline value, we discontinued noise playback and measured the rate and magnitude of recovery of pitch to the pretreatment baseline. 
We then repeated these behavioral experiments 1 month later, after viral expression had ablated a proportion of VTAX neurons, as determined by tracer injection into Area X and post hoc quantification of retrogradely labeled neurons in the VTA (Fig. 1b).
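The pitch-contingent noise rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' acquisition software: the function names (`percentile`, `noise_threshold`, `classify_rendition`) and the linear-interpolation percentile are assumptions introduced here. Only the 70th-percentile threshold and the "below threshold triggers noise" rule come from the text.

```python
def percentile(values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a non-empty list."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def noise_threshold(baseline_pitches_hz, q=70.0):
    """Threshold placed at the q-th percentile of the baseline pitch distribution."""
    return percentile(baseline_pitches_hz, q)


def classify_rendition(pitch_hz, threshold_hz):
    """Return 'hit' (noise is played) if the pitch falls below threshold, else 'escape'."""
    return "hit" if pitch_hz < threshold_hz else "escape"


# Hypothetical baseline pitches (Hz), used only to exercise the functions.
baseline = [600 + i for i in range(11)]
threshold = noise_threshold(baseline)
labels = [classify_rendition(p, threshold) for p in (605, 609)]
```

Under this rule, roughly 70% of baseline renditions would trigger noise, so a bird escapes noise by shifting pitch upward. Raising the threshold each day, as described above, would correspond to recomputing `noise_threshold` on the most recent day's renditions.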

Ablating VTAX neurons significantly impaired adult pitch learning. Within-bird comparisons revealed that the maximum amount of pitch learning 1 month after viral injections was significantly less than the maximum amount measured within the first week following these injections (Fig. 1e,f; n = 6 adult birds; P = 0.042). Post hoc histological analysis revealed that the reduction in the rate of pitch learning measured at 1 month versus 1 week after viral injections was inversely correlated with the number of surviving VTAX neurons (Supplementary Fig. 1a) and, in another adult bird, that intersectional VTAX lesions reduced TH+ immunoreactivity in Area X (Supplementary Fig. 1b). Notably, genetically ablating VTAX neurons did not affect the amount of recovery following pitch learning (Fig. 1g) and did not significantly alter the trial-to-trial variability of the target syllable (Fig. 1h). Moreover, genetically ablating VTA neurons that project to a region of the striatum medial to Area X (VTAMSt), a manipulation that spared VTAX neurons, had no effect on pitch learning (Supplementary Fig. 1c; n = 5 adult male zebra finches; mean of 3,957 ± 169 VTAX cells from four VTAMSt birds versus a mean of 2,795 ± 92 VTAX cells for the six VTAX-lesioned birds; two-tailed t test: P = 0.0003). Furthermore, injections of either only AAV-Cre to Area X or only AAV-caspase to VTA also had no effect on pitch learning (Supplementary Fig. 1c; n = 2 adult male zebra finches). Therefore, a full complement of VTAX neurons is necessary to enable normal levels of noise-induced pitch learning in adult male zebra finches.
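The three behavioral measures compared above (percent change in pitch, percent of pitch recovered, and coefficient of variation) can be written out explicitly. The formulas below are plausible definitions assumed for illustration; this section does not state the exact formulas used, and the function names are hypothetical.

```python
from statistics import mean, stdev


def percent_pitch_change(baseline_hz, learned_hz):
    """Percent change of mean pitch relative to the baseline mean."""
    base = mean(baseline_hz)
    return 100.0 * (mean(learned_hz) - base) / base


def percent_recovered(baseline_mean, learned_mean, recovery_mean):
    """Assumed definition: share of the learned shift that has reverted toward baseline."""
    shift = learned_mean - baseline_mean
    if shift == 0:
        raise ValueError("no learned shift to recover from")
    return 100.0 * (learned_mean - recovery_mean) / shift


def coefficient_of_variation(pitches_hz):
    """Sample standard deviation divided by the mean (dimensionless)."""
    return stdev(pitches_hz) / mean(pitches_hz)


# Hypothetical values (Hz), used only to exercise the functions.
change = percent_pitch_change([100, 100], [106, 106])   # 6% upward shift
recovered = percent_recovered(100.0, 106.0, 101.5)      # 75% of the shift reverted
```

Because the comparisons are within-bird (early versus late in the viral expression window), each measure would be computed once per bird and condition before the paired test.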

Optogenetic stimulation of VTA terminals in Area X is sufficient to drive vocal learning

The behavioral effects of targeted VTAX ablation indicate that these neurons are necessary to support normal levels of adult pitch learning but do not establish whether VTAX activity by itself is sufficient to drive pitch learning in the absence of external reinforcement. To begin to resolve this issue, we bilaterally injected an AAV containing a humanized channelrhodopsin (ChR2) gene29,30,31 (AAV2/9.CAGChR2.mCherry, n = 2 birds, or AAV2/9.CAG-NeurexinChR2.YFP, n = 6 birds) in the VTA of adult male zebra finches. After waiting several months to achieve robust expression of ChR2 in VTA terminals within Area X (VTAX terminals), we bilaterally implanted optical fibers in Area X (Fig. 2a,b; n = 8 animals; mean interval between viral injections in the VTA and fiber implantation in Area X: 137 ± 18 days; mean age at implantation: 252 ± 24 dph). In a subset of these birds (6 of 8), we used optrode recordings in Area X before fiber optic implantation to verify that brief light pulses (50–100 ms; 473 nm) delivered in Area X evoked an increase in multiunit activity (Fig. 2a; following behavioral experiments in all birds, histological methods were used to confirm ChR2 expression in the VTA and Area X (Fig. 2b) and cannula placement over Area X). We then adapted the pitch learning protocol to optogenetically activate VTAX terminals when the pitch of a target syllable fell either above or below a specified threshold. Analysis of unstimulated ‘catch’ trials indicated that such pitch-contingent stimulation of VTAX terminals applied over several days was sufficient to drive lasting changes in the pitch of the target syllable (Fig. 2c; pulse duration, 50 ms at 473 nm; threshold was set to apply stimulation to either the upper or lower 70% of the syllable distribution).
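The stimulation contingency can be sketched as a rule that depends on the intended direction of the pitch shift. This is a hedged sketch: `stimulation_rule` is a hypothetical helper, and the reading that "upper 70%" means stimulating above the 30th percentile (and "lower 70%" below the 70th percentile) is an interpretation of the text, not a documented implementation. Unstimulated catch trials are not modeled here.

```python
from statistics import quantiles


def stimulation_rule(baseline_pitches_hz, direction, percent_stimulated=70):
    """Return (threshold_hz, should_stimulate) for an 'up' or 'down' syllable.

    'up'   : stimulate renditions in the upper `percent_stimulated`% of baseline.
    'down' : stimulate renditions in the lower `percent_stimulated`% of baseline.
    """
    # cuts[k - 1] is the k-th percentile (inclusive method, linear interpolation).
    cuts = quantiles(baseline_pitches_hz, n=100, method="inclusive")
    if direction == "up":
        threshold = cuts[100 - percent_stimulated - 1]
        return threshold, (lambda pitch_hz: pitch_hz > threshold)
    if direction == "down":
        threshold = cuts[percent_stimulated - 1]
        return threshold, (lambda pitch_hz: pitch_hz < threshold)
    raise ValueError("direction must be 'up' or 'down'")


# Hypothetical baseline pitches (Hz), used only to exercise the function.
baseline = list(range(600, 611))
up_threshold, stimulate_up = stimulation_rule(baseline, "up")
down_threshold, stimulate_down = stimulation_rule(baseline, "down")
```

A 100% contingency control would correspond to a rule that returns `True` for every rendition, so that stimulation carries no information about pitch.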

Fig. 2: Pitch-contingent stimulation of VTAX terminals is sufficient to drive pitch learning.

a, Top: experimental design. Bottom: activity in Area X during optogenetic stimulation of VTA terminals. Scale bars, 500 ms; 50 spikes per s; n = 20 trials. b, Top left: merged image of TH+ VTA cells (red) with ChR2 (green) expression (shown is an exemplar brain section from 1 of 3 birds in which co-staining was performed). Top right: ChR2 terminals in Area X (an exemplar brain section from 1 of 3 birds in which immunostaining was performed is shown). Scale bar, 100 µm. Bottom row: inset from top left. Scale bar, 50 µm. White arrows indicate co-labeled cells. c, Pitch-contingent optogenetic stimulation of VTAX terminals. B, baseline day; L1, first day with light stimulation; L2, second day with light stimulation, etc. Scale bar, 50 ms. d, Frequency contours and mean of target syllables before (left) and after (right) stimulation (n = 50 syllables). ‘Up’ (ascending pitch; red) and ‘down’ (descending pitch; blue) syllables receive stimulation when bird sings above or below threshold, respectively. e, Z-scored frequency of all syllables before stimulation (n = 8 syllables, 6 birds). f, Z-scored frequency of down syllables (n = 4 syllables from 4 birds) after stimulation. Triangles indicate mean z-score for each syllable. g, Z-scored frequency of up (n = 4 syllables from 4 birds) syllables after stimulation. Triangles indicate mean z-score for each syllable. h, The area under the receiver operator characteristic (auROC) for up and down syllables (n = 10 syllables, 8 birds). Squares correspond to birds that were tested with both 70% (experimental; shown here) and 100% (control; shown in Fig. 4i,j) contingency stimulation. 
i, Mean change in auROC of target syllable frequency between the last day of baseline and the first day of baseline (0.046 ± 0.009); the last day of pitch-contingent optogenetic stimulation of VTAX terminals in experimental birds (0.211 ± 0.039); and the last day of pitch-contingent optical stimulation in Area X of the various control birds (0.043 ± 0.008; paired two-tailed t test: absolute change in auROC from baseline day 2 versus light day 4: n = 10 syllables from 8 birds, ***P = 0.00005; t18 = –5.247; unpaired two-tailed t test: absolute change in auROC from baseline day 2 versus light day 4 for control birds: n = 7 syllables from 6 birds, P = 0.639; t15 = –0.479; unpaired t test: absolute change in auROC from light day 4 experimental birds versus light day 4 for control birds: n = 7 syllables from 6 birds; ***P = 0.0004; t15 = –4.537); green, GFP; gray, no injection; purple, 100% contingency. j, Mean absolute percent change in pitch frequency between the last day of baseline and the first day of baseline (0.596 ± 0.176%); the last day of pitch-contingent optogenetic stimulation of VTAX terminals in experimental birds (1.929 ± 0.512%); and the last day of pitch-contingent optogenetic stimulation in Area X of control birds (0.302 ± 0.083%; paired two-tailed t test: absolute percent change in pitch from baseline day 2 versus light day 4: n = 10 syllables from 8 birds; **P = 0.008; t18 = –2.989; unpaired t test: absolute percent change in pitch from baseline day 2 versus light day 4 for control birds: n = 7 syllables from 6 birds; P = 0.198; t15 = –1.348; unpaired t test: absolute percent change in pitch from light day 4 experimental birds versus light day 4 for control birds, n = 7 syllables from 6 birds; **P = 0.009; t15 = –2.990). *P < 0.05, **P < 0.01, ***P < 0.001. All values shown as mean ± s.e.m.

In contrast to experiments that used pitch-contingent noise to drive pitch learning17,19 (Fig. 1c), the pitch of the target syllable shifted toward the frequency range that received optogenetic stimulation (n = 10 syllables from 8 birds; Fig. 2c–g and Supplementary Fig. 2a–c). Similarly to noise-driven pitch learning, the change in pitch occurred gradually during the first day of exposure, and the absolute change from baseline continued to increase following daily adjustments of the pitch threshold (Fig. 2c and Supplementary Fig. 2c). The pitch distribution and mean pitch of the target syllables were significantly shifted from baseline following several days of optogenetic stimulation (Fig. 2h–j; 5 ± 1 d, range: 4–10 d), whereas other syllables in the birds’ motifs were unaffected, regardless of their proximity to the target syllable (Supplementary Fig. 3a–c; n = 7 syllables from 5 birds). We also compared the pitch values of the first, middle and last third of the target syllables. We found that the pitch contours were modified differently across birds, with a slight trend toward the largest changes in pitch occurring in the middle and the last third of the syllable (Supplementary Fig. 4a,b). In contrast to these effects on syllable pitch, optogenetic stimulation of VTAX terminals had no acute effects on the pitch or trial-to-trial variability of the target syllable (Supplementary Fig. 4c–e). Moreover, almost all (6 of 8) birds subjected to syllable-triggered optogenetic stimulation of VTAX terminals sang significantly more on the last day of stimulation than on the day before the beginning of light stimulation (P = 0.038, Supplementary Fig. 4f). Therefore, pitch-contingent optogenetic stimulation of VTAX terminals is sufficient to drive pitch learning in adult male zebra finches and also appears to positively reinforce singing more generally.
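The shift in the pitch distribution is summarized in Fig. 2 by z-scored frequency and by the area under the receiver operating characteristic curve (auROC). A minimal sketch of both measures follows, assuming the standard pairwise (Mann-Whitney) formulation of auROC and z-scoring against the baseline day. The function names are hypothetical, and the exact way the reported "absolute change in auROC" was derived is not specified in this section.

```python
from statistics import mean, stdev


def auroc(baseline, test):
    """Probability that a random test value exceeds a random baseline value.

    Ties count as one half. 0.5 means the two distributions are indistinguishable;
    values near 1 or 0 mean the test distribution is shifted up or down.
    """
    wins = 0.0
    for t in test:
        for b in baseline:
            if t > b:
                wins += 1.0
            elif t == b:
                wins += 0.5
    return wins / (len(test) * len(baseline))


def zscore(values, reference):
    """Express values in units of the reference (baseline) standard deviation."""
    mu = mean(reference)
    sd = stdev(reference)
    return [(v - mu) / sd for v in values]


# Hypothetical pitch values, used only to exercise the functions.
separation = auroc([1, 2, 3], [4, 5, 6])   # fully separated, shifted upward
no_shift = auroc([1, 2, 3], [1, 2, 3])     # identical distributions
```

One way to pool "up" and "down" syllables under this formulation would be to take the absolute deviation of auROC from 0.5, although that choice is an assumption here.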

Although VTAX neurons are TH+ and thus likely to release dopamine, they may also release other transmitters, as described for mammalian VTA terminals in the BG32. Therefore, we combined microdialysis methods to reversibly block D1-type dopamine receptors in Area X with pitch-contingent optogenetic stimulation of VTAX terminals (n = 3 adult male zebra finches). We found that when a D1R antagonist (SCH23390) was infused into Area X, optogenetic stimulation of VTAX terminals induced little or no pitch learning, whereas the same stimulation could drive robust pitch learning when saline was infused into Area X either before or after this drug treatment day (Supplementary Fig. 2d–g). Therefore, the pitch-contingent optogenetic stimulation of VTAX terminals is sufficient to drive pitch learning in adult zebra finches, and microdialysis experiments performed here in a small number of animals suggest that this form of adult learning depends on D1 receptor signaling in Area X.

Notably, the pitch distribution and mean pitch of a target syllable did not shift when VTAX terminals were optogenetically stimulated, regardless of the target syllable’s pitch, consistent with the idea that performance-contingent variations in VTAX terminal activity are necessary to drive pitch learning (Fig. 2i,j; 100% contingency, n = 2 syllables from 2 birds previously described that displayed pitch learning in response to a 70% stimulation contingency). In contrast to birds injected with AAV-ChR2 constructs, syllable-triggered pitch-contingent illumination of GFP-expressing VTAX terminals or of Area X in birds that had not been injected with any virus had no effect on the pitch of the target syllable (Fig. 2i,j; n = 3 syllables from 2 birds injected in the VTA with AAV2/9.CAG-GFP and n = 2 syllables from 2 birds that had not been injected with virus; 70% contingency).

VTAX neurons project almost exclusively to Area X

Taken together, the intersectional cell ablation and optogenetic experiments strongly implicate VTAX terminals in Area X as a critical component of learning-related vocal plasticity. Although VTAX neurons do not provide appreciable input to surrounding striatal regions23, one potential confound is that they may extend collaterals to other song-related brain nuclei, the inadvertent destruction or stimulation of which might account for the learning-related effects we observed. To explore this possibility, we used dual retrograde tracing methods to determine whether VTAX neurons also innervate other forebrain song nuclei that are densely innervated by TH+ fibers33,34 (HVC (used here as a proper name), nucleus interface of the nidopallium (NIf) and the lateral magnocellular nucleus of anterior nidopallium (LMAN); Supplementary Fig. 5). We detected only a small percentage of double-labeled VTAX neurons following these dual tracer injections (percentage of VTAX cells that also project to: HVC: 1.7% (23 of 1,323), 3 hemispheres, 2 birds; NIf: 5.6% (95 of 1,696), 3 hemispheres, 2 birds; LMAN, 4.8% (74 of 1,549), 3 hemispheres, 2 birds). Thus, VTAX neurons likely influence adult pitch learning through their terminals in Area X.
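The double-labeling percentages quoted above follow directly from the reported cell counts, as the short check below shows. The counts are taken from the text; the helper name `percent_double_labeled` is introduced here for illustration.

```python
def percent_double_labeled(double_labeled, total_vtax):
    """Percentage of retrogradely labeled VTAX cells that also carry the second tracer."""
    return 100.0 * double_labeled / total_vtax


# (double-labeled cells, total VTAX cells) as reported in the text.
counts = {
    "HVC": (23, 1323),
    "NIf": (95, 1696),
    "LMAN": (74, 1549),
}

percentages = {
    nucleus: round(percent_double_labeled(double, total), 1)
    for nucleus, (double, total) in counts.items()
}
# percentages == {"HVC": 1.7, "NIf": 5.6, "LMAN": 4.8}
```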

D1-type receptors in Area X are necessary for externally reinforced learning

In mammals, the VTA mediates reinforcement learning by activating dopamine receptors in the BG8,9, and we have shown that pitch-contingent optogenetic stimulation of VTAX terminals in Area X can drive pitch learning through a D1-receptor-dependent mechanism. To determine whether the VTA influences adult pitch learning through dopamine receptors, we used microdialysis methods19,35 to reversibly block different dopamine receptor types in Area X of adult male zebra finches while targeting syllables with pitch-contingent noise (Fig. 3a; n = 6, 99 ± 9 dph). Bilateral infusion of a D1 receptor antagonist (SCH23390)36,37 into Area X prevented pitch learning (Fig. 3b–e) without affecting the trial-to-trial variability of the target syllable’s pitch (Supplementary Fig. 6). Similar treatment with sulpiride, a D2-receptor antagonist36, exerted variable effects on the daily amount of pitch learning but also strongly reduced the total amount of singing, without affecting trial-to-trial song variability (Fig. 3f,g and Supplementary Fig. 6; n = 6 birds; one of these birds sang too infrequently (<20 times per day) to support pitch learning experiments). When we corrected for this reduced amount of singing by estimating the amount of pitch learning per rendition of the target syllable, we found that the rate of pitch learning during sulpiride treatment was either enhanced or unchanged in 4 birds and reduced in the other bird (Fig. 3h), suggesting that VTA terminals may act selectively through D1 receptors in Area X to drive pitch learning in the adult zebra finch. Moreover, although D1 and D2 receptors can be co-expressed in single medium spiny neurons within Area X38,39, the current study indicates that they mediate distinct behavioral functions, reminiscent of the functional segregation observed in the mammalian striatum40,41,42.
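The correction for reduced singing can be illustrated with a simple normalization: the percent change in pitch divided by the number of target-syllable renditions sung. This is an assumed form of the per-rendition estimate, since the exact normalization is not described in this section, and the numbers below are hypothetical.

```python
def learning_per_rendition(percent_pitch_change, n_renditions):
    """Percent pitch change normalized by the number of target-syllable renditions."""
    if n_renditions <= 0:
        raise ValueError("n_renditions must be positive")
    return percent_pitch_change / n_renditions


# Hypothetical example: the same daily pitch shift achieved with far fewer renditions
# implies a higher learning rate per rendition.
saline_rate = learning_per_rendition(1.2, 2000)
sulpiride_rate = learning_per_rendition(1.2, 500)
```

This is why a drug that suppresses singing can leave the daily pitch change looking variable while the per-rendition rate is unchanged or even higher.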

Fig. 3: Adult pitch learning requires activation of D1 receptors in Area X.

a, Left: schematic. Middle: infusion of muscimol-BODIPY through microdialysis probes to visualize drug spread (shown is an exemplar brain section from 1 of 6 birds in which labeling with muscimol-BODIPY, a fluorescent dye conjugated to muscimol, was visualized). Scale bar, 1 mm. Right: experimental design. b, Left: target syllable pitch distribution before (black) and after WN with saline infusion (gray) from one bird on day 1. Right, cumulative distributions of normalized target syllable pitch during first (black) and last third (gray) of day 1. c, As in b but with infusion of SCH23390 (red) instead of saline. d, Change in auROC of target syllable pitch after WN during saline (black) or SCH23390 (red) infusion (paired two-tailed t test: saline learning (day 1): 0.663 ± 0.034; SCH23390 learning (day 3): 0.512 ± 0.027; n = 6 syllables from 6 birds, **P = 0.004; t5 = 5.091). e, Mean percent change in target syllable pitch after WN during saline (dark gray; day 1, 1.443 ± 0.164%; and light gray; day 5, 1.609 ± 0.225%) or SCH23390 (red; day 3, –0.085 ± 0.410%) infusion (paired two-tailed t test for learning: n = 6, for saline learning vs. SCH23390 learning (day 1 vs. day 3): **P = 0.006; t5 = 4.626; for SCH23390 learning vs. saline learning after SCH23390 learning (day 3 vs. day 5): *P = 0.018; t5 = –3.468; for saline learning vs. saline learning after SCH23390 learning (day 1 vs. day 5): P = 0.591; t5 = –0.573). f, Mean percent change from baseline number of songs sung during SCH23390 (red) or D2 antagonist sulpiride (purple) infusion (unpaired two-tailed t test: percent change in number of songs: D1 antagonist: –7.810 ± 23.175%, n = 6 birds; D2 antagonist: –66.12 ± 17.38%, n = 6 birds; P = 0.05; t10 = 2.181). g, Percent change in target syllable pitch after WN during saline (gray) or sulpiride (purple; paired two-tailed t test for learning: saline learning, day 1: 1.282 ± 0.266%; sulpiride learning, day 3 (D2): 1.159 ± 0.725%; n = 5, P = 0.91; t4 = 0.121). 
h, Mean percent change in target syllable pitch per rendition after WN with saline (gray) or sulpiride (purple; paired two-tailed t test for learning per rendition: saline learning, day 1: 0.00058 ± 0.00008%; sulpiride learning, day 3 (D2): 0.00246 ± 0.001834%; n = 5, P = 0.42; t4 = –0.902). *P < 0.05, **P < 0.01, ***P < 0.001; n.s., nonsignificant. All values shown as mean ± s.e.m.

VTAX cells and D1 receptors in Area X are necessary for internally reinforced learning

Whereas adult pitch learning is driven by exposure to loud noise, an extrinsic cue, juvenile song-copying progresses without any external reinforcement13. Therefore, a remaining issue is whether the mechanisms that underlie adult pitch learning identified here are similar to those that are necessary to juvenile song-copying. Specifically, we tested the importance of VTAX neurons and D1 receptors in Area X to juvenile song-copying. To test the role of VTAX neurons in juvenile song-copying, we used intersectional genetic methods to ablate these neurons during the second month after hatching, a period when juvenile zebra finches are actively modifying their own songs to match those of a tutor14,43 (Fig. 4a). Juveniles were housed from 0–60 dph with an adult male tutor, providing them with abundant auditory experience of a suitable vocal model. Between 20–30 dph, we injected these juveniles (n = 12 birds, 26 ± 1 dph) with AAV2/1.EF1α.FLEX-Casp3-2A-TEV in the VTA and AAV2/9.CMV.HI.GFP-Cre.SV40 in Area X and recorded their songs at monthly intervals (Fig. 4a; songs were recorded at 60, 90 and 120 dph). We also tracked the song development of another cohort of similarly housed juveniles that were siblings of the experimental animals and that were injected either with AAV2/1.EF1α.FLEX-Casp3-2A-TEV in the VTA, AAV2/9.CMV.HI.GFP-Cre.SV40 in Area X or no virus (n = 3 virally injected animals, 20 ± 1 dph at the time of injections; n = 2 animals that were not injected with any virus).

Fig. 4: VTAX neurons and D1 receptor activity in Area X are necessary for accurate song-copying in juvenile zebra finches.

a, Experimental design. b, Example sonograms; red indicates VTAX ablation birds and gray indicates control bird injected with only caspase in VTA. Scale bar, 50 ms. White bars (a–d,?) indicate syllables. c, Mean percent similarity to tutor as a function of remaining VTAX neuron number (linear regression: R2 = 0.689, n = 12 experimental birds, P = 0.0008; F11 = 22.2; degrees of freedom: 11). d, Mean percent similarity to tutor (left) and to adult self (right; paired two-tailed t test: percent tutor similarity: control: 83.00 ± 4.87%, n = 5; experimental: 56.25 ± 6.61%, n = 12; *P = 0.027; t15 = –2.442; paired two-tailed t test: percent self-similarity: control: 95.07 ± 1.82%, n = 5; experimental: 90.98 ± 2.76%, n = 12; P = 0.34; t15 = –0.977). e, Mean percent similarity to tutor over the course of sensorimotor learning. Red, experimental birds (n = 6); grey, control birds (n = 4). f, Mean percent similarity to adult self over the course of sensorimotor learning. Red, experimental birds (n = 6); grey, control birds (n = 4). g, Experimental design. h, Percent similarity to tutor 1 d before and 10 d after beginning of daily microdialysis for control (gray, n = 3 implanted birds infused with saline and n = 3 intact birds) and SCH23390 birds (red; two-factor repeated-measures ANOVA: SCH23390 birds: P = 0.002, F1: 19.45; paired two-tailed t test: control birds: saline day –1: 49.04 ± 3.79%; saline day 10: 60.75 ± 2.46%, n = 6, ***P = 0.001; t5 = –6.787; paired two-tailed t test: experimental birds: drug day –1: 49.29 ± 1.72%; drug day 10: 45.78 ± 1.03%, n = 5, P = 0.33; t4 = 1.101). i, Percent similarity to tutor 1 d before, 10 d after and > 40 d after beginning of microdialysis for SCH23390 (red) and control birds (black; unpaired two-tailed t test: control birds: 40+ d after beginning of treatment: 67.08 ± 2.30%, n = 5; experimental birds: 40+ d after beginning of treatment: 54.19 ± 6.40%, n = 4, P = 0.17; t6 = –1.549). 
*P < 0.05, **P < 0.01, ***P < 0.001; n.s., nonsignificant. All values shown as mean ± s.e.m.

As adults, birds that had been injected with both viruses produced significantly worse copies of their tutor’s song than did control animals (Fig. 4b–d; see Supplementary Fig. 7 for sonograms of all juveniles). Post hoc histological analysis revealed a strong correlation between the number of surviving VTAX neurons in adulthood and the similarity of the experimental bird’s song to his tutor’s song (Fig. 4c; R2 = 0.689, P < 0.001). Moreover, adults (n = 6) with the lowest number (<3,500) of surviving VTAX neurons produced poor copies of their tutors’ songs at all timepoints (Fig. 4e), although they sang about as often as control animals (Supplementary Fig. 8c,d), displayed normal levels of song stereotypy (Fig. 4d) and progressed to their final songs in a manner similar to controls (Fig. 4f and Supplementary Figs. 8a,b and 9). Finally, when a subset of these adult males with reduced numbers of VTAX cells were presented with a female, they sang approximately as much as did control males, suggesting that learning deficits did not simply reflect reduced motivation to sing (two-tailed t test: number of motifs sung on first presentation of a female: VTAX cell lesioned birds: 8 ± 2 motifs, n = 3 birds; control birds: 7 ± 1 motifs, n = 3 birds; P = 0.8). Therefore, intersectional ablation of VTAX neurons disrupts juvenile song-copying, underscoring that these neurons provide a common cellular foundation for internally guided and externally reinforced forms of vocal learning.
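The relationship between surviving VTAX cell number and tutor similarity (Fig. 4c) is summarized by a linear regression and its R2. The sketch below shows an ordinary least-squares fit and the corresponding coefficient of determination. It is a generic textbook implementation with a hypothetical function name, and the example data are synthetic placeholders, not the birds' values.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x, with R^2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Synthetic placeholder data (NOT measurements from the study).
cell_counts = [1.0, 2.0, 3.0]
similarity = [1.0, 3.0, 2.0]
slope, intercept, r_squared = linear_fit(cell_counts, similarity)
```

An R2 of 0.689, as reported, would mean that roughly 69% of the variance in tutor similarity across the 12 experimental birds is accounted for by a linear dependence on the number of surviving VTAX neurons.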

We then used microdialysis methods to infuse a D1 receptor antagonist (SCH23390) into Area X in a cohort of juveniles for a 10-d period during the height of sensorimotor learning (~50–60 dph), when much of the tutor song is normally copied by juvenile zebra finches (Fig. 4g; n = 5 juvenile male zebra finches, 45 ± 1 dph at time of infusion; all juveniles were raised in the presence of an adult tutor before this period). We recorded their songs continuously during this drug treatment, and at the end of this period we flushed the probes with saline and then returned the birds to the colony until they reached early adulthood, at which time we recorded their songs once again (n = 4 birds, 94 ± 4 dph; one animal was killed for analysis at the end of the drug treatment period). We also recorded the songs of another cohort of juveniles that either received saline infusions in Area X (n = 3 birds; 45 ± 2 dph at time of infusion) or were not manipulated (n = 3; 47 ± 0 dph at time of first recording). Compared to these control animals, the songs of juveniles infused with D1 antagonists in Area X showed little or no increase in similarity to the tutor song during the infusion period (Fig. 4h). Despite the lack of copying during the juvenile treatment period, adult experimental and control birds ultimately displayed similar levels of tutor song copying, indicating that juveniles treated with SCH23390 could subsequently compensate for their copying deficit during the post-treatment period (Fig. 4i). In summary, similarly to adult pitch learning, juvenile song-copying depends on VTAX neurons and the activation of D1 receptors in Area X.

Discussion

Here we show that the same VTA–BG circuits and dopamine signaling pathways are necessary to internally guided vocal copying in juvenile songbirds and externally reinforced forms of vocal learning in adults, highlighting a common and developmentally conserved mechanism for these different types of learning. Our findings extend the prior observation that 6-OHDA lesions of TH+ terminals in Area X can impair adult pitch learning27 by localizing the likely source of these terminals to the VTA, highlighting the necessity of D1 receptors in Area X for this form of learning, and by defining a role for the VTA–BG D1 pathway in juvenile song-copying. Beyond establishing the necessity of this pathway to both forms of vocal learning, we found that pitch-contingent optogenetic stimulation of VTAX terminals is sufficient to drive pitch learning, supporting a model in which trial-by-trial variations in VTA activity are the critical reinforcing signal for vocal learning44.

The strong parallels between the avian and mammalian BG include robust dopaminergic projections from the VTA and SNc, bolstering speculation that these inputs play an essential role in birdsong learning26,44,45. Indeed, the recent finding that using 6-OHDA to lesion TH+ terminals in Area X interferes with pitch learning in adult birds provided critical experimental support for this idea. The current findings, that intersectional genetic ablation of VTAX neurons interferes with adult pitch learning, extend this prior observation by localizing the cell bodies that provide the TH+ fibers in Area X to the VTA and thus help to inform an anatomically grounded circuit model. Moreover, by showing that intersectional VTAX lesions disrupt juvenile song-copying, the current study supports the idea that a common mechanism serves both internally guided song-learning in juveniles and externally reinforced vocal plasticity in adults. In both juveniles and adults, the residual learning capacity correlated with the number of surviving VTA neurons, raising the possibility that the relatively large endowment of VTAX neurons (compared with the smaller number of VTA neurons that project to, for example, the medial striatum23) reflects strong selective pressures on vocal learning and its underlying neural circuitry. A broader implication is that evolutionarily ancient circuitry that first arose to enable reinforcement learning in response to aversive and appetitive cues was later co-opted to facilitate forms of motor learning that are internally guided and do not depend on external reinforcement.

Beyond merely extending the earlier study using 6-OHDA lesions in Area X of adult birds, the microdialysis methods used here show that D1-receptor signaling in Area X is a critical effector of both juvenile song-copying and adult pitch learning. The current observations further underscore that a common circuit and signaling pathway mediates these two different forms of learning. Moreover, the strong and reversible effects of D1-receptor blockade on song copying in juveniles with substantial prior tutor song exposure help assign these deleterious effects to disruptions of sensorimotor learning rather than the preceding epoch of sensory learning. This distinction is less readily made with intersectional genetic ablation of VTAX neurons, the onset and time-course of which are variable and relatively slow, and the anatomical consequences of which are only subject to post hoc evaluation. Further insight provided by microdialysis applied in adult birds is that D1- and D2-receptor blockers exert distinct effects on singing: whereas D1-receptor blockade in Area X completely and reversibly abolished adult pitch learning without substantially affecting the amount of singing, D2-receptor blockade strongly and consistently suppressed singing while exerting variable effects on pitch learning. These differential effects raise the possibility that song production is differentially regulated through different dopamine receptor subtypes in Area X, reminiscent of the functionally distinct effects of D1 and D2 signaling in the mammalian striatum on locomotion41.

We found that pitch-contingent optogenetic stimulation of VTA terminals in Area X over the course of hours could induce frequency shifts in the target syllable without affecting other syllables in the motif, highly similar to noise-driven pitch learning17,19. One important difference was that optogenetic stimulation of VTAX terminals positively reinforced the pitch of the target syllable, opposite in sign to the pitch changes driven by noise. In fact, because singing-triggered noise can depress VTAX neuron activity26, whereas escapes from noise transiently elevate activity in these neurons, the current study provides a causal link between auditory feedback-dependent differences in VTAX activity and long-lasting changes to vocal performance. Given that the caudal auditory forebrain contains neurons that respond selectively to singing-triggered noise and that project to the VTA46, these findings provide further evidence for an error-detection circuit that harnesses singing-related auditory feedback information to modulate VTAX neuron activity on a trial-by-trial basis to affect vocal motor learning. One major goal is to dissect the underlying circuitry that converts singing-triggered excitation in the auditory forebrain into transient suppression of VTAX neuron activity. Another critical step will be to determine whether and how auditory-related afferents to VTAX neurons function during juvenile sensorimotor learning, when singing-related auditory feedback is compared to the memory of the tutor song and motor learning proceeds in the absence of aversive external auditory cues. A distinct possibility is that VTAX neurons supply the instructive signals used to guide this form of imitative learning, although the approaches used here cannot distinguish between permissive and instructive roles for these neurons in juvenile song copying.

One of the many remarkable parallels between birdsong and human speech is that both are acoustical signals where fine temporal modulations on the timescale of 10–100 ms are salient to their communicative functions. Furthermore, juvenile copying and adult pitch learning in songbirds require temporally precise modification of vocal structure, in contrast to slower forms of motor learning involving lever-pressing or licking3,4,5,7. Our findings advance theories of dopamine-dependent synaptic plasticity in the BG as a likely cellular effector of vocal learning44, while raising the question of how relatively slow signaling through G-protein-coupled receptors underlies a form of motor learning that exhibits millisecond precision47. One attractive model involves short-lived synaptic tags on medium spiny neurons in Area X, which are hypothesized to be formed by temporally coincident glutamatergic activity from song premotor regions44. When these patterns of premotor activity produce vocalization-related feedback that stimulates dopamine release from VTAX neurons, subsequent dopamine-receptor activation on these same medium spiny neurons stabilizes these synaptic tags and strengthens the relevant premotor synapses, resulting in an adaptive bias that the BG supplies to the song motor system44. More broadly, by providing evidence of a role for the VTA in avian motor learning, the current study suggests that an evolutionarily conserved circuit mechanism supports different forms of learning across both birds and mammals48, raising the possibility that this mechanism is also central to speech and musical learning in humans, as well as playing a critical role in birdsong learning.

Methods

Animals

Juvenile (18–49 dph) and adult (82–436 dph) male zebra finches were obtained from the Mooney lab breeding colony within the Duke University Medical Center animal facility. Experimental procedures were conducted in accordance with the National Institutes of Health guidelines and were reviewed and approved by the Duke University Medical Center Animal Care and Use Committee. Viral vectors were acquired from the University of Pennsylvania Vector Core and the University of North Carolina, Chapel Hill Vector Core.

Genetic ablation of VTAX neurons

Male zebra finches (20–25 dph for juvenile experiments, 100–110 dph for adult experiments) were food deprived for 30 min and then anesthetized with 2% isoflurane gas before being placed on top of a small heating pad in a custom stereotaxic apparatus. Rate of breathing and stability of surgical plane were monitored throughout surgery. The feathers over the skull were trimmed and topical anesthetic (0.25% bupivacaine) was applied before an incision was made in the skin from anterior to posterior with a scalpel. After pushing skin from the center of the skull with a cotton swab doused in 70% ethanol, craniotomies were made with a smaller scalpel at a predetermined distance from the bifurcation of the midsagittal sinus (the ‘y-sinus’; coordinates measured from y-sinus: VTA: head angle 37°, 1.65 mm anterior, 0.5 and 1.8 mm lateral, 6.2 mm ventral; Area X: head angle 43°, 5.3 mm anterior, 1.6 mm lateral, 3.2, 2.9 and 2.7 mm ventral). To selectively ablate VTAX cells, a pressure-injection system (Drummond Nanoject II) was used to make bilateral injections of a retrogradely transported Cre construct (AAV2/9.CMV.HI.GFP-Cre.SV40; Penn Vector, a total of 15 injections of 32.2 nL of Cre per hemisphere) into Area X at three different depths. A locally expressed Cre-dependent caspase construct was then injected into the VTA at two different locations along the mediolateral axis (AAV2/1.Ef1α.FLEXCasp3-2A-TEV; construct courtesy of N. Shah, UCSF; 15 injections of 32.2 nL each of Casp3 per site per hemisphere, i.e., a total of four caspase injection sites per bird). After these viral injections, the craniotomies were sealed with bone wax, the incision site was closed with tissue adhesive, and the bird was allowed to recover from anesthesia under a heat lamp. At the endpoint of each experiment and 5 d before perfusion, birds were injected with Alexa Fluor 594 in Area X to retrogradely label VTAX neurons. 
Five days after these tracer injections, birds were deeply anesthetized with an intraperitoneal injection of pentobarbital solution (Euthasol) and then perfused through the heart with 0.025 M phosphate-buffered saline followed by 4% paraformaldehyde. The brain was then removed from the skull and placed in a cryoprotective formalin sucrose solution (30% sucrose in 4% paraformaldehyde) overnight. The next day consecutive sagittal sections of the cryoprotected brain were cut on a freezing microtome and alternate sections were mounted on glass slides. A subset of alternate sections was treated with an antibody against tyrosine hydroxylase (αTH, 1:1,000, Abcam112) overnight at 4 °C, reacted with secondary antibody (1:500, Abcam) at RT (20–25 °C) for 1 h and then mounted on slides to visualize TH+ cells in the VTA. A similar process was used to visualize TH+ fibers in Area X. Sections containing VTA (or VTA terminals in Area X) were visualized and imaged under a confocal microscope (Zeiss Axioskop 2). The images were then examined in an image-processing program and the number of fluorescent retrogradely labeled cells in VTA was counted in a semiautomated manner (ImageJ, CellCounter plug-in).

Genetic ablation experiments

Pitch-contingent learning

Young adult male birds (100–110 dph) were screened for syllables with clear tonal components and for the amount of song produced. Birds that fit both these criteria were then bilaterally injected with viruses to ablate VTAX cells. After the birds recovered from surgery and began singing readily, their songs were recorded and a template to detect the fundamental frequency (i.e., pitch) of a tonal syllable was made in a custom software program (EvTAF17). The template was designed to detect no less than 75% of the renditions of the targeted syllable with no more than 5 ms jitter in detection onset. After collecting 2 d of ‘baseline’ song, a threshold at the upper 70th percentile of the target syllable’s pitch distribution was set and a 50-ms white noise burst (~70 dB) was played through a nearby speaker to the bird whenever the program template detected that the pitch of the targeted syllable was below this threshold; over hours and days, this manipulation results in an adaptive shift in the pitch of the target syllable. The bird’s pitch for the targeted syllable was measured in the late morning and early evening for the next 4 d, and the threshold was adjusted each morning and early evening to the upper 70th percentile of the bird’s pitch distribution to promote more rapid learning. After 4 d of pitch-contingent white noise experience, the white noise was discontinued and the bird’s song was recorded for the next 3–4 d as the pitch of the targeted syllable recovered toward its baseline value. This entire process was repeated 1 month after viral injections, when VTAX neurons had been ablated by the intersectional viral treatment. At the end of the recovery period from this second pitch learning experiment, birds were injected with dextran in Area X in order to allow the number of VTAX cells remaining to be quantified, as previously described.
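The threshold rule described above can be illustrated with a minimal sketch. It is not the EvTAF implementation: the function names, the simulated baseline pitches and the linear-interpolation percentile are assumptions made for illustration, and "upper 70th percentile" is read here as the 70th percentile of the baseline distribution, so that roughly 70% of baseline renditions fall below threshold and would trigger noise.

```python
import random


def percentile(values, q):
    """Return the q-th percentile (0-100) of values by linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one pitch measurement")
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def should_play_noise(pitch_hz, threshold_hz):
    """Noise is triggered when the rendition's pitch is below the threshold."""
    return pitch_hz < threshold_hz


# Hypothetical baseline pitches in Hz (simulated, not data from the study).
random.seed(0)
baseline = [random.gauss(600.0, 10.0) for _ in range(500)]

# Threshold at the 70th percentile of the baseline distribution (assumed reading).
threshold = percentile(baseline, 70)

# Fraction of baseline renditions that would have triggered white noise.
hit_rate = sum(should_play_noise(p, threshold) for p in baseline) / len(baseline)
print(f"threshold = {threshold:.1f} Hz, fraction hit = {hit_rate:.2f}")
```

Re-running `percentile` on the most recent renditions each morning and early evening would correspond to the threshold adjustment described in the protocol.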

Juvenile song copying

Juvenile male zebra finches (20–25 dph) were injected with viruses as previously described to ablate VTAX cells, then isolated with their siblings and father until 60 dph. At 60 dph juveniles were housed with other virally injected birds and isolated temporarily for recording at 60, 90 and 120 dph using a custom song recording program (Sound Analysis Pro 2011 (SAP)). We relied on percent similarity, a measure that combines measures of pitch, amplitude modulation, frequency modulation, Wiener entropy and goodness of pitch, to gauge the similarity of song elements between two sets of songs (i.e., the pupil’s song and that of his tutor). We chose representative motifs (at least 30 ~200- to 500-ms long motifs) from pupils and used the asymmetric time-courses setting to compare the pupil motif to a representative tutor motif that we confirmed was highly similar to the tutor’s other motifs. SAP was also used to measure spectral features of single syllables, such as entropy and entropy variance, over development in a subset of birds. Once birds reached 120 dph, they were injected with dextran in Area X and the number of VTAX cells was quantified as previously described.

Singing-triggered optogenetic stimulation of VTAX terminals

Using surgical methods previously described, young adult male birds (60–90 dph) were bilaterally injected in the VTA with a virus containing a channelrhodopsin construct (2/9.AAV-CAG-ChR2-mCherry or 2/9.AAV-CAG-ChR2-YFP-neurexin) at four different sites (50 injections of 9.2 nL of ChR2 per site, two sites per hemisphere). After waiting 3–6 months to allow for optimal viral expression, birds were anesthetized and placed in a stereotaxic apparatus and craniotomies were made over Area X bilaterally. Six of the eight birds used for these experiments were tested for terminal field optogenetic responses in Area X with a 500-kOhm tungsten electrode (MicroProbes Inc.) coupled to a fiberoptic cable (Thorlabs, 200-µm diameter core) through which 50- to 100-ms pulses of light were delivered and neural activity was recorded simultaneously (Differential A-C Amplifier 1700, A-M Systems). All birds were then implanted bilaterally over Area X with fiberoptic ferrules at an anterior angle to avoid passing through LMAN (coordinates: 43° head angle, mark 5.3 mm rostral; adjust head angle to 72° and move 1.2 mm rostral from previous mark, 1.6 mm lateral, 2.7–3.0 mm ventral). Craniotomies were then sealed with melted bone wax; ferrules were secured in place with MetaBond and then covered with a layer of VetBond. After birds recovered from anesthesia under a heat lamp, fiberoptic cables (Thorlabs, 200-µm core, 0.37 NA) were connected to the newly implanted ferrules by ferrule sleeves. The other ends of the fiberoptic cables were attached to a two-channel optical commutator (FRJ_1×2i_FC-2FC, Doric), allowing the bird to move about its cage freely. The commutator was then connected by a patch cable (Thorlabs) to a DPSS laser (BL473T3-100, Shanghai Lasers).

As described above for adult pitch learning experiments, we created a template that detected no less than 75% of the renditions of the targeted syllable with no more than a 5-ms jitter in detection onset. After collecting two days of baseline song (i.e., produced when the bird was connected to the fiberoptic cables but the laser remained off), a threshold at the upper (or lower) 70th percentile of the target syllable’s pitch distribution was set and a 50-ms pulse of blue light (473 nm, 5–8 mW emitted at each ferrule) was delivered to Area X whenever the program detected that the pitch of the targeted syllable was below (or above) this threshold. The bird’s pitch for the targeted syllable was measured in the late morning and early evening for the next 4 d (for eight of ten syllables, see below) and the threshold was adjusted to the upper 70th percentile of their pitch accordingly. Of the ten syllables targeted, eight were exposed to pitch-contingent optogenetic stimulation for 4 d, one for 6 d and another for 10 d. Light stimulation was then ended and song in the absence of stimulation was recorded for up to 4 d. Birds were then uncoupled from the fiberoptic cables and returned to the colony. Three to five months after stimulation ended, birds were again recorded for 4–5 d before being perfused. Histology was performed as described above, with alternate sections stained against mCherry (Abcam167453) or GFP (Abcam1218) for visualization of the terminal field in Area X. Only birds that had accurate placement of ferrules in the center of Area X and robust labeling of cell bodies in VTA and of axon terminals in Area X were included in our analysis. Exclusion of birds was blind to behavioral results. One bird was excluded from our analysis as both ferrule implantation and viral injection localization were incorrect (i.e., placement over LMAN rather than Area X, viral injection caudal and dorsal to VTA). In sample sections, we counted the numbers of ChR2-YFP+ cells. 
Comparing the average number of ChR2-YFP+ cells per section to the average number of retrogradely labeled cells from other sections (from other tissue), we can provide a rough estimate that ~40% of VTAX neurons are ChR2+.

Microdialysis experiments

Adult pitch learning experiments

Young adult birds (>80 dph) with clear tonal elements in their song were chosen for implantation of microdialysis probes. Probes were constructed in-house from plastic tubing, which served as a drug reservoir, fitted at the end with a 0.7- to 1.0-mm-long semipermeable membrane, which allowed the drug to slowly diffuse throughout the day (see ref. 35 for probe design). Using the surgical procedures and stereotaxic coordinates described above, craniotomies were made over Area X and neural recordings were made to confirm its depth (Differential A-C Amplifier 1700, A-M Systems). We approached Area X rostrally to avoid LMAN (anterior Area X coordinates: initial head angle 43°, 5.3 mm anterior, marked with scalpel on skull, then adjusted head angle to 72°, 1.2 mm anterior from scalpel mark, 1.7 mm lateral, 2.9–3.2 mm ventral). Probes were then implanted with the tip of the semipermeable membrane placed at the most ventral part of Area X so that the membrane extended through the dorsoventral extent of Area X. The surgical site was covered with melted bone wax, and the probes were secured in place first using MetaBond and then a coating of VetBond. Birds were then removed from the apparatus and recovered under a heat lamp. After recovery, birds were placed in a sound isolation box and their first full day of song was recorded and used to make an EvTAF template to target a tonal syllable as described above. Birds were recorded in the absence of white noise for 2 h the morning after their first full day of song, then infused with saline and recorded in the presence of pitch-contingent white noise for the next 8 h, after which they were again infused with saline and white noise was turned off (‘learning day’, day 1 in Fig. 2a).

This protocol was repeated the following day with the white noise remaining off (recovery day, day 2). The next day the protocol from the saline learning day was repeated (day 3), but either 5 mM SCH23390 or 0.5 µg/mL sulpiride was infused after 2 h of recording instead of saline and washed out with saline after 8 h of recording. After this drug learning day, the bird underwent a recovery day (day 4), which was followed by another learning day (day 5) with saline only. Before perfusion, birds were infused with fluorescent muscimol-BODIPY for 2 h to allow for post hoc visualization of drug diffusion through the semipermeable membrane into Area X. Birds were then perfused and histology was performed as described above to assess correct placement of the dialysis probes in Area X and the extent of drug diffusion. This manner of quantifying the spread of drug from our microdialysis probes may underestimate the amount of drug spread, as muscimol-BODIPY is of a higher molecular weight than SCH23390 or sulpiride and was infused for a shorter period of time than the SCH23390 or sulpiride.

Juvenile song copying experiments

Young (40–49 dph) juvenile male zebra finches that had recognizable syllables but had not yet developed a stereotyped motif were implanted bilaterally with microdialysis probes in Area X in the manner previously described. After they recovered from surgery they were infused with saline until they began singing again. After recording at least 1 d of singing with saline infusion, the birds were infused in the morning ~10–20 min before ‘lights on’ with either SCH23390 (experimental birds) or saline (3 of 6 of the control birds were implanted and infused with saline only; the 3 other birds were not implanted and were recorded continuously for 12 d) each morning for the next 10 d. After 10 d of drug or saline treatment, juveniles were infused with saline and recorded for an additional 2–3 d. They were then placed back in the colony until they reached early adulthood (90–110 dph), when they were again isolated and recorded for 1–2 d. Following perfusion, as described above, the fixed tissue was examined for correct placement of microdialysis probes in Area X. Because the probes clogged 2–3 weeks after implantation, we were not able to infuse tracers to estimate drug diffusion in these birds.

Analysis of song data

Coefficient of variation

The pitch of a small component of the syllable (5–10 ms) was measured for no fewer than 50 ‘catch’ syllables. The s.d. of these pitch measurements was then divided by their mean.
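As a minimal sketch of this calculation, the snippet below divides the s.d. of a set of pitch measurements by their mean. The function name and the example values are hypothetical, and the use of the sample s.d. (n − 1 denominator) is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(pitches_hz, min_renditions=50):
    """Return s.d. / mean for a set of pitch measurements (in Hz).

    Uses the sample standard deviation (an assumption; the text does not
    specify sample versus population s.d.).
    """
    if len(pitches_hz) < min_renditions:
        raise ValueError(f"need at least {min_renditions} catch syllables")
    return stdev(pitches_hz) / mean(pitches_hz)


# Hypothetical pitch measurements (Hz) from 50 catch renditions.
pitches = [600 + (i % 5 - 2) for i in range(50)]
cv = coefficient_of_variation(pitches)
print(f"CV = {cv:.5f}")
```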

Percent change in pitch

For adult VTAX ablation experiments, channelrhodopsin experiments and adult microdialysis experiments, the percent change in pitch of the targeted syllable after learning was calculated. The pitch of the entire tonal component of the syllable was measured for no fewer than 50 catch syllables on the day before the manipulation began and then again on the last day of the manipulation. The baseline pitch was subtracted from the experimental pitch, and the difference was divided by the baseline pitch and then multiplied by 100 to calculate the percent change in pitch.
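A short sketch of this measure follows. The function name and example values are hypothetical. The sign convention is an assumption: the change is taken as experimental minus baseline, so that an upward pitch shift gives a positive value, and each day's pitch is summarized by the mean across renditions.

```python
from statistics import mean


def percent_change_in_pitch(baseline_hz, experimental_hz):
    """Percent change of the mean experimental pitch relative to baseline.

    Positive values indicate an upward shift (assumed sign convention).
    """
    base = mean(baseline_hz)
    return (mean(experimental_hz) - base) / base * 100.0


# Hypothetical catch-syllable pitches (Hz): last baseline day vs last learning day.
baseline = [600.0] * 50
experimental = [612.0] * 50
change = percent_change_in_pitch(baseline, experimental)
print(f"percent change = {change:.2f}%")
```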

Change in auROC

To better gauge the change in birds’ pitch distributions after manipulation, the change in the area under the receiver operating characteristic curve (auROC) of the pitch of the targeted syllable after learning was measured for adult VTAX ablation experiments, channelrhodopsin experiments and adult microdialysis experiments. The pitch of the entire tonal component of the syllable was measured using no fewer than 50 catch syllables on the day before the manipulation began and then again on the last day of the manipulation. The auROC was calculated by integrating the curve that relates the proportion of baseline pitches correctly classified as baseline to the proportion of experimental pitches incorrectly classified as baseline. The data were then bootstrapped to ensure an unbiased measurement.
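One way to sketch this measure is shown below. Rather than integrating the ROC curve explicitly, the sketch uses the equivalent pairwise (Mann–Whitney) formulation, in which the auROC is the probability that a randomly chosen experimental pitch exceeds a randomly chosen baseline pitch. The function names, the orientation (values above 0.5 indicate an upward shift) and the bootstrap scheme (resampling each day with replacement and averaging the estimates) are assumptions, as the text does not specify these details.

```python
import random


def auroc(baseline, experimental):
    """Area under the ROC curve separating experimental from baseline pitches.

    Computed as P(experimental > baseline) over all pairs, counting ties as
    one half; this equals the area obtained by integrating the ROC curve.
    """
    wins = 0.0
    for e in experimental:
        for b in baseline:
            if e > b:
                wins += 1.0
            elif e == b:
                wins += 0.5
    return wins / (len(baseline) * len(experimental))


def bootstrap_auroc(baseline, experimental, n_boot=200, seed=0):
    """Mean auROC over bootstrap resamples (assumed resampling scheme)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        b = rng.choices(baseline, k=len(baseline))
        e = rng.choices(experimental, k=len(experimental))
        estimates.append(auroc(b, e))
    return sum(estimates) / n_boot


# Hypothetical pitches (Hz), simulated for illustration only.
gen = random.Random(1)
baseline = [gen.gauss(600.0, 8.0) for _ in range(50)]
shifted = [gen.gauss(612.0, 8.0) for _ in range(50)]
print(f"auROC = {bootstrap_auroc(baseline, shifted):.2f}")
```

A value near 0.5 indicates overlapping distributions (no learning), whereas values approaching 1 (or 0, for downward shifts) indicate that the pitch distribution has moved almost entirely away from baseline.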

Similarity scores

For juvenile VTAX and microdialysis experiments, SAP was used to calculate percent similarity either to the pupil’s tutor’s song (tutor similarity) or to the pupil’s own song (self-similarity). A representative motif from the tutor (for tutor similarity) or the pupil (for self-similarity) was selected then compared to no fewer than 30 of the pupil’s motifs in SAP using asymmetrical time-courses under the ‘similarity’ tab.

Statistics

Data are presented as mean ± s.e.m. unless otherwise noted. Error bars in all figures indicate the s.e.m. All groups with more than eight samples were tested for normality with the Kolmogorov–Smirnov test and passed. Groups with fewer than eight samples were too small for normality to be assessed, but parametric tests were still used to detect differences in these small samples. P values were calculated from two-tailed t tests for comparisons between only two groups and are listed in the figure legends. For comparisons of more than two groups, ANOVAs were performed first to check for interaction term significance before t tests were performed. P values of 0.05 or below were considered significant. No statistical methods were used to predetermine sample sizes, but our sample sizes are similar to those reported in previous publications17,19. The experimenters were not blinded to allocation of subjects, and allocation of subjects was not randomized. Automatic detection and calculation of syllable frequencies allowed the experimenter to be blind to conditions before and after viral expression. All data were analyzed with Matlab software.

Life Sciences Reporting Summary

Further information on experimental design is available in the Life Sciences Reporting Summary.

Data and code availability statement

The data and software code that support the findings of this study are available from the corresponding author upon reasonable request.