Neural dynamics underlying birdsong practice and performance

Abstract

Musical and athletic skills are learned and maintained through intensive practice to enable precise and reliable performance for an audience. Consequently, understanding such complex behaviours requires insight into how the brain functions during both practice and performance. Male zebra finches learn to produce courtship songs that are more varied when alone and more stereotyped in the presence of females1. These differences are thought to reflect song practice and performance, respectively2,3, providing a useful system in which to explore how neurons encode and regulate motor variability in these two states. Here we show that calcium signals in ensembles of spiny neurons (SNs) in the basal ganglia are highly variable relative to their cortical afferents during song practice. By contrast, SN calcium signals are strongly suppressed during female-directed performance, and optogenetically suppressing SNs during practice strongly reduces vocal variability. Unsupervised learning methods4,5 show that specific SN activity patterns map onto distinct song practice variants. Finally, we establish that noradrenergic signalling reduces vocal variability by directly suppressing SN activity. Thus, SN ensembles encode and drive vocal exploration during practice, and the noradrenergic suppression of SN activity promotes stereotyped and precise song performance for an audience.

Fig. 1: SN ensemble activity is song-specific and variable.
Fig. 2: SN activity drives vocal variability.
Fig. 3: A joint neural–behavioural modelling approach relates SN population activity and song.
Fig. 4: Noradrenergic signalling in the sBG reduces vocal variability by directly suppressing SN activity.

Data availability

Core datasets have been posted to the Duke University Library Research Data Repository (https://research.repository.duke.edu). Source data are provided with this paper.

Code availability

Custom code and software are available at https://github.com/pearsonlab/autoencoded-vocal-analysis and https://github.com/pearsonlab/finch-vae.

References

  1. Sossinka, R. & Böhner, J. Song types in the zebra finch Poephila guttata castanotis 1. Zeitschrift für Tierpsychologie 53, 123–132 (1980).

  2. Kao, M. H., Doupe, A. J. & Brainard, M. S. Contributions of an avian basal ganglia–forebrain circuit to real-time modulation of song. Nature 433, 638–643 (2005).

  3. Jarvis, E. D., Scharff, C., Grossman, M. R., Ramos, J. A. & Nottebohm, F. For whom the bird sings: context-dependent gene expression. Neuron 21, 775–788 (1998).

  4. Goffinet, J., Brudner, S., Mooney, R. & Pearson, J. Low-dimensional learned feature spaces quantify individual and group differences in vocal repertoires. eLife 10, e67855 (2021).

  5. Sainburg, T., Thielk, M. & Gentner, T. Q. Finding, visualizing, and quantifying latent structure across diverse animal vocal repertoires. PLoS Comput. Biol. 16, e1008228 (2020).

  6. Woolley, S. C. & Doupe, A. J. Social context-induced song variation affects female behavior and gene expression. PLoS Biol. 6, e62 (2008).

  7. Kao, M. H., Wright, B. D. & Doupe, A. J. Neurons in a forebrain nucleus required for vocal plasticity rapidly switch between precise firing and variable bursting depending on social context. J. Neurosci. 28, 13232–13247 (2008).

  8. Woolley, S. C., Rajan, R., Joshua, M. & Doupe, A. J. Emergence of context-dependent variability across a basal ganglia network. Neuron 82, 208–223 (2014).

  9. Kojima, S., Kao, M. H., Doupe, A. J. & Brainard, M. S. The avian basal ganglia are a source of rapid behavioral variation that enables vocal motor exploration. J. Neurosci. 38, 9635–9647 (2018).

  10. Hein, A. M., Sridharan, A., Nordeen, K. W. & Nordeen, E. J. Characterization of CaMKII-expressing neurons within a striatal region implicated in avian vocal learning. Brain Res. 1155, 125–133 (2007).

  11. Kozhevnikov, A. A. & Fee, M. S. Singing-related activity of identified HVC neurons in the zebra finch. J. Neurophysiol. 97, 4271–4283 (2007).

  12. Hahnloser, R. H. R., Kozhevnikov, A. A. & Fee, M. S. An ultra-sparse code underlies the generation of neural sequences in a songbird. Nature 419, 65–70 (2002).

  13. Liberti, W. A. 3rd et al. Unstable neurons underlie a stable learned behavior. Nat. Neurosci. 19, 1665–1671 (2016).

  14. Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. Preprint at https://arxiv.org/abs/1312.6114 (2013).

  15. Rezende, D. J., Mohamed, S. & Wierstra, D. Stochastic backpropagation and approximate inference in deep generative models. Preprint at http://arxiv.org/abs/1401.4082 (2014).

  16. Björklund, A. & Dunnett, S. B. Dopamine neuron systems in the brain: an update. Trends Neurosci. 30, 194–202 (2007).

  17. Zerbi, V. et al. Rapid reconfiguration of the functional connectome after chemogenetic locus coeruleus activation. Neuron 103, 702–718.e5 (2019).

  18. Castelino, C. B., Diekamp, B. & Ball, G. F. Noradrenergic projections to the song control nucleus area X of the medial striatum in male zebra finches (Taeniopygia guttata). J. Comp. Neurol. 502, 544–562 (2007).

  19. Person, A. L., Gale, S. D., Farries, M. A. & Perkel, D. J. Organization of the songbird basal ganglia, including area X. J. Comp. Neurol. 508, 840–866 (2008).

  20. Castelino, C. B. & Ball, G. F. A role for norepinephrine in the regulation of context-dependent ZENK expression in male zebra finches (Taeniopygia guttata). Eur. J. Neurosci. 21, 1962–1972 (2005).

  21. Leblois, A., Wendel, B. J. & Perkel, D. J. Striatal dopamine modulates basal ganglia output and regulates social context-dependent behavioral variability through D1 receptors. J. Neurosci. 30, 5730–5743 (2010).

  22. Hara, M. et al. Role of adrenoceptors in the regulation of dopamine/DARPP-32 signaling in neostriatal neurons. J. Neurochem. 113, 1046–1059 (2010).

  23. Bharati, I. S. & Goodson, J. L. Fos responses of dopamine neurons to sociosexual stimuli in male zebra finches. Neuroscience 143, 661–670 (2006).

  24. Budzillo, A., Duffy, A., Miller, K. E., Fairhall, A. L. & Perkel, D. J. Dopaminergic modulation of basal ganglia output through coupled excitation-inhibition. Proc. Natl Acad. Sci. USA 114, 5713–5718 (2017).

  25. Aston-Jones, G. & Cohen, J. D. An integrative theory of locus coeruleus-norepinephrine function: adaptive gain and optimal performance. Annu. Rev. Neurosci. 28, 403–450 (2005).

  26. Breton-Provencher, V. & Sur, M. Active control of arousal by a locus coeruleus GABAergic circuit. Nat. Neurosci. 22, 218–228 (2019).

  27. Cooper, B. G. & Goller, F. Physiological insights into the social-context-dependent changes in the rhythm of the song motor program. J. Neurophysiol. 95, 3798–3809 (2006).

  28. Wong, A. L., Lindquist, M. A., Haith, A. M. & Krakauer, J. W. Explicit knowledge enhances motor vigor and performance: motivation versus practice in sequence tasks. J. Neurophysiol. 114, 219–232 (2015).

  29. Pekny, S. E., Izawa, J. & Shadmehr, R. Reward-dependent modulation of movement variability. J. Neurosci. 35, 4015–4024 (2015).

  30. Jaffe, P. I. & Brainard, M. S. Acetylcholine acts on songbird premotor circuitry to invigorate vocal output. eLife 9, e53288 (2020).

  31. Ölveczky, B. P., Andalman, A. S. & Fee, M. S. Vocal experimentation in the juvenile songbird requires a basal ganglia circuit. PLoS Biol. 3, e153 (2005).

  32. Sober, S. J., Wohlgemuth, M. J. & Brainard, M. S. Central contributions to acoustic variation in birdsong. J. Neurosci. 28, 10370–10379 (2008).

  33. Sheldon, Z. P. et al. Regulation of vocal precision by noradrenergic modulation of a motor nucleus. J. Neurophysiol. 124, 458–470 (2020).

  34. Fee, M. S. & Goldberg, J. H. A hypothesis for basal ganglia-dependent reinforcement learning in the songbird. Neuroscience 198, 152–170 (2011).

  35. Markowitz, J. E. et al. The striatum organizes 3D behavior via moment-to-moment action selection. Cell 174, 44–58.e17 (2018).

  36. Klaus, A. et al. The spatiotemporal organization of the striatum encodes action space. Neuron 95, 1171–1180.e7 (2017).

  37. Hisey, E., Kearney, M. G. & Mooney, R. A common neural circuit mechanism for internally guided and externally reinforced forms of motor learning. Nat. Neurosci. 21, 589–597 (2018).

  38. Xiao, L. et al. A basal ganglia circuit sufficient to guide birdsong learning. Neuron 98, 208–221.e5 (2018).

  39. Coddington, L. T. & Dudman, J. T. The timing of action determines reward prediction signals in identified midbrain dopamine neurons. Nat. Neurosci. 21, 1563–1573 (2018).

  40. Ghosh, K. K. et al. Miniaturized integration of a fluorescence microscope. Nat. Methods 8, 871–878 (2011).

  41. Zhou, P. et al. Efficient and accurate extraction of in vivo calcium signals from microendoscopic video data. eLife 7, e28728 (2018).

  42. Pisanello, M. et al. Tailoring light delivery for optogenetics by modal demultiplexing in tapered optical fibers. Sci. Rep. 8, 4467 (2018).

  43. Murphy, K. P. Machine Learning: A Probabilistic Perspective (MIT Press, 2012).

  44. Wu, M. & Goodman, N. Multimodal generative models for scalable weakly-supervised learning. Adv. Neural Info. Process. Syst. 31, 5575–5585 (2018).

  45. Farries, M. A., Ding, L. & Perkel, D. J. Evidence for “direct” and “indirect” pathways through the song system basal ganglia. J. Comp. Neurol. 484, 93–104 (2005).

Acknowledgements

The authors thank M. Booze for animal husbandry and K. Franks, F. Wang and D. Purves for editorial comments on an earlier version of this manuscript. This work was supported by NIH R01 NS099288 (R.M.), R01 NS118424 (R.M., J.P. and T.G.), the George Barth Geller Fund (R.M.), a Broad Predoctoral Fellowship (J.S.A.) and NIH Predoctoral Fellowship F31 DC017879 (V.M.).

Author information

Contributions

J.S.A. and R.M. designed all experiments except the HVC miniscope imaging experiments, which were designed by W.L. and T.G.; J.G. and J.P. developed VAE methods to analyse acoustic and neural data; J.S.A. performed all in vivo imaging and behavioural experiments and analysed all related data, except for HVC miniscope imaging experiments, which were executed by W.L. and T.G.; V.M. performed in vitro recordings and V.M. and J.S.A. analysed resulting data; J.S.A. and J.H. performed and analysed histological experiments; J.G. analysed acoustic and neural data using VAEs; J.S.A., J.G., J.P. and R.M. wrote the manuscript; J.S.A., J.G., W.L., T.G., J.P. and R.M. edited the manuscript.

Corresponding authors

Correspondence to John Pearson or Richard Mooney.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks David Robbe, Kazuhiro Wada and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Targeting and characterization of SN activity.

a) CaMKII promoter strategy selectively labels SNs in the sBG. Left: Overlap of CaMKII-GCaMP and the SN marker DARPP-32. Middle: Superimposed image reveals no overlap between retrogradely labeled globus pallidus internus neurons and CaMKII-GFP (0/41 Tracer(+) neurons were co-labeled with GFP, n = 2 birds). Right: Superimposed images reveal almost no overlap between parvalbumin (PV) and CaMKII-GFP (5/251 PV(+) neurons were co-labeled with GFP, n = 2 birds). Scale bars = 100 μm. b) Specificity and sensitivity of HVC PNs and sBG SNs. c) Median autocorrelation for all recorded SNs and HVC PNs (median HVC autocorrelation: 0.71, SNs: 0.16). d) Shared fraction of active ensemble for SNs and HVC PNs across song renditions (median HVC shared fraction: 0.86, SNs: 0.28). For (b–d), SNs: n = 529 neurons from 7 birds, HVC: n = 165 neurons from 5 birds. e) Example of mean SN activity aligned to body velocity; orange shading denotes singing periods. Data are displayed as mean ± s.e.m. f) Detected movement initiations (1311 detected initiations from 1 recording session, top) aligned to SN activity from photometry recordings (bottom). g) Group data comparing mean SN activity during singing vs. non-singing locomotion (Student’s one-sided paired t-test, t3 = 2.464, *p = 0.0453; n = 4 birds). h) Group data comparing mean SN activity during singing vs. playback of the bird’s own song (Student’s one-sided paired t-test; t3 = 3.31, *p = 0.0226; n = 4 birds). i) Disrupted auditory feedback during singing does not acutely affect SN activity. A random 50% of song renditions were targeted for syllable-triggered white noise (top). Participation probability was not affected by the playback of white noise (Student’s two-sided paired t-test; p = 0.91; n = 184 neurons from 3 birds). j) Example traces shown for 4 SNs comparing activity during normal singing and during singing-triggered noise. t = 0 denotes target syllable onset; the dashed line marks white noise onset. Only song renditions in which the cell participated were included. 0/184 neurons from 3 birds were significantly modulated by white noise (two-sided Mann-Whitney U-test with Hochberg correction). All error bars denote mean ± s.e.m.
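
The Hochberg correction named in panel j can be illustrated with a minimal sketch. This is a generic step-up procedure written for illustration, not the authors' analysis code; the function name `hochberg_reject`, the significance level and the example p-values are all assumptions.

```python
def hochberg_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where the null is rejected (Hochberg step-up)."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= alpha / (m - k + 1).
    cutoff_rank = 0
    for rank in range(m, 0, -1):
        if p_values[order[rank - 1]] <= alpha / (m - rank + 1):
            cutoff_rank = rank
            break
    # Reject every hypothesis whose rank is at or below the cutoff.
    reject = [False] * m
    for rank in range(cutoff_rank):
        reject[order[rank]] = True
    return reject


# Hypothetical per-neuron p-values (illustrative only, not data from the paper).
example_p = [0.30, 0.04, 0.012, 0.76, 0.20]
n_modulated = sum(hochberg_reject(example_p))
```

In this hypothetical example no neuron survives the correction, even though two raw p-values fall below 0.05, which is the kind of outcome summarized as "0/184" in the caption.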

Extended Data Fig. 2 Example song-related SN activity.

a) Representative motif spectrogram (top) aligned to sample activity traces from the first 6 undirected song renditions for 5 ROIs, aligned to song motif onset (vertical dashed line; a–g, syllables; i, introductory notes). b) Same representative motif as (a), with activity heatmaps for all 171 trials collected throughout the day, along with the corresponding values for all-to-all correlation and sensitivity. Color scale represents z-scored fluorescence. c) Fluorescence trace for one neuron showing two example calcium events (top). Event-triggered probability of song syllable for the 5 neurons (bottom, see Methods). All detected calcium events in the time series (27.7 minutes of concatenated recordings, 4.9 minutes with vocalizations) were used to generate the average spectrogram, which is visually represented in terms of the probability of occurrence for each syllable.

Extended Data Fig. 3 Supplemental analyses of song, movement, and neural activity during directed and undirected song.

a) Example frequency contours of syllable ‘d’ in undirected (blue) and directed (red) renditions. b) Birds with head-mounted miniscopes exhibit typical directed song features in addition to decreased pitch variability, such as faster directed motifs. Left: Cumulative distribution plot for motif durations in 1 bird. Right: Group data for 6 birds (Student’s two-sided paired t-test, t5 = 1.87, p = 0.12, n = 6 birds). c) Directed motifs are preceded by more introductory notes than undirected motifs (Student’s two-sided paired t-test, t5 = −7.69, ***p = 0.00094, n = 6 birds). d) Top: Mean activity of 53 ROIs during directed and undirected singing from one bird. Bottom: Mean SN population activity aligned to song onset. e) Heatmap of mean population activity for interleaved undirected and directed singing. Dashed line = onset of first syllable in motif. f) Left: Mean z-scored activity in undirected and directed conditions, plotted for all ROIs that were collected in directed and undirected conditions, averaged across all collected songs. Right: Similar to left, but using only trials in which each neuron had a detected event (n = 215 neurons from 6 birds). g) Relationship between ROI signal (peak of averaged active trials) and the ratio between its directed and undirected activity (n = 215 neurons from 6 birds). Dashed line indicates no modulation (D/U = 1). h) Photometry (top) and velocity (bottom) color-matched traces aligned to undirected (n = 13) and directed (n = 11) songs. Dashed line indicates the onset of the first motif syllable. i) R values between average locomotion during song (500 ms time window) and DF/F for one bird, computed from data in (h) (f). j) Left: Group data showing R values comparing average song-related neural activity to movement in two conditions: averaging locomotion values over a window of 500 ms before motif onset (pre-song) or 500 ms after motif onset (during-song). Right: Corresponding p values. k) Representative histology of photometry recordings. Left: Histology of AAV 2/9 AxGCaMP6m.p2a.nls.tdTomato injection into HVC. Middle: HVC axons in sBG from the same bird. Right: Local injection of AAV 2/9.CaMKII.GCaMP6s into sBG. Scale bar = 50 μm. l) Sample recording session for dual recordings from HVC and HVCsBG axons. Undirected singing (blue), female presentation and directed singing (red) are collected in the same session. m) Same as (l), but for SN photometry. All error bars denote mean ± s.e.m.

Extended Data Fig. 4 Additional analyses of optogenetic suppression experiments.

a) Experimental approach with representative histology from a CAG.ArchT injection into the sBG, both shown in sagittal view. b) Optrode recording in a CAG-ArchT bird showing suppressive effect of green light illumination on spontaneous action potential activity of a single sBG unit. c) Group data showing suppressive effects of green light illumination across neurons (CaMKII, 5 neurons; CAG, 5 neurons). d) Sagittal schematic for coinjections of pan-neuronal (CAG) and SN (CaMKII) fluorescent proteins into the sBG. e) CAG-driven expression of TdTomato (magenta) and CaMKII-driven expression of GFP (green) shown superimposed in the sBG (left) and in separate green (middle) and magenta (right) channels in the pallido-recipient thalamic nucleus DLM. Scale bar = 250 µm. f) Experimental approach for syllable-triggered optogenetic inhibition. g) Group data showing pitch variability during directed and laser-stimulated singing normalized to undirected singing for pan-neuronal inhibition. Mixed effects model, two-sided permutation test. Laser effect size (relative to baseline): −17.8%, ***p = 0.0011, n = 14 syllables from 6 birds. Directed singing effect size: 13.8%, *p = 0.01, n = 12 syllables from 5 birds. h) Pitch variability group data (same data as Fig. 2j and Extended Data Fig. 4g), non-normalized, comparing values during undirected song versus either undirected + laser (L) or directed (D) conditions. i) Intrasyllabic variability data normalized to undirected levels. Mixed effects model, two-sided permutation test. Model fit to non-normalized data, comparing undirected and experimental (undirected + laser (green), or directed (red)) conditions (see Tables 1 and 2 in Supplementary Information for model details; in all cases significance was assessed using a two-sided permutation test). Pan-neuronal: Estimated laser effect size: −0.00071 (−10.11% of baseline) ± 0.0002, *p = 0.015. Estimated directed singing effect size: −0.0013 (−20.65%) ± 6.94, **p = 0.0098. SNs: Estimated laser effect size: −0.00034 (−4.09%) ± 0.00011, **p = 0.007. Estimated directed singing effect size: −0.0033 (−22.18%) ± 0.00063. GFP: Estimated laser effect size: −0.000054 (0.60%) ± 0.00043, p = 0.80. Estimated directed singing effect size: −0.0027 (−36.00%) ± 0.00050, ***p = 0.000082. Pan-neuronal: Laser n = 14 syllables from 6 birds, directed n = 12 syllables from 5 birds; SNs: Laser n = 16 syllables from 6 birds, directed n = 12 syllables from 5 birds; GFP: Laser n = 15 syllables from 5 birds, directed n = 10 syllables from 4 birds. j) Mean syllable frequency group data normalized to undirected levels. Pan-neuronal: Estimated laser effect size: 3.55 ± 3.14 Hz, p = 0.27. Estimated directed singing effect size: 9.91 ± 6.94 Hz, p = 0.17. SNs: Estimated laser effect size: −8.28 ± 4.54 Hz, p = 0.079. Estimated directed singing effect size: −13.95 ± 9.13 Hz, p = 0.14. GFP: Estimated laser effect size: 0.60 ± 0.37 Hz, p = 0.12. Estimated directed singing effect size: 3.041 ± 2.46 Hz, p = 0.23. Pan-neuronal: Laser n = 14 syllables from 6 birds, directed n = 12 syllables from 6 birds; SNs: Laser n = 16 syllables from 6 birds, directed n = 12 syllables from 5 birds; GFP: Laser n = 15 syllables from 5 birds, directed n = 10 syllables from 4 birds. k) Mean syllable duration group data normalized to undirected levels. Pan-neuronal: Estimated laser effect size: −0.58 ± 0.23 ms, *p = 0.016. Estimated directed singing effect size: −0.65 ± 0.28 ms, *p = 0.035. SNs: Estimated laser effect size: −0.82 ± 0.36 ms, **p = 0.029. Estimated directed singing effect size: −2.76 ± 0.62 ms, ***p = 0.00016. GFP: Estimated laser effect size: −0.84 ± 0.63 ms, p = 0.19. Estimated directed singing effect size: −4.37 ± 0.80 ms, ***p = 0.000030. Pan-neuronal: Laser n = 14 syllables from 6 birds, directed n = 12 syllables from 6 birds; SNs: Laser n = 16 syllables from 6 birds, directed n = 12 syllables from 5 birds; GFP: Laser n = 15 syllables from 5 birds, directed n = 10 syllables from 4 birds. All error bars denote mean ± s.e.m.
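
The two-sided permutation tests reported above were applied to mixed effects models whose details are in the Supplementary Information. As a much simpler illustration of the underlying idea, the sketch below runs an exact sign-flip permutation test on paired differences. It is a stand-in under stated assumptions, not the authors' model: the function name `paired_sign_flip_p` and the example values are hypothetical.

```python
from itertools import product


def paired_sign_flip_p(differences):
    """Exact two-sided sign-flip permutation p-value for paired differences."""
    observed = abs(sum(differences))
    n_extreme = 0
    n_total = 0
    # Under the null hypothesis each paired difference is equally likely to
    # carry either sign, so every sign assignment is enumerated once.
    for signs in product((1, -1), repeat=len(differences)):
        stat = abs(sum(s * d for s, d in zip(signs, differences)))
        # A small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            n_extreme += 1
        n_total += 1
    return n_extreme / n_total


# Hypothetical per-syllable changes in pitch variability (laser minus control).
example_diffs = [-0.8, -0.5, -1.1, -0.3, -0.9, -0.6, -0.7, -0.4]
p_value = paired_sign_flip_p(example_diffs)
```

Exhaustive enumeration is only practical for small samples (2^n sign assignments); larger designs would typically draw random permutations instead.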

Extended Data Fig. 5 Joint encoding model details and comparison to alternate models.

a) Schematic for learning low-dimensional latent features of motif spectrograms using a variational autoencoder (VAE) approach. The model learns a compressed representation of the data that is sufficient to reconstruct the original. b) Cumulative distribution of pairwise song distances in VAE latent space, grouped by the similarity of the associated neural patterns (neural correlation percentiles, yellow to blue). For more dissimilar neural activity (yellow), songs are farther apart in VAE space, while more highly correlated neural activity (blue) shifts the distribution to the left, implying overall more similar songs, as indicated by smaller VAE vocal latent distances. c) Group data showing median VAE distance (relative to the mean) within each neural correlation decile for all 7 sessions from 5 birds; pairs of trials with highly correlated neural activity patterns are closer in VAE latent space. Marker shape denotes bird identity. d) Schematic of the joint modeling approach. Acoustic data are modeled using a VAE as before (boxed region) and a second VAE is used to model the neural data. A global latent variable is then used to capture shared variation in the two modalities. e) Schematic of model training and validation. VAE models were trained using sevenfold cross-validation. Within each fold, data were partitioned into seven tranches, five for VAE model training (magenta), one for VAE model validation and hyperparameter selection (cyan), and one for assessing model performance (yellow). For the VAE model, average performance on the yellow test set across the seven cross-validation folds is reported. For predictive models trained to predict one set of latents from another, a “leave-one-out” strategy on the yellow data set (right) was used to select predictive model hyperparameters and assess performance. f) Joint encoding outperforms a collection of control models. The shuffle control randomly pairs spectrograms and ROI activity vectors. The time control uses time-in-session to predict the joint encoding model’s neural latents (left) and vocal latents (right). The linear model comprises independently trained neural and vocal variational autoencoders (as in Fig. 3a without the global latent), with emission and recognition networks restricted to linear mappings. The separate encoding model comprises independently trained neural and vocal variational autoencoders with emission and recognition models parameterized by deep neural networks. The joint encoding model is the full model as presented in Fig. 3a. For all models, prediction is performed using ridge regression and test performance is evaluated using the cross-validation procedure described in Methods. Average test set performance over 7 cross-validation folds of each of 7 sessions from 5 birds is shown. Each line represents a single bird-session. g) Model comparison split by experimental session. Performance (measured by R²) for the task of predicting vocal latents from neural latents (top) and vice versa (bottom) for each of 7 sessions from 5 birds. In addition to the models presented in b, the comparison includes models using motif tempo to predict joint encoding neural latents (top) and vocal latents (bottom); using kernel ridge regression in place of linear ridge regression (with leave-one-out regularization strength and radial basis function bandwidth selection); and a version of the joint encoding model with emission and recognition networks restricted to linear mappings. Joint encoding predictive performance is compared with each control model for each experimental session (one-sided Wilcoxon signed-rank test, * denotes p < 0.05). For both imaging sessions of one bird (bird 5, denoted by triangles in panels b–d), both neural latents and vocal latents could be robustly predicted from song tempo. h) Left: Predictive performance versus number of song motifs for each of 7 experimental sessions. Poor predictive performance is observed for experimental sessions with fewer than 300 motifs and fewer than 50 ROIs (not shown). Symbols denote birds, as in panels b and c. Right: Similar to left. Opaque markers indicate performance using only first motifs in each bout; faded markers indicate performance using all motifs.
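
The tranche rotation described in panel e (seven tranches per fold: five for training, one for validation, one for testing) might be sketched as follows. The round-robin assignment of trials to tranches, the choice of the next tranche as the validation set and the function name `rotate_tranches` are assumptions made for illustration; the caption does not specify how trials were assigned.

```python
def rotate_tranches(n_trials, n_tranches=7):
    """Yield (train, validation, test) trial-index lists, one triple per fold."""
    # Assumption: trials are dealt into tranches round-robin by index.
    tranches = [list(range(k, n_trials, n_tranches)) for k in range(n_tranches)]
    for fold in range(n_tranches):
        test_id = fold
        # Assumption: the tranche after the test tranche serves as validation.
        validation_id = (fold + 1) % n_tranches
        train = [
            idx
            for k in range(n_tranches)
            if k not in (test_id, validation_id)
            for idx in tranches[k]
        ]
        yield train, tranches[validation_id], tranches[test_id]


# Hypothetical session with 21 song motifs.
folds = list(rotate_tranches(n_trials=21))
```

Across the seven folds every trial appears in the test set exactly once, so averaging test performance over folds uses each trial once for evaluation.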

Extended Data Fig. 6 Joint encoding model preprocessing and additional examples.

a) To minimize time confounders, components of calcium activity vectors (top left) and spectrograms (top middle) that could reliably be predicted by time-of-day were removed (red lines; see Methods). The calcium and spectrogram residuals after prediction are used for further analysis in place of the original data (top right). Positive weights are shown in green and negative weights in magenta. Note that the effects are restricted to regions with vocalization. For two example spectrograms from 10:15 (bottom left) and 10:35 (bottom middle), time-of-day correction makes the resulting syllables more similar to one another. Scale bar for right column: 100 ms. b) Left: Despite time warping, spectrograms show consistent tempo-related changes. Difference plot between the average faster-than-median spectrograms and the average slower-than-median spectrograms (bottom, positive values in green, negative in magenta) for one example bird (Bird 3, squares). The consistent horizontal bands throughout the motif indicate upward pitch shifts associated with faster tempos, which were observed for almost all experimental sessions. Scale bars denote 100 ms. Right: Both ensemble activity and warped spectrograms contain information about tempo. For each experimental session, tempo can be predicted from ensemble activity vectors (blue) and spectrograms (red) after both signals have been corrected for time-of-day. Dotted line denotes chance performance. Scale bars denote 100 ms. c) Spectrograms also show consistent motif-number-related changes. For the same example bird as in b, the average of the first motifs in every bout and the average of all other motifs exhibit clear differences (bottom, positive values in green, negative in magenta). Right: Both ensemble activity and time-warped spectrograms contain information about motif number. For each experimental session, motif number (first motif vs. rest) could be reliably predicted from ensemble activity vectors (blue) and spectrograms (red) using the same procedure described for tempo prediction (reporting test accuracy, weighted by class so that chance performance is 0.5). Dotted line denotes chance performance. Scale bars denote 100 ms. d) Example average ROI activity aligned to the first syllable of bouts consisting of 1, 2, 3 or 4 motifs. Note that ROIs 20 and 21 display qualitatively different activities in bouts of different lengths. e) Weighted average generated spectrograms and ROI activity pairs, with weights given by their projection along the correlation axis, describe how song spectrograms (middle column) and neural activity (right column) vary together. P-values refer to corresponding correlations of held-out test data, as in Fig. 3c. Scale bars for left and middle columns: 100 ms. Scale bars for right column: 250 μm.
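
Panel c reports accuracy "weighted by class so that chance performance is 0.5". One common way to obtain such a score is to average the per-class recall, sketched below. This is an illustrative reading of that phrase, not necessarily the exact computation used in the paper; the function name `class_weighted_accuracy` and the labels are hypothetical.

```python
def class_weighted_accuracy(true_labels, predicted_labels):
    """Mean of per-class recall, so that each class contributes equally."""
    classes = sorted(set(true_labels))
    recalls = []
    for c in classes:
        n_in_class = sum(1 for t in true_labels if t == c)
        n_correct = sum(
            1 for t, p in zip(true_labels, predicted_labels) if t == c and p == c
        )
        recalls.append(n_correct / n_in_class)
    return sum(recalls) / len(recalls)


# Hypothetical labels: 1 = first motif in a bout, 0 = any later motif.
true_labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
# A trivial classifier that always predicts "later motif".
always_rest = [0] * len(true_labels)
trivial_score = class_weighted_accuracy(true_labels, always_rest)
```

Because first motifs are rarer than later motifs, a classifier that always predicts "later motif" would score 0.8 on plain accuracy in this example but only 0.5 on the class-weighted score, which is why the dotted chance line sits at 0.5.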

Extended Data Fig. 7 Effects of adrenergic signaling manipulations on song and neural activity.

a) Retrograde labelling of dopamine beta hydroxylase (DBH) and tyrosine hydroxylase (TH) positive cell bodies in the locus coeruleus (LC) and ventral tegmental area (VTA), respectively, following retrograde tracer injections into the sBG. Scale bar = xx microns and applies to both panels. b) SCH, PHE, and CLON infusions do not significantly affect singing rates (Student’s two-tailed paired t-test, CLON: t5 = 0.81, p = 0.46, n = 6 birds; PHE: t7 = 0.97, p = 0.28, n = 8 birds; SCH: t7 = 1.17, p = 0.36, n = 8 birds). c) SCH, PHE, and CLON infusions do not significantly affect the number of introductory notes per bout (Student’s two-tailed paired t-test, CLON: t5 = −0.16, p = 0.88, n = 6 birds; PHE: t7 = 1.05, p = 0.33, n = 8 birds; SCH: t7 = 0.91, p = 0.30, n = 8 birds). d) Left: Effects of PHE on pitch variability in directed and undirected song. Mixed effects model, two-sided permutation test; see Tables 1 and 2 in Supplementary Information for model details. Estimated effect of drug presence on directed: 0.36 (26.1% of baseline) ± 0.17, *p = 0.02. Estimated effect of drug presence on undirected: −0.14 (6.7% of baseline) ± 0.12, p = 0.24. Middle: Effects of CLON on pitch variability in directed and undirected song. Estimated effect of drug presence on directed: −0.025 (1.5% of baseline) ± 0.10, p = 0.1. Estimated effect of drug presence on undirected: −0.36 (14.8% of baseline) ± 0.12, **p = 0.0049. Right: Effects of SCH on pitch variability in directed and undirected song. Estimated effect of drug presence on directed: 0.081 (4.9% of baseline) ± 0.22, p = 0.71. Estimated effect of drug presence on undirected: 0.048 (2% of baseline) ± 0.84, p = 0.57. e) Representative histology showing photometry probe and microdialysis probe placement into sBG for simultaneous drug delivery and photometry. Scale bar = 100 μm. f) Representative DF/F measurements during directed and undirected singing for one bird before and after (>1 hour) beginning muscimol infusion. g) Muscimol infusion suppresses calcium signals recorded in the sBG during both directed and undirected singing (Student’s two-tailed paired t-test; t4 = 3.63, p-values are indicated; n = 5 birds). h) Sample traces for SN imaging during infusion of SCH23390 into the sBG. i) Group data showing mean SN photometry signals during SCH23390 infusion in undirected and directed conditions (n = 4 birds). j) DARPP-32 and α2c-AR mRNA co-expression in sBG SNs (n = 3 birds). Scale bar = 20 μm. k) Low power confocal images showing Fos mRNA expression in a sagittal section of the finch brain across behavioral conditions. Dashed white outlines highlight the sBG and HVC. l) Fos intensity levels in HVC and the sBG plotted against motif count (30-minute window) in either directed (red) or undirected (blue) singing conditions. For all immediate early gene experiments, undirected n = 6 birds, directed n = 7 birds, silent n = 6 birds. m) Example confocal image z-stack collected in the LC. The intensity and area of Fos puncta (magenta) were quantified within the TH-positive mask (yellow). Scale bar = 50 μm. n) Mean Fos intensity and area within the LC TH mask plotted against directed (red) and undirected (blue) motif counts. o) Group data for Fos intensity (left) or area (right) plotted for TH and VGAT masks during either directed (red, n = 7 birds) or undirected (blue, n = 6 birds) singing conditions. One-way ANOVAs with post hoc Tukey tests were performed separately for TH and VGAT masks under each condition. Post hoc comparisons for significant ANOVAs are displayed. Fos mRNA puncta intensity: TH mask, F(2,16) = 7.46, **p = 0.0051; VGAT mask, F(2,16) = 7.4, **p = 0.0053. Fos mRNA area: TH mask, F(2,16) = 9.02, p = 0.0024; VGAT mask, F(2,16) = 3.6, p = 0.051.

Source data

Extended Data Fig. 8 Effects of adrenergic signaling on SN excitability.

a) Rise time, sag, and resting membrane potential can be used to distinguish SNs from non-SNs in the sBG (see Methods). b) Three more example SNs recorded during baseline, NA, and PHE. c) Effect of NA and PHE on SN resting membrane potential (one-way repeated measures ANOVA with Greenhouse-Geisser correction. F(1.487, 16.36) = 0.5950, p = 0.51; n = 11 cells). d) Effect of NA and PHE on SN input resistance (one-way repeated measures ANOVA with Greenhouse-Geisser correction and post hoc Tukey test. F(1.229, 13.52) = 7.980; Baseline vs NA: p = 0.054; NA vs PHE: ***p = 0.0003; n = 11 cells). e) Two example SNs recorded during baseline and PHE, from a different experiment than in a–d. f) F-I curves showing increased action potentials in response to positive current injection for baseline and PHE conditions (n = 12 cells). g) Effect of PHE on SN resting membrane potential (Student’s two-tailed paired t-test, t10 = 0.55, p = 0.59; n = 11 cells, separate from those shown in panel c). h) Effect of PHE on SN input resistance (Student’s two-tailed paired t-test, t10 = 5.42, ***p = 0.0003; n = 11 cells, separate from those shown in panel d).

Source data

Supplementary information

Supplementary Information

Reporting Summary

Peer Review File

Supplementary Video 1

A movie showing calcium signals recorded with a miniature microscope in the basal ganglia of a male zebra finch singing in social isolation (undirected song).

Supplementary Video 2

The left frame shows calcium signals recorded with a miniature microscope in the basal ganglia of a male zebra finch singing in social isolation (undirected song). The right frame shows calcium signals recorded with a miniature microscope in the basal ganglia of a male zebra finch singing to a nearby female (directed song). The two movies are from the same imaging field in the same male finch, collected several minutes apart.

Cite this article

Singh Alvarado, J., Goffinet, J., Michael, V. et al. Neural dynamics underlying birdsong practice and performance. Nature 599, 635–639 (2021). https://doi.org/10.1038/s41586-021-04004-1

  • DOI: https://doi.org/10.1038/s41586-021-04004-1
