Neural dynamics underlying birdsong practice and performance

Singh Alvarado, Jonnathan; Goffinet, Jack; Michael, Valerie; Liberti, William; Hatfield, Jordan; Gardner, Timothy; Pearson, John; Mooney, Richard

doi:10.1038/s41586-021-04004-1

Article
Published: 20 October 2021

Nature volume 599, pages 635–639 (2021)

Abstract

Musical and athletic skills are learned and maintained through intensive practice to enable precise and reliable performance for an audience. Consequently, understanding such complex behaviours requires insight into how the brain functions during both practice and performance. Male zebra finches learn to produce courtship songs that are more varied when alone and more stereotyped in the presence of females¹. These differences are thought to reflect song practice and performance, respectively²,³, providing a useful system in which to explore how neurons encode and regulate motor variability in these two states. Here we show that calcium signals in ensembles of spiny neurons (SNs) in the basal ganglia are highly variable relative to their cortical afferents during song practice. By contrast, SN calcium signals are strongly suppressed during female-directed performance, and optogenetically suppressing SNs during practice strongly reduces vocal variability. Unsupervised learning methods⁴,⁵ show that specific SN activity patterns map onto distinct song practice variants. Finally, we establish that noradrenergic signalling reduces vocal variability by directly suppressing SN activity. Thus, SN ensembles encode and drive vocal exploration during practice, and the noradrenergic suppression of SN activity promotes stereotyped and precise song performance for an audience.

**Fig. 1: SN ensemble activity is song-specific and variable.**

**Fig. 2: SN activity drives vocal variability.**

**Fig. 3: A joint neural–behavioural modelling approach relates SN population activity and song.**

**Fig. 4: Noradrenergic signalling in the sBG reduces vocal variability by directly suppressing SN activity.**

Data availability

Core datasets have been posted to the Duke University Library Research Data Repository (https://research.repository.duke.edu). Source data are provided with this paper.

Code availability

Custom code and software are available at https://github.com/pearsonlab/autoencoded-vocal-analysis and https://github.com/pearsonlab/finch-vae.

References

1. Sossinka, R. & Böhner, J. Song types in the zebra finch Poephila guttata castanotis 1. Zeitschrift für Tierpsychologie 53, 123–132 (1980).
2. Kao, M. H., Doupe, A. J. & Brainard, M. S. Contributions of an avian basal ganglia–forebrain circuit to real-time modulation of song. Nature 433, 638–643 (2005).
3. Jarvis, E. D., Scharff, C., Grossman, M. R., Ramos, J. A. & Nottebohm, F. For whom the bird sings: context-dependent gene expression. Neuron 21, 775–788 (1998).
4. Goffinet, J., Brudner, S., Mooney, R. & Pearson, J. Low-dimensional learned feature spaces quantify individual and group differences in vocal repertoires. eLife 10, e67855 (2021).
5. Sainburg, T., Thielk, M. & Gentner, T. Q. Finding, visualizing, and quantifying latent structure across diverse animal vocal repertoires. PLoS Comput. Biol. 16, e1008228 (2020).
6. Woolley, S. C. & Doupe, A. J. Social context-induced song variation affects female behavior and gene expression. PLoS Biol. 6, e62 (2008).
7. Kao, M. H., Wright, B. D. & Doupe, A. J. Neurons in a forebrain nucleus required for vocal plasticity rapidly switch between precise firing and variable bursting depending on social context. J. Neurosci. 28, 13232–13247 (2008).
8. Woolley, S. C., Rajan, R., Joshua, M. & Doupe, A. J. Emergence of context-dependent variability across a basal ganglia network. Neuron 82, 208–223 (2014).
9. Kojima, S., Kao, M. H., Doupe, A. J. & Brainard, M. S. The avian basal ganglia are a source of rapid behavioral variation that enables vocal motor exploration. J. Neurosci. 38, 9635–9647 (2018).
10. Hein, A. M., Sridharan, A., Nordeen, K. W. & Nordeen, E. J. Characterization of CaMKII-expressing neurons within a striatal region implicated in avian vocal learning. Brain Res. 1155, 125–133 (2007).
11. Kozhevnikov, A. A. & Fee, M. S. Singing-related activity of identified HVC neurons in the zebra finch. J. Neurophysiol. 97, 4271–4283 (2007).
12. Hahnloser, R. H. R., Kozhevnikov, A. A. & Fee, M. S. An ultra-sparse code underlies the generation of neural sequences in a songbird. Nature 419, 65–70 (2002).
13. Liberti, W. A. 3rd et al. Unstable neurons underlie a stable learned behavior. Nat. Neurosci. 19, 1665–1671 (2016).
14. Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. Preprint at https://arxiv.org/abs/1312.6114 (2013).
15. Rezende, D. J., Mohamed, S. & Wierstra, D. Stochastic backpropagation and approximate inference in deep generative models. Preprint at http://arxiv.org/abs/1401.4082 (2014).
16. Björklund, A. & Dunnett, S. B. Dopamine neuron systems in the brain: an update. Trends Neurosci. 30, 194–202 (2007).
17. Zerbi, V. et al. Rapid reconfiguration of the functional connectome after chemogenetic locus coeruleus activation. Neuron 103, 702–718.e5 (2019).
18. Castelino, C. B., Diekamp, B. & Ball, G. F. Noradrenergic projections to the song control nucleus area X of the medial striatum in male zebra finches (Taeniopygia guttata). J. Comp. Neurol. 502, 544–562 (2007).
19. Person, A. L., Gale, S. D., Farries, M. A. & Perkel, D. J. Organization of the songbird basal ganglia, including area X. J. Comp. Neurol. 508, 840–866 (2008).
20. Castelino, C. B. & Ball, G. F. A role for norepinephrine in the regulation of context-dependent ZENK expression in male zebra finches (Taeniopygia guttata). Eur. J. Neurosci. 21, 1962–1972 (2005).
21. Leblois, A., Wendel, B. J. & Perkel, D. J. Striatal dopamine modulates basal ganglia output and regulates social context-dependent behavioral variability through D1 receptors. J. Neurosci. 30, 5730–5743 (2010).
22. Hara, M. et al. Role of adrenoceptors in the regulation of dopamine/DARPP-32 signaling in neostriatal neurons. J. Neurochem. 113, 1046–1059 (2010).
23. Bharati, I. S. & Goodson, J. L. Fos responses of dopamine neurons to sociosexual stimuli in male zebra finches. Neuroscience 143, 661–670 (2006).
24. Budzillo, A., Duffy, A., Miller, K. E., Fairhall, A. L. & Perkel, D. J. Dopaminergic modulation of basal ganglia output through coupled excitation-inhibition. Proc. Natl Acad. Sci. USA 114, 5713–5718 (2017).
25. Aston-Jones, G. & Cohen, J. D. An integrative theory of locus coeruleus-norepinephrine function: adaptive gain and optimal performance. Annu. Rev. Neurosci. 28, 403–450 (2005).
26. Breton-Provencher, V. & Sur, M. Active control of arousal by a locus coeruleus GABAergic circuit. Nat. Neurosci. 22, 218–228 (2019).
27. Cooper, B. G. & Goller, F. Physiological insights into the social-context-dependent changes in the rhythm of the song motor program. J. Neurophysiol. 95, 3798–3809 (2006).
28. Wong, A. L., Lindquist, M. A., Haith, A. M. & Krakauer, J. W. Explicit knowledge enhances motor vigor and performance: motivation versus practice in sequence tasks. J. Neurophysiol. 114, 219–232 (2015).
29. Pekny, S. E., Izawa, J. & Shadmehr, R. Reward-dependent modulation of movement variability. J. Neurosci. 35, 4015–4024 (2015).
30. Jaffe, P. I. & Brainard, M. S. Acetylcholine acts on songbird premotor circuitry to invigorate vocal output. eLife 9, e53288 (2020).
31. Olveczky, B. P., Andalman, A. S. & Fee, M. S. Vocal experimentation in the juvenile songbird requires a basal ganglia circuit. PLoS Biol. 3, e153 (2005).
32. Sober, S. J., Wohlgemuth, M. J. & Brainard, M. S. Central contributions to acoustic variation in birdsong. J. Neurosci. 28, 10370–10379 (2008).
33. Sheldon, Z. P. et al. Regulation of vocal precision by noradrenergic modulation of a motor nucleus. J. Neurophysiol. 124, 458–470 (2020).
34. Fee, M. S. & Goldberg, J. H. A hypothesis for basal ganglia-dependent reinforcement learning in the songbird. Neuroscience 198, 152–170 (2011).
35. Markowitz, J. E. et al. The striatum organizes 3D behavior via moment-to-moment action selection. Cell 174, 44–58.e17 (2018).
36. Klaus, A. et al. The spatiotemporal organization of the striatum encodes action space. Neuron 95, 1171–1180.e7 (2017).
37. Hisey, E., Kearney, M. G. & Mooney, R. A common neural circuit mechanism for internally guided and externally reinforced forms of motor learning. Nat. Neurosci. 21, 589–597 (2018).
38. Xiao, L. et al. A basal ganglia circuit sufficient to guide birdsong learning. Neuron 98, 208–221.e5 (2018).
39. Coddington, L. T. & Dudman, J. T. The timing of action determines reward prediction signals in identified midbrain dopamine neurons. Nat. Neurosci. 21, 1563–1573 (2018).
40. Ghosh, K. K. et al. Miniaturized integration of a fluorescence microscope. Nat. Methods 8, 871–878 (2011).
41. Zhou, P. et al. Efficient and accurate extraction of in vivo calcium signals from microendoscopic video data. eLife 7, e28728 (2018).
42. Pisanello, M. et al. Tailoring light delivery for optogenetics by modal demultiplexing in tapered optical fibers. Sci. Rep. 8, 4467 (2018).
43. Murphy, K. P. Machine Learning: A Probabilistic Perspective (MIT Press, 2012).
44. Wu, M. & Goodman, N. Multimodal generative models for scalable weakly-supervised learning. Adv. Neural Info. Process. Syst. 31, 5575–5585 (2018).
45. Farries, M. A., Ding, L. & Perkel, D. J. Evidence for “direct” and “indirect” pathways through the song system basal ganglia. J. Comp. Neurol. 484, 93–104 (2005).

Acknowledgements

The authors thank M. Booze for animal husbandry and K. Franks, F. Wang and D. Purves for editorial comments on an earlier version of this manuscript. This work was supported by NIH R01 NS099288 (R.M.), R01 NS118424 (R.M., J.P. and T.G.), the George Barth Geller Fund (R.M.), a Broad Predoctoral Fellowship (J.S.A.) and NIH Predoctoral Fellowship F31 DC017879 (V.M.).

Author information

Authors and Affiliations

Department of Neurobiology, Duke University, Durham, NC, USA
Jonnathan Singh Alvarado, Valerie Michael, Jordan Hatfield, John Pearson & Richard Mooney
Department of Computer Science, Duke University, Durham, NC, USA
Jack Goffinet
Department of Electrical Engineering and Computer Science, University of California Berkeley, Berkeley, CA, USA
William Liberti III
Phil and Penny Knight Campus for Accelerating Scientific Impact, University of Oregon, Eugene, OR, USA
Timothy Gardner
Department of Biostatistics & Bioinformatics, Duke University, Durham, NC, USA
John Pearson
Department of Electrical and Computer Engineering, Duke University, Durham, NC, USA
John Pearson

Contributions

J.S.A. and R.M. designed all experiments except the HVC miniscope imaging experiments, which were designed by W.L. and T.G.; J.G. and J.P. developed VAE methods to analyse acoustic and neural data; J.S.A. performed all in vivo imaging and behavioural experiments and analysed all related data, except for HVC miniscope imaging experiments, which were executed by W.L. and T.G.; V.M. performed in vitro recordings and V.M. and J.S.A. analysed resulting data; J.S.A. and J.H. performed and analysed histological experiments; J.G. analysed acoustic and neural data using VAEs; J.S.A., J.G., J.P. and R.M. wrote the manuscript; J.S.A., J.G., W.L., T.G., J.P. and R.M. edited the manuscript.

Corresponding authors

Correspondence to John Pearson or Richard Mooney.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks David Robbe, Kazuhiro Wada and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Targeting and characterization of SN activity.

a) CaMKII promoter strategy selectively labels SNs in the sBG. Left: Overlap of CaMKII-GCaMP and the SN marker DARPP-32. Middle: Superimposed image reveals no overlap between retrogradely labeled globus pallidus internus neurons and CaMKII-GFP (0/41 Tracer(+) neurons were co-labeled with GFP, n = 2 birds). Right: Superimposed images reveal almost no overlap between parvalbumin (PV) and CaMKII-GFP (5/251 PV(+) neurons were co-labeled with GFP, n = 2 birds). Scale bars = 100 μm. b) Specificity and sensitivity of HVC PNs and sBG SNs. c) Median autocorrelation for all recorded SNs and HVC PNs (median HVC autocorrelation: 0.71, SNs: 0.16). d) Shared fraction of active ensemble for SNs and HVC PNs across song renditions (median HVC shared fraction: 0.86, SNs: 0.28). For (b–d), SNs: n = 529 neurons from 7 birds, HVC: n = 165 neurons from 5 birds. e) Example of mean SN activity aligned to body velocity; orange shading denotes singing periods. Data are displayed as mean ± s.e.m. f) Detected movement initiations (1311 detected initiations from 1 recording session, top) aligned to SN activity from photometry recordings (bottom). g) Group data comparing mean SN activity during singing vs. non-singing locomotion (Student’s one-sided paired t-test, t₃ = 2.464, *p = 0.0453; n = 4 birds). h) Group data comparing mean SN activity during singing vs. playback of the bird’s own song (Student’s one-sided paired t-test; t₃ = 3.31, *p = 0.0226; n = 4 birds). i) Disrupted auditory feedback during singing does not acutely affect SN activity. A random 50% of song renditions were targeted for syllable-triggered white noise (top). Participation probability was not affected by the playback of white noise (Student’s two-sided paired t-test; p = 0.91; n = 184 neurons from 3 birds). j) Example traces shown for 4 SNs comparing activity during normal singing and during singing-triggered noise. t = 0 denotes target syllable onset; the dashed line marks white noise onset. Only song renditions in which the cell participated were included. No neurons were significantly modulated by white noise (two-sided Mann-Whitney U-test with Hochberg correction; 0/184 neurons from 3 birds). All error bars denote mean ± s.e.m.
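
The per-neuron test in panel j applies a Hochberg correction across 184 comparisons. The following is a minimal sketch of the Hochberg step-up rule, not the authors' analysis code: the function name `hochberg_reject` and the example p-values are hypothetical, and the significance level of 0.05 is an assumption.

```python
def hochberg_reject(p_values, alpha=0.05):
    """Return a list of booleans, True where the null is rejected (Hochberg step-up)."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= alpha / (m - k + 1).
    cutoff_rank = 0
    for rank in range(m, 0, -1):
        if p_values[order[rank - 1]] <= alpha / (m - rank + 1):
            cutoff_rank = rank
            break
    # Reject every hypothesis whose rank is at or below the cutoff.
    reject = [False] * m
    for rank in range(cutoff_rank):
        reject[order[rank]] = True
    return reject


# Hypothetical p-values for four neurons (illustration only, not data from the paper).
example_p = [0.001, 0.04, 0.03, 0.2]
decisions = hochberg_reject(example_p)
```

With 184 neurons, the smallest p-value would need to fall below 0.05/184 unless larger p-values also pass their own, more lenient thresholds.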

Extended Data Fig. 2 Example song-related SN activity.

a) Representative motif spectrogram (top) aligned to sample activity traces from the first 6 undirected song renditions for 5 ROIs, aligned to song motif onset (vertical dashed line; a-g, syllables; i, introductory notes). b) Same representative motif as (a), with activity heatmaps for all 171 trials collected throughout the day, along with the corresponding values for all-to-all correlation and sensitivity. Color scale represents z-scored fluorescence. c) Fluorescence trace for one neuron showing two example calcium events (top). Event-triggered probability of song syllable for the 5 neurons (bottom, see methods). All detected calcium events in the time series (27.7 minutes of concatenated recordings, 4.9 minutes with vocalizations) were used to generate the average spectrogram, which is visually represented in terms of the probability of occurrence for each syllable.

Extended Data Fig. 3 Supplemental analyses of song, movement, and neural activity during directed and undirected song.

a) Example frequency contours of syllable ‘d’ in undirected (blue) and directed (red) renditions. b) Birds with head-mounted miniscopes exhibit typical directed song features in addition to decreased pitch variability, such as faster directed motifs. Left: Cumulative distribution plot for motif durations in 1 bird. Right: Group data for 6 birds (Student’s two-sided paired t-test, t₅ = 1.87, p = 0.12, n = 6 birds). c) Directed motifs are preceded by more introductory notes than undirected motifs (Student’s two-sided paired t-test, t₅ = −7.69, ***p = 0.00094, n = 6 birds). d) Top: mean activity of 53 ROIs during directed and undirected singing from one bird. Bottom: mean SN population activity aligned to song onset. e) Heatmap of mean population activity for interleaved undirected and directed singing. Dashed line = onset of first syllable in motif. f) Left: Mean z-scored activity in undirected and directed conditions, plotted for all ROIs that were collected in directed and undirected conditions, averaged across all collected songs. Right: Similar to left, but using only trials in which each neuron had a detected event (n = 215 neurons from 6 birds). g) Relationship between ROI signal (peak of averaged active trials) and the ratio between its directed and undirected activity (n = 215 neurons from 6 birds). Dashed line indicates no modulation (D/U = 1). h) Photometry (top) and velocity (bottom) color-matched traces aligned to undirected (n = 13) and directed (n = 11) songs. Dashed line indicates the onset of the first motif syllable. i) R values between average locomotion during song (500 ms time window) and DF/F for one bird, computed from data in (h) (f). j) Left: Group data showing R values comparing average song-related neural activity to movement in two conditions: averaging locomotion values over a window of 500 ms before motif onset (pre-song) or 500 ms after motif onset (during-song). Right: Corresponding p values. k) Representative histology of photometry recordings. Left: Histology of AAV 2/9 AxGCaMP6m.p2a.nls.tdTomato injection into HVC. Middle: HVC axons in sBG from the same bird. Right: Local injection of AAV 2/9.CaMKII.GCaMP6s into sBG. Scale bar = 50 μm. l) Sample recording session for dual recordings from HVC and HVC_sBG axons. Undirected singing (blue), female presentation and directed singing (red) are collected in the same session. m) Same as (l), but for SN photometry. All error bars denote mean ± s.e.m.
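
Panels b and c report Student's paired t-tests with 5 degrees of freedom (6 birds). As a rough illustration of how such a statistic is formed, the sketch below computes the paired t value and its degrees of freedom from per-bird measurements. The function name and the numbers are hypothetical; converting t to a p-value needs the t distribution, which a statistics library such as SciPy (`scipy.stats.ttest_rel`) provides.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(condition_a, condition_b):
    """Return (t, degrees_of_freedom) for paired samples of equal length."""
    # Work on within-subject differences, one per bird.
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    # t = mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical per-bird values (illustration only, not data from the paper).
directed = [2.0, 4.0, 6.0]
undirected = [1.0, 2.0, 3.0]
t_value, dof = paired_t_statistic(directed, undirected)
```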

Extended Data Fig. 4 Additional analyses of optogenetic suppression experiments.

a) Experimental approach with representative histology from a CAG.ArchT injection into the sBG, both shown in sagittal view. b) Optrode recording in a CAG-ArchT bird showing suppressive effect of green light illumination on spontaneous action potential activity of a single sBG unit. c) Group data showing suppressive effects of green light illumination across neurons (CaMKII, 5 neurons; CAG, 5 neurons). d) Sagittal schematic for coinjections of pan-neuronal (CAG) and SN (CaMKII) fluorescent proteins into the sBG. e) CAG-driven expression of TdTomato (magenta) and CaMKII-driven expression of GFP (green) shown superimposed in the sBG (left) and in separate green (middle) and magenta (right) channels in the pallido-recipient thalamic nucleus DLM. Scale bar = 250 µm. f) Experimental approach for syllable-triggered optogenetic inhibition. g) Group data showing pitch variability during directed and laser-stimulated singing normalized to undirected singing for pan-neuronal inhibition. Mixed effects model, two-sided permutation test. Laser effect size (relative to baseline): −17.8%, ***p = 0.0011, n = 14 syllables from 6 birds. Directed singing effect size: 13.8%, *p = 0.01, n = 12 syllables from 5 birds. h) Pitch variability group data (same data as Fig. 2j and Extended Data Fig. 4g), non-normalized, comparing values during undirected song versus either undirected + laser (L) or directed (D) conditions. i) Intrasyllabic variability data normalized to undirected levels. Mixed effects model, two-sided permutation test. Model fit to non-normalized data, comparing undirected and experimental (undirected + laser (green), or directed (red)) conditions (see Tables 1 and 2 in Supplementary Information for model details; in all cases significance was assessed using a two-sided permutation test). Pan-neuronal: Estimated laser effect size: −0.00071 (−10.11% of baseline) ± 0.0002, *p = 0.015. Estimated directed singing effect size: −0.0013 (−20.65%) ± 6.94, **p = 0.0098. SNs: Estimated laser effect size: −0.00034 (−4.09%) ± 0.00011, **p = 0.007. Estimated directed singing effect size: −0.0033 (−22.18%) ± 0.00063. GFP: Estimated laser effect size: −0.000054 (0.60%) ± 0.00043, p = 0.80. Estimated directed singing effect size: −0.0027 (−36.00%) ± 0.00050, ***p = 0.000082. Pan-neuronal: laser n = 14 syllables from 6 birds, directed n = 12 syllables from 5 birds; SNs: laser n = 16 syllables from 6 birds, directed n = 12 syllables from 5 birds; GFP: laser n = 15 syllables from 5 birds, directed n = 10 syllables from 4 birds. j) Mean syllable frequency group data normalized to undirected levels. Pan-neuronal: Estimated laser effect size: 3.55 ± 3.14 Hz, p = 0.27. Estimated directed singing effect size: 9.91 ± 6.94 Hz, p = 0.17. SNs: Estimated laser effect size: −8.28 ± 4.54 Hz, p = 0.079. Estimated directed singing effect size: −13.95 ± 9.13 Hz, p = 0.14. GFP: Estimated laser effect size: 0.60 ± 0.37 Hz, p = 0.12. Estimated directed singing effect size: 3.041 ± 2.46 Hz, p = 0.23. Pan-neuronal: laser n = 14 syllables from 6 birds, directed n = 12 syllables from 6 birds; SNs: laser n = 16 syllables from 6 birds, directed n = 12 syllables from 5 birds; GFP: laser n = 15 syllables from 5 birds, directed n = 10 syllables from 4 birds. k) Mean syllable duration group data normalized to undirected levels. Pan-neuronal: Estimated laser effect size: −0.58 ± 0.23 ms, *p = 0.016. Estimated directed singing effect size: −0.65 ± 0.28 ms, *p = 0.035. SNs: Estimated laser effect size: −0.82 ± 0.36 ms, **p = 0.029. Estimated directed singing effect size: −2.76 ± 0.62 ms, ***p = 0.00016. GFP: Estimated laser effect size: −0.84 ± 0.63 ms, p = 0.19. Estimated directed singing effect size: −4.37 ± 0.80 ms, ***p = 0.000030. Pan-neuronal: laser n = 14 syllables from 6 birds, directed n = 12 syllables from 6 birds; SNs: laser n = 16 syllables from 6 birds, directed n = 12 syllables from 5 birds; GFP: laser n = 15 syllables from 5 birds, directed n = 10 syllables from 4 birds. All error bars denote mean ± s.e.m.
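
The effect sizes in this caption are assessed with two-sided permutation tests on a mixed effects model. The sketch below shows only the permutation idea in a much simpler setting, an exact sign-flip test on paired differences; it is a stand-in chosen for illustration and does not reproduce the mixed effects model described in the Supplementary Information. The function name and the example differences are hypothetical.

```python
from itertools import product


def paired_sign_flip_p(diffs):
    """Exact two-sided sign-flip permutation p-value for the mean paired difference."""
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    extreme = 0
    total = 0
    # Under the null, each paired difference is equally likely to be positive or negative,
    # so every assignment of signs is an equally probable relabelling of the data.
    for signs in product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, diffs)) / n)
        if stat >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical per-syllable changes in variability (laser minus baseline); not data from the paper.
hypothetical_diffs = [-0.4, -0.3, -0.5, -0.2, -0.6, -0.1]
p_value = paired_sign_flip_p(hypothetical_diffs)
```

Exact enumeration visits 2^n sign patterns, so for larger samples a random subset of permutations is the usual substitute.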

Extended Data Fig. 5 Joint encoding model details and comparison to alternate models.

a) Schematic for learning low dimensional latent features of motif spectrograms using a variational autoencoder (VAE) approach. The model learns a compressed representation of the data that is sufficient to reconstruct the original. b) Cumulative distribution of pairwise song distances in VAE latent space, grouped by the similarity of the associated neural patterns (neural correlation percentiles, yellow to blue). For more dissimilar neural activity (yellow), songs are farther apart in VAE space, while more highly correlated neural activity (blue) shifts the distribution to the left, implying overall more similar songs, as indicated by smaller VAE vocal latent distances. c) Group data showing median VAE distance (relative to the mean) within each neural correlation decile for all 7 sessions from 5 birds; pairs of trials with highly correlated neural activity patterns are closer in VAE latent space. Marker shape denotes bird identity. d) Schematic of the joint modeling approach. Acoustic data is modeled using a VAE as before (boxed region) and a second VAE is used to model the neural data. A global latent variable is then used to capture shared variation in the two modalities. e) Schematic of model training and validation. VAE models were trained using sevenfold cross-validation. Within each fold, data were partitioned into seven tranches, five for VAE model training (magenta), one for VAE model validation and hyperparameter selection (cyan), and one for assessing model performance (yellow). For the VAE model, average performance on the yellow test set across the seven cross-validation folds is reported. For predictive models trained to predict one set of latents from another, a “leave-one-out” strategy on the yellow data set (right) was used to select predictive model hyperparameters and assess performance. f) Joint encoding outperforms a collection of control models. The shuffle control randomly pairs spectrograms and ROI activity vectors. 
The time control uses time-in-session to predict the joint encoding model’s neural latents (left) and vocal latents (right). The linear model comprises independently trained neural and vocal variational autoencoders (as in Fig. 3a without the global latent), with emission and recognition networks restricted to linear mappings. The separate encoding model comprises independently trained neural and vocal variational autoencoders with emission and recognition models parameterized by deep neural networks. The joint encoding model is the full model as presented in Fig. 3a. For all models, prediction is performed using ridge regression and test performance is evaluated using the cross-validation procedure described in Methods. Average test set performance over 7 cross-validation folds of each of 7 sessions from 5 birds is shown. Each line represents a single bird-session. g) Model comparison split by experimental session. Performance (measured by R²) for the task of predicting vocal latents from neural latents (top) and vice versa (bottom) for each of 7 sessions from 5 birds. In addition to the models presented in b, the comparison includes models using motif tempo to predict joint encoding neural latents (top) and vocal latents (bottom); using kernel ridge regression in place of linear ridge regression (with leave-one-out regularization strength and radial basis function bandwidth selection); and a version of the joint encoding model with emission and recognition networks restricted to linear mappings. Joint encoding predictive performance is compared with each control model for each experimental session (one-sided Wilcoxon signed-rank test, * denotes p < 0.05). For both imaging sessions of one bird (bird 5, denoted by triangles in panels b–d), both neural latents and vocal latents could be robustly predicted from song tempo. h) Left: Predictive performance versus number of song motifs for each of 7 experimental sessions. Poor predictive performance is observed for experimental sessions with fewer than 300 motifs and fewer than 50 ROIs (not shown). Symbols denote birds, as in panels b and c. Right: Similar to left. Opaque markers indicate performance using only first motifs in each bout; faded markers indicate performance using all motifs.
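
Panel e describes sevenfold cross-validation in which each fold uses five tranches for training, one for validation and one for testing. A minimal sketch of one way to build such splits follows. The round-robin assignment of trials to tranches and the rule for rotating roles across folds are assumptions made for illustration; the caption does not specify them, and the function name is hypothetical.

```python
def seven_tranche_folds(n_trials, n_tranches=7):
    """Yield (train, validation, test) index lists, rotating tranche roles across folds."""
    # Assumed assignment: trial i goes to tranche i mod n_tranches.
    tranches = [list(range(k, n_trials, n_tranches)) for k in range(n_tranches)]
    for fold in range(n_tranches):
        test_id = fold
        val_id = (fold + 1) % n_tranches  # assumed rotation rule
        # The remaining five tranches form the training set.
        train = [
            i
            for k, tranche in enumerate(tranches)
            if k not in (test_id, val_id)
            for i in tranche
        ]
        yield train, tranches[val_id], tranches[test_id]


# Hypothetical session with 70 song motifs.
folds = list(seven_tranche_folds(70))
```

Each trial then appears in exactly one test tranche across the seven folds, so averaging test performance over folds uses every trial once.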

Extended Data Fig. 6 Joint encoding model preprocessing and additional examples.

a) To minimize time confounders, components of calcium activity vectors (top left) and spectrograms (top middle) that could reliably be predicted by time-of-day were removed (red lines; see Methods). The calcium and spectrogram residuals after prediction are used for further analysis in place of the original data (top right). Positive weights are shown in green and negative weights in magenta. Note that the effects are restricted to regions with vocalization. For two example spectrograms from 10:15 (bottom left) and 10:35 (bottom middle), time-of-day correction makes the resulting syllables more similar to one another. Scale bar for right column: 100 ms. b) Left: Despite time warping, spectrograms show consistent tempo-related changes. Difference plot between the average faster-than-median spectrograms and the average slower-than-median spectrograms (bottom, positive values in green, negative in magenta) for one example bird (Bird 3, squares). The consistent horizontal bands throughout the motif indicate upward pitch shifts associated with faster tempos, which were observed for almost all experimental sessions. Scale bars denote 100 ms. Right: Both ensemble activity and warped spectrograms contain information about tempo. For each experimental session, tempo can be predicted from ensemble activity vectors (blue) and spectrograms (red) after both signals have been corrected for time-of-day. Dotted line denotes chance performance. Scale bars denote 100 ms. c) Spectrograms also show consistent motif-number-related changes. For the same example bird as in b, the average of the first motifs in every bout and the average of all other motifs exhibit clear differences (bottom, positive values in green, negative in magenta). Right: Both ensemble activity and time-warped spectrograms contain information about motif number. For each experimental session, motif number (first motif vs. rest) could be reliably predicted from ensemble activity vectors (blue) and spectrograms (red) using the same procedure described for tempo prediction (reporting test accuracy, weighted by class so that chance performance is 0.5). Dotted line denotes chance performance. Scale bars denote 100 ms. d) Example average ROI activity aligned to the first syllable of bouts consisting of 1, 2, 3 or 4 motifs. Note that ROIs 20 and 21 display qualitatively different activities in bouts of different lengths. e) Weighted average generated spectrograms and ROI activity pairs, with weights given by their projection along the correlation axis, describe how song spectrograms (middle column) and neural activity (right column) vary together. P-values refer to corresponding correlations of held-out test data, as in Figure 3c. Scale bars for left and middle columns: 100 ms. Scale bars for right column: 250 μm.
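Two steps in this legend can be illustrated with a minimal sketch: removing the part of a feature that is predictable from time-of-day, and scoring a two-class prediction with class-weighted accuracy so that chance is 0.5. The sketch assumes a simple linear fit on time-of-day, which may differ from the model described in the Methods. The function names and all data values are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def remove_linear_trend(times, values):
    """Return residuals of `values` after a least-squares line fit on `times`.

    Assumption: a single linear predictor stands in for the time-of-day model.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = cov / var_t if var_t > 0 else 0.0
    return [v - (mean_v + slope * (t - mean_t)) for t, v in zip(times, values)]


def balanced_accuracy(y_true, y_pred):
    """Average the per-class recalls, so each class counts equally."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, guess in zip(y_true, y_pred):
        totals[truth] += 1
        hits[truth] += int(truth == guess)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical feature that drifts linearly over the morning (hours as decimals).
times = [10.0, 10.25, 10.5, 10.75, 11.0]
feature = [1.0, 1.5, 2.0, 2.5, 3.0]
residuals = remove_linear_trend(times, feature)

# Hypothetical bout: one first motif followed by three later motifs.
labels = ["first", "rest", "rest", "rest"]
always_rest = ["rest"] * len(labels)
print(balanced_accuracy(labels, always_rest))  # prints 0.5
```

Guessing "rest" every time is correct for three of the four motifs, yet its class-weighted score is 0.5, which is why the dotted chance line sits at 0.5 even though first motifs are the minority class.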

Extended Data Fig. 7 Effects of adrenergic signaling manipulations on song and neural activity.

a) Retrograde labelling of dopamine beta hydroxylase (DBH) and tyrosine hydroxylase (TH) positive cell bodies in the locus coeruleus (LC) and ventral tegmental area (VTA), respectively, following retrograde tracer injections into the sBG. Scale bar = xx microns and applies to both panels. b) SCH, PHE, and CLON infusion do not significantly affect singing rates (Student’s two-tailed paired t-test, CLON: t₅ = 0.81, p = 0.46, n = 6 birds; PHE: t₇ = 0.97, p = 0.28, n = 8 birds; SCH: t₇ = 1.17, p = 0.36, n = 8 birds). c) SCH, PHE, and CLON infusion do not significantly affect the number of introductory notes per bout (Student’s two-tailed paired t-test, CLON: t₅ = −0.16, p = 0.88, n = 6 birds; PHE: t₇ = 1.05, p = 0.33, n = 8 birds; SCH: t₇ = 0.91, p = 0.30, n = 8 birds). d) Left: Effects of PHE on pitch variability in directed and undirected song. Mixed effects model, 2-sided permutation test, see Tables 1 and 2 in Supplementary Information for model details. Estimated effect of drug presence on directed: 0.36 (26.1% of baseline) ± 0.17, *p = 0.02. Estimated effect of drug presence on undirected: −0.14 (6.7% of baseline) ± 0.12, p = 0.24. Middle: Effects of CLON on pitch variability in directed and undirected song. Estimated effect of drug presence on directed: −0.025 (1.5% of baseline) ± 0.10, p = 0.1. Estimated effect of drug presence on undirected: −0.36 (14.8% of baseline) ± 0.12, **p = 0.0049. Right: Effects of SCH on pitch variability in directed and undirected song. Estimated effect of drug presence on directed: 0.081 (4.9% of baseline) ± 0.22, p = 0.71. Estimated effect of drug presence on undirected: 0.048 (2% of baseline) ± 0.84, p = 0.57. e) Representative histology showing photometry probe and microdialysis probe placement into sBG for simultaneous drug delivery and photometry. Scale bar = 100 μm. f) Representative DF/F measurements during directed and undirected singing for one bird before and after (>1 hour) beginning muscimol infusion. g) Muscimol infusion suppresses calcium signals recorded in the sBG during both directed and undirected singing (Student’s two-tailed paired t-test; t₄ = 3.63, p-values are indicated; n = 5 birds). h) Sample traces for SN imaging during infusion of SCH23390 into the sBG. i) Group data showing mean SN photometry signals during SCH23390 infusion in undirected and directed conditions (n = 4 birds). j) DARPP32 and α2c-AR mRNA co-expression in sBG SNs (n = 3 birds). Scale bar = 20 μm. k) Low-power confocal images showing Fos mRNA expression in a sagittal section of the finch brain across behavioral conditions. Dashed white outlines highlight the sBG and HVC. l) Fos intensity levels in HVC and the sBG plotted against motif count (30-minute window) in either directed (red) or undirected (blue) singing conditions. For all immediate early gene experiments, undirected n = 6 birds, directed n = 7 birds, silent n = 6 birds. m) Example confocal image z-stack collected in the LC. The intensity and area of Fos puncta (magenta) were quantified within the TH-positive mask (yellow). Scale bar = 50 μm. n) Mean Fos intensity and area within the LC TH mask plotted against motif counts for directed (red) and undirected (blue) singing. o) Group data for Fos intensity (left) or area (right) plotted for TH and VGAT masks during either directed (red, N = 7 birds) or undirected (blue, N = 6 birds) singing conditions. One-way ANOVAs with post hoc Tukey tests were performed separately for TH and VGAT masks under each condition. Post hoc comparisons for significant ANOVAs are displayed. Fos mRNA puncta intensity: TH mask, F_(2,16) = 7.46, **p = 0.0051, VGAT mask, F_(2,16) = 7.4, **p = 0.0053. Fos mRNA area: TH mask, F_(2,16) = 9.02, p = 0.0024, VGAT mask, F_(2,16) = 3.6, p = 0.051.

Extended Data Fig. 8 Effects of adrenergic signaling on SN excitability.

a) Rise time, sag, and resting membrane potential can be used to distinguish SNs from non-SNs in the sBG (see Methods). b) Three more example SNs recorded during baseline, NA, and PHE. c) Effect of NA and PHE on SN resting membrane potential (One-way repeated measures ANOVA with Greenhouse-Geisser correction. F_{(1.487, 16.36)} = 0.5950, p = 0.51; n = 11 cells). d) Effect of NA and PHE on SN input resistance (One-way repeated measures ANOVA with Greenhouse-Geisser correction and post-hoc Tukey test. F_{(1.229, 13.52)} = 7.980; Baseline vs NA: p = 0.054; NA vs PHE: ***p = 0.0003; n = 11 cells). e) Two example SNs recorded during baseline and PHE, from a different experiment than a–d. f) F-I curves showing increased action potentials in response to positive current injection for baseline and PHE conditions (n = 12 cells). g) Effect of PHE on SN resting membrane potential (Student’s two-tailed paired t-test, t₁₀ = 0.55, p = 0.59; n = 11 cells, separate from those shown in panel c). h) Effect of PHE on SN input resistance (Student’s two-tailed paired t-test, t₁₀ = 5.42, ***p = 0.0003; n = 11 cells, separate from those shown in panel d).

Supplementary information

Supplementary Information

Reporting Summary

Peer Review File

Supplementary Video 1

A movie showing calcium signals recorded with a miniature microscope in the basal ganglia of a male zebra finch singing in social isolation (undirected song).

Supplementary Video 2

The left frame shows calcium signals recorded with a miniature microscope in the basal ganglia of a male zebra finch singing in social isolation (undirected song). The right frame shows calcium signals recorded with a miniature microscope in the basal ganglia of a male zebra finch singing to a nearby female (directed song). The two movies are from the same imaging field in the same male finch, collected several minutes apart.

Source data

Source Data Fig. 1

Source Data Fig. 2

Source Data Fig. 3

Source Data Fig. 4

Source Data Extended Data Fig. 1

Source Data Extended Data Fig. 3

Source Data Extended Data Fig. 4

Source Data Extended Data Fig. 5

Source Data Extended Data Fig. 6

Source Data Extended Data Fig. 7

Source Data Extended Data Fig. 8

Source Data Extended Data Fig. 9

About this article

Cite this article

Singh Alvarado, J., Goffinet, J., Michael, V. et al. Neural dynamics underlying birdsong practice and performance. Nature 599, 635–639 (2021). https://doi.org/10.1038/s41586-021-04004-1

Received: 08 December 2020
Accepted: 07 September 2021
Published: 20 October 2021
Issue Date: 25 November 2021
DOI: https://doi.org/10.1038/s41586-021-04004-1

This article is cited by

Flexible circuit mechanisms for context-dependent song sequencing
- Frederic A. Roemschied
- Diego A. Pacheco
- Mala Murthy
Nature (2023)
Dopaminergic error signals retune to social feedback during courtship
- Andrea Roeser
- Vikram Gadagkar
- Jesse H. Goldberg
Nature (2023)
