
Emergent tuning for learned vocalizations in auditory cortex

Abstract

Vocal learners use early social experience to develop auditory skills specialized for communication. However, it is unknown where in the auditory pathway neural responses become selective for vocalizations or how the underlying encoding mechanisms change with experience. We used a vocal tutoring manipulation in two species of songbird to reveal that tuning for conspecific song arises within the primary auditory cortical circuit. Neurons in the deep region of primary auditory cortex responded more to conspecific songs than to other species’ songs and more to species-typical spectrotemporal modulations, but neurons in the intermediate (thalamorecipient) region did not. Moreover, birds that learned song from another species exhibited parallel shifts in selectivity and tuning toward the tutor species’ songs in the deep but not the intermediate region. Our results locate a region in the auditory processing hierarchy where an experience-dependent coding mechanism aligns auditory responses with the output of a learned vocal motor behavior.


Fig. 1: Juvenile songbirds learn song from conspecific or heterospecific tutors.
Fig. 2: Selectivity for conspecific song emerges in the primary AC.
Fig. 3: Song selectivity and population response dynamics are experience dependent.
Fig. 4: Tuning for the spectrotemporal modulations in learned song emerges in parallel with song selectivity.
Fig. 5: Neural population response dynamics to song reflect tuning for spectrotemporal modulations.
Fig. 6: Neurons that respond selectively to same species’ songs have highly similar modulation tuning regardless of species identity or tutoring experience.


Data availability

The data that support the findings of this study are available from the corresponding author upon request.

Code availability

The code used to analyze data in this study is available from the corresponding author upon request.


Acknowledgements

We thank J. Schumacher, A. Calabrese, D. Schneider, H. Brew, S. Rosis and N. So for suggestions on design and analysis. We are grateful to N. Mesgarani, M. Long, N. So and E. Perez for comments on previous versions of the manuscript. Funding was provided by NIH grant no. DC009810 (to S.M.N.W.) and NSF grant no. IOS-1656825 (to S.M.N.W.).

Author information

Authors and Affiliations

Authors

Contributions

J.M. designed experiments, collected data, analyzed data and wrote the manuscript. S.M.N.W. designed experiments, analyzed data and wrote the manuscript.

Corresponding author

Correspondence to Sarah M. N. Woolley.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature Neuroscience thanks Jon Sakata and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Zebra finches and long-tailed finches learn song from conspecific and Bengalese finch tutors.

Spectrograms show songs from: a, a zebra finch tutor (ZF) and pupil (zfZF); b, a long-tailed finch tutor (LF) and pupil (lfLF); and c, a Bengalese finch tutor (BF) and its cross-tutored pupils (zfBF and lfBF). Magnified spectrograms on the right show tutor syllable types and corresponding pupil copies. d-g, Zebra finches and long-tailed finches reproduced BF syllable acoustics with varying degrees of accuracy, and some of the differences between species were related to the similarity between BF syllables and their prototypical conspecific syllables. d, Top, Box-and-whisker plots show syllable pitch for all tutor and pupil groups. Both zfBF and lfBF birds sang with pitch similar to the BF tutors but different from their respective normal pupil groups. Bottom, The pitch of pupil syllable copies was strongly correlated with the pitch of their corresponding tutor models in all groups (partial correlation coefficients, 0.80 ≤ r ≤ 0.99). e, In contrast, the mean frequencies of zfBF and lfBF syllables were different from those of BF syllables but not from those of normal conspecific pupils, and the correlations between pupil-tutor syllable types were weaker for cross-tutored birds (zfBF, r = 0.65; lfBF, 0.45) than for normal birds (zfZF, 0.81; lfLF, 0.94). For both f, frequency modulation and g, Wiener entropy, values were similar between all zebra finch groups and the BF tutors whereas lfBF syllables were intermediate between BF and LF syllables, and all pupil syllable types closely matched their corresponding tutor syllables (0.65 ≤ r ≤ 0.94). For the boxplots in d-g, the measure of center is the median, box limits show the 25th and 75th percentiles, whiskers extend up to 1.5× the interquartile range beyond the quartiles; and circles show outliers. 
Comparisons between adjacent groups used ANOVAs with bird identity as a nested, random effect covariate [ZF, n = 33 syllables, 5 birds; zfZF, n = 196 syllables, 25 birds; zfBF, n = 113 syllables, 11 birds; BF, n = 66 syllables, 5 birds; lfBF, n = 103 syllables, 10 birds; lfLF, n = 75 syllables, 12 birds; LF, n = 35 syllables, 5 birds], *P < 0.05; **P < 0.01; ***P < 0.001. For the regressions in d-g, models correlate the tutors’ and pupils’ syllable features after controlling for tutor identity as a categorical covariate [zfZF-ZF, n = 128 syllable pairs; zfBF-BF, n = 96 pairs; lfBF-BF, n = 70 pairs; lfLF-LF, n = 51 pairs], all P < 0.001.

Supplementary Figure 2 Histological verification of recording sites.

Sixteen-channel electrode arrays (4×4) were painted with a carbocyanine dye (DiI or DiO) along the back of each shank and advanced through the caudal telencephalon. Electrode passes varied systematically along the rostro-caudal and medio-lateral planes, and recordings were made at non-overlapping depths to sample all auditory cortical subregions (the section shown was 1.60 mm lateral from the midline). Following the completion of physiology experiments, fixed brains were sectioned in the sagittal plane, imaged with fluorescent filters, Nissl-stained, and then imaged again in the brightfield. Electrode contact locations were reconstructed along dye-labeled tracks and overlaid on Nissl-stained sections with delineated regions. CM, caudal mesopallium; NC, caudal nidopallium; LaM, mesopallial lamina; LPS, pallial-subpallial lamina; HVC, song motor nucleus. Major projections between cortical regions are illustrated in Fig. 2.

Supplementary Figure 3 Spike identification and cluster evaluation.

a, Top, A raw voltage trace (black) shows several spikes from different units recorded on a single channel. Middle, A transformed trace (beige) that amplifies fast voltage accelerations was passed through an amplitude threshold to locate potential spikes. Bottom, Snippets of potential spikes from the raw voltage trace were saved, measured in various ways [e.g., minimum baseline (open circle) and peak (filled) amplitudes at 3 phases of the spike; width at half-height of peaks 2 and 3], and sorted using an unsupervised clustering algorithm (WaveClus). Uniformity of spike shape within clusters and separation between clusters (when more than one unit was identified per channel; >3 per channel was rare) were verified by viewing several plots. They included b, the mean (± SD) spike waveforms (red, n = 16,447 spikes; turquoise, n = 12,329 spikes; orange, n = 8244 spikes; purple, n = 3406 spikes; green, n = 1989 spikes); c, distributions of spike properties for each cluster (i.e. minimum voltage (p2), spike width, amplitude ratios between peaks); and d, spike magnitude over time to ensure the unit was held throughout the experiment. e, Top, Signal-to-noise ratio of the same unit shown in a was quantified by the standard separation index D [difference in sample means (p2 – b) divided by the geometric mean of their standard deviations]. Bottom, Histogram of D values across the entire dataset (n = 4392 units) with an arrow showing the separation of the unit above. For the boxplot, the measure of center is the median, box limits show the 25th and 75th percentiles, and whiskers extend up to 1.5× the interquartile range beyond the quartiles. f, Top, Spike waveform variability over time for the same five units shown in b. In general, well-isolated units had p2 SDs greater than or equal to their respective baseline SDs. Bottom, Mean (± SD) ratio of spike waveform SD relative to baseline SD across the entire dataset. 
g, Top, Inter-spike intervals for the same unit shown in a show few refractory period violations. Bottom, Histogram of the percent of intervals less than 1 ms per unit across the entire dataset. Some, but relatively few, well-isolated units exhibited ISIs <1 ms.
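
The legend defines the standard separation index D as the difference in sample means divided by the geometric mean of the two samples' standard deviations. A minimal NumPy sketch of that formula (the function name and example data are illustrative, not from the paper):

```python
import numpy as np

def standard_separation(peak_amps, baseline_amps):
    """Standard separation index D: difference in sample means divided by
    the geometric mean of the two samples' standard deviations."""
    p2 = np.asarray(peak_amps, dtype=float)
    b = np.asarray(baseline_amps, dtype=float)
    return (p2.mean() - b.mean()) / np.sqrt(p2.std(ddof=1) * b.std(ddof=1))

# Illustrative unit: spike peak amplitudes well separated from baseline noise
rng = np.random.default_rng(0)
peaks = rng.normal(120.0, 10.0, size=1000)    # peak amplitudes (a.u.)
baseline = rng.normal(0.0, 10.0, size=1000)   # baseline samples
D = standard_separation(peaks, baseline)      # roughly 12 for this unit
```

Larger D indicates better signal-to-noise separation between spike peaks and baseline; overlapping distributions drive D toward zero.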

Supplementary Figure 4 Song selectivity was consistent across birds within each group.

Distributions show spike rate selectivity of all neurons in each AC region for a, ZF versus LF song in zfZF (orange) and lfLF (gray) birds; b, ZF versus BF song in zfZF and zfBF (brown) birds; and c, LF versus BF song in lfLF and lfBF (light blue) birds. Histograms are the same as in Figs. 2d, 3a, and c. Colored stars indicate a significant difference in responses between song types within a group (repeated-measures ANOVAs with bird identity as a covariate) and are plotted on the side of the song that evoked a greater response. Black bars show the separation between distribution means, and black stars indicate a difference in selectivity between bird groups (ANOVAs with bird identity as a random effect, nested covariate). *P < 0.05, **P < 0.01, ***P < 0.001. Box-and-whisker plots show selectivity of neurons from each bird; the measure of center is the median, box limits show the 25th and 75th percentiles, whiskers extend up to 1.5× the interquartile range beyond the quartiles; and circles show outliers. In a, int.: n = 42, 66, 40 neurons from zfZF birds (top-to-bottom, respectively) and n = 40, 64, 64 from lfLF birds; sup.: n = 82, 52, 45 (zfZF) and n = 0, 18, 35 (lfLF) neurons; deep: n = 98, 80, 103 (zfZF) and n = 32, 99, 106 (lfLF) neurons; sec.: n = 50, 126, 41 (zfZF) and n = 114, 28, 57 (lfLF) neurons. In b, int.: n = 42, 69, 38 (zfZF) and n = 31, 33, 9 (zfBF) neurons; sup.: n = 81, 52, 48 (zfZF) and n = 1, 4, 75 (zfBF) neurons; deep: n = 98, 70, 102 (zfZF) and n = 13, 81, 42 (zfBF) neurons; sec.: n = 48, 114, 35 (zfZF) and n = 40, 201, 9 (zfBF) neurons. In c, int.: n = 44, 62, 58 (lfLF) and n = 79, 23, 31 (lfBF) neurons; sup.: n = 0, 17, 31 (lfLF) and n = 13, 1, 9 (lfBF) neurons; deep: n = 31, 91, 102 (lfLF) and n = 35, 17, 23 (lfBF) neurons; sec.: n = 123, 28, 57 (lfLF) and n = 61, 61, 68 (lfBF) neurons.
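
The ±1.96 cutoff used for selectivity in these panels is consistent with a t-statistic on trial-wise spike rates; the exact formulation is not given in this excerpt, so the Welch-style statistic below is an assumption (function name and data are illustrative):

```python
import numpy as np

def rate_selectivity_t(rates_a, rates_b):
    """Welch t-statistic comparing trial-wise spike rates to two song types.
    Positive values favor song type A; |t| > 1.96 marks significant
    selectivity at roughly the 5% level for large samples."""
    a = np.asarray(rates_a, dtype=float)
    b = np.asarray(rates_b, dtype=float)
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (a.mean() - b.mean()) / se

# A neuron firing more to conspecific (A) than heterospecific (B) song
t_sel = rate_selectivity_t([22, 25, 24, 27, 23], [12, 15, 11, 14, 13])
```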

Supplementary Figure 5 Species selectivity reflects tuning for general song features and not learned auditory objects.

Selectivity metrics computed using spike rates to all syllables (x-axis) were strongly correlated with metrics computed after omitting responses to syllable types in the tutor’s repertoire. a, ZF vs. LF in zfZF birds (top, partial r = 0.99, n = 826 neurons) and lfLF birds (bottom, r = 0.99, n = 657 neurons); b, ZF vs. BF in zfZF birds (top, r = 0.99, n = 798 neurons) and zfBF birds (bottom, r = 0.99, n = 547 neurons); and c, LF vs. BF in lfLF birds (top, r = 0.99, n = 643 neurons) and lfBF birds (bottom, r = 0.98, n = 421 neurons). Regression models included bird identity as a covariate. Black dotted lines mark the diagonal; colored dotted lines indicate significant selectivity (t = ±1.96). d, The impact of responses to the tutor’s song on species spike rate selectivity was small. Distributions show the difference in selectivity when the tutor’s syllable types were included versus excluded (i.e. horizontal deviations from the diagonal in a-c). While responses to the tutor species’ songs were greater when all syllables were included than when tutor syllable types were excluded (repeated measures ANOVAs with bird identity as a covariate, all P < 0.001), the differences were small. Black bars show the difference between distribution means (-0.27 ≤ mean difference (t) ≤ 0.30). Sample sizes are the same as in a-c.

Supplementary Figure 6 Spike rate reliability did not differ consistently between pupil groups.

Reliability was quantified by the Fano factor (coefficient of variation (CV) of syllable-evoked spike rates across trials; here, reliability = -CV). Plots show the difference in reliability between song types, arranged to be consistent with the plots of song selectivity. a, In both zfZF birds (orange, n = 148, 179, 281, 217 neurons in the intermediate, superficial, deep, and secondary regions, respectively) and lfLF birds (gray, n = 168, 53, 237, 199 neurons across AC regions), reliability was greater for ZF than for LF syllables across all brain regions. b, Same as a, but showing greater reliability for ZF than for BF syllables in both zfZF birds (n = 149, 181, 270, 197 neurons) and zfBF birds (brown, n = 73, 80, 136, 250 neurons). c, Same as a, but showing greater reliability for LF than for BF syllables in both lfLF birds (n = 164, 48, 224, 208 neurons) and lfBF birds (light blue, n = 133, 23, 75, 190 neurons). The measure of center is the median, box limits show the 25th and 75th percentiles, whiskers extend up to 1.5× the interquartile range beyond the quartiles; and circles show outliers. Colored stars indicate a difference in reliability between songs within a pupil group (repeated-measures ANOVAs with bird identity as a covariate). Black stars indicate a difference between groups (ANOVAs with bird identity as a nested covariate). *P < 0.05, **P < 0.01, ***P < 0.001.
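
Following the legend's definition, reliability is the negative coefficient of variation of syllable-evoked spike rates across trials. A minimal sketch with illustrative data:

```python
import numpy as np

def response_reliability(trial_rates):
    """Reliability as the negative coefficient of variation (SD / mean)
    of syllable-evoked spike rates across trials: less trial-to-trial
    variability pushes CV toward 0, i.e., reliability toward 0 (higher)."""
    r = np.asarray(trial_rates, dtype=float)
    return -(r.std(ddof=1) / r.mean())

reliable = response_reliability([20, 21, 19, 20, 20])   # near 0
unreliable = response_reliability([5, 30, 2, 25, 8])    # strongly negative
```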

Supplementary Figure 7 Spike timing precision did not differ consistently across stimulus types or between pupil groups.

Precision was quantified by the correlation index (CI), a metric derived from shuffled autocorrelograms that indicates a neuron’s tendency to spike at the same time across trials. Plots show the difference in CI between song types, arranged to be consistent with the plots of song selectivity. a, In both zfZF birds (orange, n = 147, 179, 281, 216 neurons in the intermediate, superficial, deep, and secondary regions, respectively) and lfLF birds (gray, n = 168, 53, 234, 199 neurons across AC regions), spike timing precision was not different between responses to ZF and LF syllables in any region. b, Same as a, but comparing response precision to ZF versus BF syllables in zfZF birds (n = 148, 181, 270, 196 neurons) and zfBF birds (n = 73, 80, 136, 249 neurons). In the deep and secondary regions of both groups, spike timing was more precise for ZF than for BF syllables. There were no differences between groups in any brain region. c, Same as a, but comparing spike timing precision to LF versus BF syllables in lfLF birds (n = 164, 48, 224, 208 neurons) and lfBF birds (n = 133, 23, 75, 190 neurons). There were no differences between groups in any region. The measure of center is the median, box limits show the 25th and 75th percentiles, whiskers extend up to 1.5× the interquartile range beyond the quartiles; and circles show outliers. Colored stars indicate a difference in CI between songs within a group (repeated-measures ANOVAs with bird identity as a covariate). Black stars indicate a difference between bird groups (ANOVAs with bird identity as a nested covariate). *P < 0.05, **P < 0.01, ***P < 0.001.
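
The correlation index (Joris et al. 2006) counts spike coincidences across distinct trials and normalizes by the count expected from trial-independent Poisson firing, so CI ≈ 1 for unrelated trials and CI ≫ 1 for precisely repeating spike times. A simplified zero-delay sketch (a full shuffled-autocorrelogram implementation would histogram all delays; the window and data here are illustrative):

```python
import numpy as np

def correlation_index(trials, duration, window=0.001):
    """Correlation index: across-trial spike coincidences within a small
    window, normalized so trial-independent Poisson firing gives CI ~ 1.
    `trials` is a list of spike-time arrays (seconds)."""
    trials = [np.asarray(t, dtype=float) for t in trials]
    m = len(trials)
    rate = sum(t.size for t in trials) / (m * duration)  # mean firing rate
    n_coinc = 0
    for i in range(m):
        for j in range(m):
            if i == j:
                continue                                 # across-trial pairs only
            for s in trials[i]:
                n_coinc += int(np.sum(np.abs(trials[j] - s) <= window / 2))
    # chance level: m*(m-1) ordered trial pairs, each expecting
    # rate^2 * window * duration coincidences
    return n_coinc / (m * (m - 1) * rate**2 * window * duration)

rng = np.random.default_rng(1)
dur, n_trials = 1.0, 10
template = np.sort(rng.uniform(0.0, dur, 50))
ci_precise = correlation_index([template] * n_trials, dur)        # >> 1
poisson = [np.sort(rng.uniform(0.0, dur, 50)) for _ in range(n_trials)]
ci_chance = correlation_index(poisson, dur)                       # ~ 1
```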

Supplementary Figure 8 Neural discrimination of songs varied between song types, pupil groups, and AC regions in parallel with differences in spike rate.

An unbounded neural discrimination metric (d-prime) was computed to measure how accurately single-trial spike trains distinguished among songs of the same species (n = 5 ZF, LF, and BF songs; 10 trials per stimulus). Plots show the difference in d-prime between song types, arranged to be consistent with the plots of song selectivity. a, In zfZF birds, single neurons in the intermediate, superficial, deep, and secondary regions (n = 113, 111, 186, 140 neurons, respectively) tended to discriminate among ZF songs better than among LF songs. Intermediate-region neurons in lfLF birds also discriminated among ZF songs better than LF songs, but secondary-region neurons performed better for LF songs (n = 98, 33, 123, 37 neurons across AC regions). The difference between groups was significant in the deep region. b, Same as a, but comparing discrimination of ZF versus BF songs in zfZF birds (n = 126, 117, 202, 127 neurons) and zfBF birds (n = 38, 49, 91, 159 neurons). Deep- and secondary-region neurons in zfZF birds discriminated among ZF songs better than BF songs, but there was no difference between song types in zfBF birds. The difference between groups was significant for the deep region. c, Same as a, but comparing discrimination of LF versus BF songs in lfLF birds (n = 135, 40, 153, 120 neurons) and lfBF birds (n = 95, 15, 42, 122 neurons). In lfLF birds, intermediate-region neurons discriminated among BF songs better than LF songs, but secondary-region neurons did better among LF songs. In contrast, intermediate-, deep-, and secondary-region neurons in lfBF birds all discriminated among BF songs better than among LF songs. The difference between groups was significant for the deep region. For a-c, the measure of center is the median, box limits show the 25th and 75th percentiles, whiskers extend up to 1.5× the interquartile range beyond the quartiles; and circles show outliers.
Colored stars indicate a difference in discrimination between song types within a group (repeated-measures ANOVAs with bird identity as a covariate). Black stars indicate a difference between bird groups (ANOVAs with bird identity as a nested covariate). *P < 0.05, **P < 0.01, ***P < 0.001. d, Correlations between spike rate selectivity and the difference in neural discrimination between song types in zfZF birds (0.25 ≤ all partial r ≤ 0.62, all P < 0.01 across AC regions) and lfLF birds (intermediate, deep, secondary: 0.43 ≤ r ≤ 0.68, P < 0.01). e, Same as d, but showing correlations in zfZF birds (0.23 ≤ all partial r ≤ 0.50, all P < 0.02) and zfBF birds (superficial, deep, secondary: 0.45 ≤ r ≤ 0.54, P < 0.001). f, Same as d, but showing correlations in lfLF birds (0.50 ≤ all partial r ≤ 0.63, all P < 0.001) and lfBF birds (intermediate, deep, secondary: 0.33 ≤ r ≤ 0.61, P < 0.05). For d-f, regressions included bird identity as a covariate. Sample sizes are the same as in a-c.
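
The legend's d-prime operates on single-trial spike trains; as a simplified sketch of the underlying quantity, the pairwise d′ between spike-count distributions for two songs can be computed as the mean difference in units of the pooled standard deviation (this spike-count form is an assumption about the exact metric; data are illustrative):

```python
import numpy as np

def dprime(counts_a, counts_b):
    """Pairwise d-prime between single-trial responses to two songs:
    difference in mean spike count divided by the pooled standard
    deviation. Unbounded, unlike percent correct; larger values
    indicate better discrimination."""
    a = np.asarray(counts_a, dtype=float)
    b = np.asarray(counts_b, dtype=float)
    pooled_sd = np.sqrt(0.5 * (a.var(ddof=1) + b.var(ddof=1)))
    return np.abs(a.mean() - b.mean()) / pooled_sd

# Illustrative responses to two songs, 10 trials each (as in the legend)
d_pair = dprime([30, 28, 33, 31, 29, 32, 30, 31, 29, 30],
                [18, 20, 17, 21, 19, 18, 20, 19, 22, 18])
```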

Supplementary Figure 9 Song-evoked activity patterns of intermediate-region neurons.

a, Spectrograms of ZF (top) and BF (bottom) song segments plotted above pPSTHs (mean ± 95% C.I.; zfZF, n = 149 neurons; zfBF, n = 73 neurons) and neurograms (z-scored single-neuron PSTHs; zfZF, n = 40 randomly selected neurons from two birds; zfBF, n = 31 and 33 neurons for birds 1 and 2, respectively). Colored lines above pPSTHs indicate sustained differences (≥10 ms) between groups, and bar graphs to the right show the number of segments in each ZF or BF stimulus that evoked a greater pPSTH in zfZF birds (orange) or zfBF birds (brown) (two-sided paired t-tests, n = 5 songs for each species, both P > 0.05). Traces to the right of neurograms show the selectivity of each respective neuron (dashed lines are t = ±1.96). b, As in a but for LF and BF songs and lfLF birds (gray; n = 164 neurons in pPSTHs; n = 40 neurons each in neurograms) and lfBF birds (light blue; n = 133 neurons in pPSTHs; n = 40 and 31 neurons in neurograms for birds 1 and 2, respectively). **P < 0.01. For both a and b, pPSTHs and neurograms were shifted in time by the average response latency of paired groups (zfZF and zfBF, 13 ms; lfLF and lfBF, 16 ms) to align with the stimuli.

Supplementary Figure 10 Song-evoked activity patterns of superficial-region neurons.

a, Spectrograms of ZF (top) and BF (bottom) song segments plotted above pPSTHs (mean ± 95% C.I.; zfZF, n = 181 neurons; zfBF, n = 80 neurons) and neurograms (z-scored single-neuron PSTHs; zfZF, n = 40 randomly selected neurons from two birds; zfBF, n = 4 and 40 neurons for birds 1 and 2, respectively). Colored lines above pPSTHs indicate sustained differences (≥10 ms) between groups, and bar graphs to the right show the number of segments in each ZF or BF stimulus that evoked a greater pPSTH in zfZF birds (orange) or zfBF birds (brown) (two-sided paired t-tests, n = 5 songs for each species, both P > 0.05). Traces to the right of neurograms show the selectivity of each respective neuron (dashed lines are t = ±1.96). b, As in a but for LF and BF songs and lfLF birds (gray; n = 48 neurons in pPSTHs; n = 17 and 31 neurons for birds 1 and 2, respectively) and lfBF birds (light blue; n = 23 neurons in pPSTHs; n = 13 and 9 neurons for birds 1 and 2, respectively). *P < 0.05. For both a and b, pPSTHs and neurograms were shifted in time by the average response latency of paired groups (zfZF and zfBF, 17 ms; lfLF and lfBF, 18 ms) to align with the stimuli.

Supplementary Figure 11 Song-evoked activity patterns of secondary-region neurons.

a, Spectrograms of ZF (top) and BF (bottom) song segments plotted above pPSTHs (mean ± 95% CI; zfZF, n = 197 neurons; zfBF, n = 250 neurons) and neurograms (z-scored single-neuron PSTHs; zfZF, n = 40 randomly selected neurons from two birds; zfBF, n = 40 neurons each). Colored lines above pPSTHs indicate sustained differences (≥10 ms) between groups, and bar graphs to the right show the number of segments in each ZF or BF stimulus that evoked a greater pPSTH in zfZF birds (orange) or zfBF birds (brown) (two-sided paired t-tests, n = 5 songs for each species, *P < 0.05). Traces to the right of neurograms show the selectivity of each respective neuron (dashed lines are t = ±1.96). b, As in a but for LF and BF songs and lfLF birds (gray; n = 208 neurons in pPSTHs; n = 40 neurons each in neurograms) and lfBF birds (light blue; n = 190 neurons in pPSTHs; n = 40 neurons each). *P < 0.05. For both a and b, pPSTHs and neurograms were shifted in time by the average response latency of paired groups (zfZF and zfBF, 19 ms; lfLF and lfBF, 25 ms) to align with the stimuli.

Supplementary Figure 12 Population responses were temporally precise and reliable across syllable repetitions.

a, Top, Spectrograms of different syllables from ZF and BF songs. Bottom, Overlaid mean pPSTHs from the deep region of zfZF birds (orange, n = 270 neurons) and zfBF birds (brown, n = 136 neurons) show responses to different renditions of the same syllable type (aligned to onset). Bars along the top denote sustained differences (≥10 ms) between bird groups to a single syllable and are separated vertically for different syllable occurrences. b, Same as a, but showing LF and BF syllables and deep-region pPSTHs from lfLF birds (gray, n = 224 neurons) and lfBF birds (light blue, n = 75 neurons). c, The similarity of population responses to different utterances of a syllable type was measured as the mean of the absolute difference between pPSTHs. The differences within bird groups (filled boxplots) were smaller than the differences between groups (open) in nearly all cases (Tukey-Kramer post hoc tests, 15/16 P < 0.03 across all four AC regions per bird group). For zfZF-zfBF, n = 44 syllable types (ZF and BF combined) occurred multiple times in a stimulus; for lfLF-lfBF, n = 39 syllable types (LF and BF). The measure of center is the median, box limits show the 25th and 75th percentiles, whiskers extend up to 1.5× the interquartile range beyond the quartiles; and circles show outliers. ***P < 0.001.
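The response-similarity measure in panel c reduces to a single number per pair of onset-aligned population PSTHs. A minimal sketch of that computation (the function name is illustrative; the legend specifies only the mean absolute difference) could be:

```python
import numpy as np

def ppsth_dissimilarity(ppsth_a, ppsth_b):
    """Dissimilarity between two population PSTHs for the same syllable
    type, measured as the mean of the absolute difference between them.

    Both inputs are 1-D arrays of equal length (time bins), aligned to
    syllable onset. Smaller values indicate more similar responses.
    """
    a = np.asarray(ppsth_a, dtype=float)
    b = np.asarray(ppsth_b, dtype=float)
    return float(np.mean(np.abs(a - b)))
```

Applied within a bird group (responses of the same group to two renditions) and between groups, these values populate the filled and open boxplots, respectively.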

Supplementary Figure 13 Frequency tuning does not predict group-level variation in song selectivity.

a, Frequency power spectra (median ± IQR) of the ZF (orange), LF (gray), and BF (blue) song stimuli (n = 5 songs each). b, Frequency response areas and curves (FRC, mean across levels) from two neurons with tuning features that could confer song selectivity. A neuron with a low best frequency (Bf) and/or narrow bandwidth (Bw) could be expected to respond selectively to tonal LF syllables, while a neuron with a high Bf and/or wide Bw could be expected to respond selectively to broadband ZF and BF syllables. c, Best frequency did not explain variation in song selectivity. The Bf distributions of pupil groups were not different in any of the three comparisons (ANOVAs with bird identity as a nested covariate, all P ≥ 0.28) and most partial correlations were not significant: top, ZF versus LF syllables (zfZF: P = 0.99, n = 778 neurons; lfLF: P = 0.90, n = 597 neurons); middle, ZF versus BF (zfZF: r = 0.08, P = 0.04, n = 751 neurons; zfBF: P = 0.70, n = 523 neurons); and bottom, LF versus BF (lfLF: P = 0.29, n = 589 neurons; lfBF: P = 0.63, n = 369 neurons). d, Tuning bandwidth did not explain group differences in song selectivity. While Bw was positively correlated with selectivity for ZF versus LF syllables (top, zfZF: partial r = 0.21, P = 5.7 × 10⁻⁹; lfLF: partial r = 0.09, P = 0.029) and negatively correlated with selectivity for LF versus BF syllables (bottom, lfLF: partial r = −0.20, P = 1.0 × 10⁻⁶; lfBF: partial r = −0.23, P = 9.7 × 10⁻⁶), there were no significant differences in the distributions of Bw between groups (ANOVAs with bird identity as a nested covariate, all P ≥ 0.11). Sample sizes are the same as in c. All regressions included bird identity as a categorical covariate.
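The partial correlations reported here control for bird identity as a categorical covariate. One standard way to compute such a partial correlation (a sketch under that assumption; the authors' exact model is not specified in this legend) is to regress dummy-coded bird identity out of both variables and correlate the residuals:

```python
import numpy as np

def partial_corr_with_bird(x, y, bird_ids):
    """Partial correlation between x and y, controlling for bird identity
    as a categorical covariate: regress indicator-coded bird identity out
    of both variables, then correlate the residuals.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ids = np.asarray(bird_ids)
    # Design matrix of one indicator column per bird (spans the intercept).
    D = np.column_stack([(ids == b).astype(float) for b in np.unique(ids)])

    def residuals(v):
        beta, *_ = np.linalg.lstsq(D, v, rcond=None)
        return v - D @ beta

    return float(np.corrcoef(residuals(x), residuals(y))[0, 1])
```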

Supplementary Figure 14 Modulation tuning predicts song selectivity in single neurons.

a, Action potential waveforms (mean ± SD) from a single neuron in each bird group (zfZF: n = 100,258 spikes; lfLF: n = 42,641 spikes; zfBF: n = 22,645 spikes; lfBF: n = 17,503 spikes). The zfZF and lfLF units are the same as shown in Fig. 2. b, Each neuron’s evoked spike rates to all syllables in five ZF, LF, or BF songs. Circles show the mean spike rates to each syllable across 10 trials (ZF: n = 13, 20, 17, 22, 22 syllables; LF: n = 21, 33, 22, 13, 14 syllables; BF: n = 30, 39, 15, 27, 28 syllables), solid black lines show the mean spike rates across all syllables per species, and dotted lines show spontaneous spike rates. c, Normalized modulation response areas (z-scored spike rates) plotted alongside a matrix of the difference between two species’ log-transformed song ripples (as in Fig. 4c). An index measuring the extent of song-tuning overlap was computed by multiplying the corresponding elements of the two matrices and then adding the products. Positive values indicate tuning for ripples that are mostly in the ‘positive’ song (e.g., ZF-LF ripples in top example), and negative values indicate tuning for ripples that are mostly in the ‘negative’ song. d, Correlations (±95% CIs) between spike rate selectivity and song-tuning overlap were significant in all cases: zfZF (ZF-LF, partial r = 0.43, n = 735 neurons), lfLF (ZF-LF, partial r = 0.49, n = 507 neurons), zfZF (ZF-BF, partial r = 0.44, n = 735 neurons), zfBF (ZF-BF, partial r = 0.66, n = 497 neurons), lfLF (LF-BF, partial r = 0.50, n = 507 neurons), lfBF (LF-BF, partial r = 0.38, n = 365 neurons). Black arrows indicate the neurons shown in a-c. All regressions included bird identity as a categorical covariate. All P < 0.001.
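The song-tuning overlap index in panel c is an elementwise product of two matrices summed over all entries, i.e., a Frobenius inner product between a neuron's z-scored modulation response area and the between-species ripple-difference matrix. A minimal sketch (function name illustrative):

```python
import numpy as np

def song_tuning_overlap(mra_z, song_diff):
    """Song-tuning overlap index: multiply the corresponding elements of a
    neuron's z-scored modulation response area and the difference between
    two species' log-transformed song ripple matrices, then sum the
    products. Positive values indicate tuning toward ripples that are
    over-represented in the 'positive' song; negative values indicate
    tuning toward the 'negative' song.
    """
    mra = np.asarray(mra_z, dtype=float)
    diff = np.asarray(song_diff, dtype=float)
    return float(np.sum(mra * diff))
```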

Supplementary Figure 15 Neurons with similar song selectivity have similar modulation tuning regardless of species identity, tutoring experience, or brain region.

a, Mean normalized (z-scored) modulation response areas from each region of zfZF (top) and lfLF (bottom) birds. Plots along the left column show the average tuning of neurons that were selective for ZF over LF songs; plots along the right column show the average tuning of neurons that were selective for LF over ZF songs; and plots along the middle column show the average tuning of neurons that were not selective for one song type over the other. Sample sizes indicate the number of neurons included in each plot. Black contour lines indicate the primary ripples composing each song type (see Fig. 4c). b, Same as a, but for zfZF and zfBF neurons selective for ZF (left) or BF (right) songs. c, Same as a, but for lfLF and lfBF neurons selective for LF (left) or BF (right) songs.

Supplementary information

About this article

Cite this article

Moore, J.M., Woolley, S.M.N. Emergent tuning for learned vocalizations in auditory cortex. Nat Neurosci 22, 1469–1476 (2019). https://doi.org/10.1038/s41593-019-0458-4
