Emergent tuning for learned vocalizations in auditory cortex

Abstract

Vocal learners use early social experience to develop auditory skills specialized for communication. However, it is unknown where in the auditory pathway neural responses become selective for vocalizations or how the underlying encoding mechanisms change with experience. We used a vocal tutoring manipulation in two species of songbird to reveal that tuning for conspecific song arises within the primary auditory cortical circuit. Neurons in the deep region of primary auditory cortex responded more to conspecific songs than to other species’ songs and more to species-typical spectrotemporal modulations, but neurons in the intermediate (thalamorecipient) region did not. Moreover, birds that learned song from another species exhibited parallel shifts in selectivity and tuning toward the tutor species’ songs in the deep but not the intermediate region. Our results locate a region in the auditory processing hierarchy where an experience-dependent coding mechanism aligns auditory responses with the output of a learned vocal motor behavior.

Fig. 1: Juvenile songbirds learn song from conspecific or heterospecific tutors.
Fig. 2: Selectivity for conspecific song emerges in the primary AC.
Fig. 3: Song selectivity and population response dynamics are experience dependent.
Fig. 4: Tuning for the spectrotemporal modulations in learned song emerges in parallel with song selectivity.
Fig. 5: Neural population response dynamics to song reflect tuning for spectrotemporal modulations.
Fig. 6: Neurons that respond selectively to same species’ songs have highly similar modulation tuning regardless of species identity or tutoring experience.

Data availability

The data that support the findings of this study are available from the corresponding author upon request.

Code availability

The code used to analyze data in this study is available from the corresponding author upon request.

Acknowledgements

We thank J. Schumacher, A. Calabrese, D. Schneider, H. Brew, S. Rosis and N. So for suggestions on design and analysis. We are grateful to N. Mesgarani, M. Long, N. So and E. Perez for comments on previous versions of the manuscript. Funding was provided by NIH grant no. DC009810 (to S.M.N.W.) and NSF grant no. IOS-1656825 (to S.M.N.W.).

Author information

J.M. designed experiments, collected data, analyzed data and wrote the manuscript. S.M.N.W. designed experiments, analyzed data and wrote the manuscript.

Correspondence to Sarah M. N. Woolley.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature Neuroscience thanks Jon Sakata and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Zebra finches and long-tailed finches learn song from conspecific and Bengalese finch tutors.

Spectrograms show songs from: a, a zebra finch tutor (ZF) and pupil (zfZF); b, a long-tailed finch tutor (LF) and pupil (lfLF); and c, a Bengalese finch tutor (BF) and its cross-tutored pupils (zfBF and lfBF). Magnified spectrograms on the right show tutor syllable types and corresponding pupil copies. d-g, Zebra finches and long-tailed finches reproduced BF syllable acoustics with varying degrees of accuracy, and some of the differences between species were related to the similarity between BF syllables and their prototypical conspecific syllables. d, Top, Box-and-whisker plots show syllable pitch for all tutor and pupil groups. Both zfBF and lfBF birds sang with pitch similar to the BF tutors but different from their respective normal pupil groups. Bottom, The pitch of pupil syllable copies was strongly correlated with the pitch of their corresponding tutor models in all groups (partial correlation coefficients, 0.80 ≤ r ≤ 0.99). e, In contrast, the mean frequencies of zfBF and lfBF syllables were different from those of BF syllables but not from those of normal conspecific pupils, and the correlations between pupil-tutor syllable types were weaker for cross-tutored birds (zfBF, r = 0.65; lfBF, 0.45) than for normal birds (zfZF, 0.81; lfLF, 0.94). For both f, frequency modulation and g, Wiener entropy, values were similar between all zebra finch groups and the BF tutors whereas lfBF syllables were intermediate between BF and LF syllables, and all pupil syllable types closely matched their corresponding tutor syllables (0.65 ≤ r ≤ 0.94). For the boxplots in d-g, the measure of center is the median, box limits show the 25th and 75th percentiles, whiskers extend up to 1.5× the interquartile range beyond the quartiles; and circles show outliers. 
Comparisons between adjacent groups used ANOVAs with bird identity as a nested, random effect covariate [ZF, n = 33 syllables, 5 birds; zfZF, n = 196 syllables, 25 birds; zfBF, n = 113 syllables, 11 birds; BF, n = 66 syllables, 5 birds; lfBF, n = 103 syllables, 10 birds; lfLF, n = 75 syllables, 12 birds; LF, n = 35 syllables, 5 birds], *P < 0.05; **P < 0.01; ***P < 0.001. For the regressions in d-g, models correlate the tutors’ and pupils’ syllable features after controlling for tutor identity as a categorical covariate [zfZF-ZF, n = 128 syllable pairs; zfBF-BF, n = 96 pairs; lfBF-BF, n = 70 pairs; lfLF-LF, n = 51 pairs], all P < 0.001.
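
The partial correlations described above (tutor versus pupil syllable features, controlling for tutor identity as a categorical covariate) can be illustrated with a minimal sketch. It assumes that controlling for a categorical covariate is equivalent to removing each tutor's mean from both variables before correlating the residuals; the function names and the example values are hypothetical and are not taken from the authors' analysis code.

```python
from collections import defaultdict
from math import sqrt


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (residualize on a categorical covariate)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def pearson_r(x, y):
    """Plain Pearson correlation coefficient between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_r_by_tutor(tutor_feature, pupil_feature, tutor_id):
    """Correlate tutor and pupil features after removing per-tutor means from both."""
    x = demean_by_group(tutor_feature, tutor_id)
    y = demean_by_group(pupil_feature, tutor_id)
    return pearson_r(x, y)


# Hypothetical syllable pitches (Hz) for two tutors, "A" and "B".
tutor_pitch = [500, 600, 700, 900, 1000, 1100]
pupil_pitch = [520, 610, 730, 700, 790, 910]
tutor_ids = ["A", "A", "A", "B", "B", "B"]
r = partial_r_by_tutor(tutor_pitch, pupil_pitch, tutor_ids)
```

The coefficient obtained this way matches the partial correlation from a regression with tutor dummy variables; a significance test would additionally need the degrees of freedom reduced by the number of tutors.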

Supplementary Figure 2 Histological verification of recording sites.

Sixteen-channel electrode arrays (4×4) were painted with a carbocyanine dye (DiI or DiO) along the back of each shank and advanced through the caudal telencephalon. Electrode passes varied systematically along the rostro-caudal and medio-lateral planes, and recordings were made at non-overlapping depths to sample all auditory cortical subregions (the section shown was 1.60 mm lateral from the midline). Following the completion of physiology experiments, fixed brains were sectioned in the sagittal plane, imaged with fluorescent filters, Nissl-stained, and then imaged again in the brightfield. Electrode contact locations were reconstructed along dye-labeled tracks and overlaid on Nissl-stained sections with delineated regions. CM, caudal mesopallium; NC, caudal nidopallium; LaM, mesopallial lamina; LPS, pallial-subpallial lamina; HVC, song motor nucleus. Major projections between cortical regions are illustrated in Fig. 2.

Supplementary Figure 3 Spike identification and cluster evaluation.

a, Top, A raw voltage trace (black) shows several spikes from different units recorded on a single channel. Middle, A transformed trace (beige) that amplifies fast voltage accelerations was passed through an amplitude threshold to locate potential spikes. Bottom, Snippets of potential spikes from the raw voltage trace were saved, measured in various ways [e.g., minimum baseline (open circle) and peak (filled) amplitudes at 3 phases of the spike; width at half-height of peaks 2 and 3], and sorted using an unsupervised clustering algorithm (WaveClus). Uniformity of spike shape within clusters and separation between clusters (when more than one unit was identified per channel; >3 per channel was rare) were verified by viewing several plots. They included b, the mean (± SD) spike waveforms (red, n = 16,447 spikes; turquoise, n = 12,329 spikes; orange, n = 8244 spikes; purple, n = 3406 spikes; green, n = 1989 spikes); c, distributions of spike properties for each cluster (i.e. minimum voltage (p2), spike width, amplitude ratios between peaks); and d, spike magnitude over time to ensure the unit was held throughout the experiment. e, Top, Signal-to-noise ratio of the same unit shown in a was quantified by the standard separation index D [difference in sample means (p2 – b) divided by the geometric mean of their standard deviations]. Bottom, Histogram of D values across the entire dataset (n = 4392 units) with an arrow showing the separation of the unit above. For the boxplot, the measure of center is the median, box limits show the 25th and 75th percentiles, and whiskers extend up to 1.5× the interquartile range beyond the quartiles. f, Top, Spike waveform variability over time for the same five units shown in b. In general, well-isolated units had p2 SDs greater than or equal to their respective baseline SDs. Bottom, Mean (± SD) ratio of spike waveform SD relative to baseline SD across the entire dataset. 
g, Top, Inter-spike intervals for the same unit shown in a show few refractory period violations. Bottom, Histogram of the percentage of intervals less than 1 ms per unit across the entire dataset. Some, but relatively few, very well-isolated units exhibited ISIs <1 ms.
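
Two of the isolation metrics in this caption can be written compactly. The sketch below follows the caption's wording: D is the difference in sample means (p2 minus baseline) divided by the geometric mean of the two standard deviations, and refractory violations are counted as the percentage of inter-spike intervals shorter than 1 ms. Function names, units and the example values are assumptions for illustration only.

```python
from math import sqrt
from statistics import mean, stdev


def separation_index(p2_amplitudes, baseline_amplitudes):
    """D = (mean(p2) - mean(b)) / sqrt(sd(p2) * sd(b)), as worded in the caption."""
    diff = mean(p2_amplitudes) - mean(baseline_amplitudes)
    return diff / sqrt(stdev(p2_amplitudes) * stdev(baseline_amplitudes))


def percent_short_isis(spike_times_s, threshold_s=0.001):
    """Percentage of inter-spike intervals shorter than threshold_s (default 1 ms)."""
    times = sorted(spike_times_s)
    isis = [b - a for a, b in zip(times, times[1:])]
    if not isis:
        return 0.0
    return 100.0 * sum(isi < threshold_s for isi in isis) / len(isis)


# Hypothetical amplitudes (microvolts) and spike times (seconds).
d_value = separation_index([-100, -110, -90], [0, 5, -5])
short_pct = percent_short_isis([0.0, 0.0005, 0.010, 0.020, 0.030])
```

Because p2 is a voltage minimum, D computed literally as (p2 minus b) is negative here; whether the authors report its magnitude is not stated in the caption.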

Supplementary Figure 4 Song selectivity was consistent across birds within each group.

Distributions show spike rate selectivity of all neurons in each AC region for a, ZF versus LF song in zfZF (orange) and lfLF (gray) birds; b, ZF versus BF song in zfZF and zfBF (brown) birds; and c, LF versus BF song in lfLF and lfBF (light blue) birds. Histograms are the same as in Figs. 2d, 3a, and c. Colored stars indicate a significant difference in responses between song types within a group (repeated-measures ANOVAs with bird identity as a covariate) and are plotted on the side of the song that evoked a greater response. Black bars show the separation between distribution means, and black stars indicate a difference in selectivity between bird groups (ANOVAs with bird identity as a random effect, nested covariate). *P < 0.05, **P < 0.01, ***P < 0.001. Box-and-whisker plots show selectivity of neurons from each bird; the measure of center is the median, box limits show the 25th and 75th percentiles, whiskers extend up to 1.5× the interquartile range beyond the quartiles; and circles show outliers. In a, int.: n = 42, 66, 40 neurons from zfZF birds (top-to-bottom, respectively) and n = 40, 64, 64 from lfLF birds; sup.: n = 82, 52, 45 (zfZF) and n = 0, 18, 35 (lfLF) neurons; deep: n = 98, 80, 103 (zfZF) and n = 32, 99, 106 (lfLF) neurons; sec.: n = 50, 126, 41 (zfZF) and n = 114, 28, 57 (lfLF) neurons. In b, int.: n = 42, 69, 38 (zfZF) and n = 31, 33, 9 (zfBF) neurons; sup.: n = 81, 52, 48 (zfZF) and n = 1, 4, 75 (zfBF) neurons; deep: n = 98, 70, 102 (zfZF) and n = 13, 81, 42 (zfBF) neurons; sec.: n = 48, 114, 35 (zfZF) and n = 40, 201, 9 (zfBF) neurons. In c, int.: n = 44, 62, 58 (lfLF) and n = 79, 23, 31 (lfBF) neurons; sup.: n = 0, 17, 31 (lfLF) and n = 13, 1, 9 (lfBF) neurons; deep: n = 31, 91, 102 (lfLF) and n = 35, 17, 23 (lfBF) neurons; sec.: n = 123, 28, 57 (lfLF) and n = 61, 61, 68 (lfBF) neurons.

Supplementary Figure 5 Species selectivity reflects tuning for general song features and not learned auditory objects.

Selectivity metrics computed using spike rates to all syllables (x-axis) were strongly correlated with metrics computed after omitting responses to syllable types in the tutor’s repertoire. a, ZF vs. LF in zfZF birds (top, partial r = 0.99, n = 826 neurons) and lfLF birds (bottom, r = 0.99, n = 657 neurons); b, ZF vs. BF in zfZF birds (top, r = 0.99, n = 798 neurons) and zfBF birds (bottom, r = 0.99, n = 547 neurons); and c, LF vs. BF in lfLF birds (top, r = 0.99, n = 643 neurons) and lfBF birds (bottom, r = 0.98, n = 421 neurons). Regression models included bird identity as a covariate. Black dotted lines are the diagonal, colored dotted lines indicate significant selectivity (t = ±1.96). d, The impact of responses to the tutor’s song on species spike rate selectivity was small. Distributions show the difference in selectivity when the tutor’s syllable types were included versus excluded (i.e. horizontal deviations from the diagonal in a-c). While responses to the tutor species songs were greater when all syllables were used than when tutor syllable types were not (repeated measures ANOVAs with bird identity as a covariate, all P < 0.001), the differences were small. Black bars show the difference between distribution means (-0.27 ≤ mean difference (t) ≤ 0.30). Sample sizes are the same as in a-c.
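
The comparison in this figure (selectivity with versus without the tutor's syllable types) can be sketched as follows. The caption reports selectivity in t units with a significance criterion of t = ±1.96 but does not give the exact form of the statistic, so the sketch assumes a Welch-style t statistic on syllable-evoked spike rates; all function names and example values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def selectivity_t(rates_a, rates_b):
    """Welch-style t statistic: positive values mean stronger responses to song type A."""
    se = sqrt(variance(rates_a) / len(rates_a) + variance(rates_b) / len(rates_b))
    return (mean(rates_a) - mean(rates_b)) / se


def selectivity_without_tutor_types(rates_a, types_a, rates_b, types_b, tutor_types):
    """Recompute selectivity after dropping syllable types found in the tutor's repertoire."""
    kept_a = [r for r, t in zip(rates_a, types_a) if t not in tutor_types]
    kept_b = [r for r, t in zip(rates_b, types_b) if t not in tutor_types]
    return selectivity_t(kept_a, kept_b)


# Hypothetical spike rates (spikes/s) of one neuron to four syllable types per species.
rates_a, types_a = [10, 12, 14, 16], ["a1", "a2", "a3", "a4"]
rates_b, types_b = [4, 6, 8, 10], ["b1", "b2", "b3", "b4"]
t_all = selectivity_t(rates_a, rates_b)
t_omit = selectivity_without_tutor_types(rates_a, types_a, rates_b, types_b, {"a4"})
```

Plotting t_all against t_omit for every neuron would give the kind of scatter described in a-c, with horizontal deviations from the diagonal corresponding to the differences summarized in d.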

Supplementary Figure 6 Spike rate reliability did not differ consistently between pupil groups.

Reliability was quantified by the Fano factor (coefficient of variation (CV) of syllable-evoked spike rates across trials; here, reliability = -CV). To be consistent with the plots of song selectivity, plots show the difference in reliability between song types. a, In both zfZF birds (orange, n = 148, 179, 281, 217 neurons in the intermediate, superficial, deep, and secondary regions, respectively) and lfLF birds (gray, n = 168, 53, 237, 199 neurons across AC regions), reliability was greater to ZF than to LF syllables across all brain regions. b, Same as a, but showing greater reliability to ZF than BF syllables in both zfZF birds (n = 149, 181, 270, 197 neurons) and zfBF birds (brown, n = 73, 80, 136, 250 neurons). c, Same as a, but showing greater reliability to LF than BF syllables in both lfLF birds (n = 164, 48, 224, 208 neurons) and lfBF birds (light blue, n = 133, 23, 75, 190 neurons). The measure of center is the median, box limits show the 25th and 75th percentiles, whiskers extend up to 1.5× the interquartile range beyond the quartiles; and circles show outliers. Colored stars indicate a difference in reliability between songs within a pupil group (repeated-measures ANOVAs with bird identity as a covariate). Black stars indicate a difference between groups (ANOVAs with bird identity as a nested covariate). *P < 0.05, **P < 0.01, ***P < 0.001.
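
A minimal sketch of the reliability measure, following the caption's own definition (reliability = -CV, with CV the standard deviation of trial spike rates divided by their mean) rather than the variance-to-mean ratio that the term Fano factor more commonly denotes. The function names and example rates are hypothetical.

```python
from statistics import mean, stdev


def reliability(trial_rates):
    """Negative coefficient of variation of spike rates across trials (higher = more reliable)."""
    m = mean(trial_rates)
    if m == 0:
        return float("nan")  # CV is undefined for a silent neuron
    return -stdev(trial_rates) / m


def reliability_difference(rates_song_a, rates_song_b):
    """Difference in reliability between two song types, the quantity shown in the plots."""
    return reliability(rates_song_a) - reliability(rates_song_b)


# Hypothetical spike rates (spikes/s) across three trials for two song types.
rel_a = reliability([8, 10, 12])
rel_diff = reliability_difference([8, 10, 12], [5, 10, 15])
```

A positive difference indicates more reliable responses to the first song type, which mirrors the sign convention of the selectivity plots.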

Supplementary Figure 7 Spike timing precision did not differ consistently across stimulus types or between pupil groups.

Precision was quantified by the correlation index (CI), which is a metric derived from shuffled autocorrelograms that indicates a neuron’s tendency to spike at the same time across trials. To be consistent with the plots of song selectivity, plots show the difference in CI between song types. a, In both zfZF birds (orange, n = 147, 179, 281, 216 neurons in the intermediate, superficial, deep, and secondary regions, respectively) and lfLF birds (gray, n = 168, 53, 234, 199 neurons across AC regions), spike timing precision was not different between responses to ZF and LF syllables in any region. b, Same as a, but comparing response precision to ZF versus BF syllables in zfZF birds (n = 148, 181, 270, 196 neurons) and zfBF birds (n = 73, 80, 136, 249 neurons). In the deep and secondary regions of both groups, spike timing was more precise to ZF than BF syllables. There were no differences between groups in any brain region. c, Same as a, but comparing spike timing precision to LF versus BF syllables in lfLF birds (n = 164, 48, 224, 208 neurons) and lfBF birds (n = 133, 23, 75, 190 neurons). There were no differences between groups in any region. The measure of center is the median, box limits show the 25th and 75th percentiles, whiskers extend up to 1.5× the interquartile range beyond the quartiles; and circles show outliers. Colored stars indicate a difference in CI between songs within a group (repeated-measures ANOVAs with bird identity as a covariate). Black stars indicate a difference between bird groups (ANOVAs with bird identity as a nested covariate). *P < 0.05, **P < 0.01, ***P < 0.001.
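
The correlation index can be sketched as a normalized count of near-coincident spikes across different trials, i.e. the zero-delay bin of a shuffled autocorrelogram. The sketch uses the commonly cited normalization M(M-1)·r²·ω·D (M trials, mean rate r, coincidence window ω, stimulus duration D), under which unrelated Poisson-like trains give a value near 1. The 1 ms window and the function name are assumptions; the caption does not state the parameters used in the study.

```python
def correlation_index(trials, duration_s, window_s=0.001):
    """Normalized count of across-trial spike coincidences within a window centered on zero delay.

    trials: list of spike-time lists (seconds), one list per stimulus repetition.
    """
    m = len(trials)
    n_spikes = sum(len(train) for train in trials)
    if m < 2 or n_spikes == 0:
        return float("nan")
    rate = n_spikes / (m * duration_s)  # mean firing rate across trials
    half = window_s / 2.0
    coincidences = 0
    # Compare every ordered pair of different trials (never a trial with itself).
    for i, train_i in enumerate(trials):
        for j, train_j in enumerate(trials):
            if i == j:
                continue
            for s in train_i:
                coincidences += sum(1 for t in train_j if abs(s - t) <= half)
    norm = m * (m - 1) * rate ** 2 * window_s * duration_s
    return coincidences / norm


# Hypothetical data: perfectly repeatable spikes versus spikes at unrelated times.
ci_locked = correlation_index([[0.1, 0.5], [0.1, 0.5], [0.1, 0.5]], duration_s=1.0)
ci_unrelated = correlation_index([[0.1], [0.3], [0.6]], duration_s=1.0)
```

Values well above 1 indicate that spikes recur at the same times across repetitions; the nested loops are quadratic in spike count and are meant only to make the definition explicit.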

Supplementary Figure 8 Neural discrimination of songs varied between song types, pupil groups, and AC regions in parallel with differences in spike rate.

An unbounded neural discrimination metric (d-prime) was computed to measure how accurately single-trial spike trains distinguished among songs of the same species (n = 5 ZF, LF, and BF songs; 10 trials per stimulus). To be consistent with the plots of song selectivity, plots show the difference in d-prime between song types. a, In zfZF birds, single neurons in the intermediate, superficial, deep, and secondary regions (n = 113, 111, 186, 140 neurons, respectively) tended to discriminate among ZF songs better than among LF songs. Intermediate-region neurons in lfLF birds also discriminated among ZF songs better than LF songs, but secondary-region neurons performed better to LF songs (n = 98, 33, 123, 37 neurons across AC regions). The difference between groups was significant in the deep region. b, Same as a, but comparing discrimination of ZF versus BF songs in zfZF birds (n = 126, 117, 202, 127 neurons) and zfBF birds (n = 38, 49, 91, 159 neurons). Deep- and secondary-region neurons in zfZF birds discriminated among ZF songs better than BF songs, but there was no difference between song types in zfBF birds. The difference between groups was significant for the deep region. c, Same as a, but comparing discrimination of LF versus BF songs in lfLF birds (n = 135, 40, 153, 120 neurons) and lfBF birds (n = 95, 15, 42, 122 neurons). In lfLF birds, intermediate-region neurons discriminated among BF songs better than LF songs, but secondary-region neurons did better among LF songs. In contrast, intermediate-, deep-, and secondary-region neurons in lfBF birds all discriminated among BF songs better than among LF songs. The difference between groups was significant for the deep region. For a-c, the measure of center is the median, box limits show the 25th and 75th percentiles, whiskers extend up to 1.5× the interquartile range beyond the quartiles; and circles show outliers. 
Colored stars indicate a difference in discrimination between song types within a group (repeated-measures ANOVAs with bird identity as a covariate). Black stars indicate a difference between bird groups (ANOVAs with bird identity as a nested covariate). *P < 0.05, **P < 0.01, ***P < 0.001. d, Correlations between spike rate selectivity and the difference in neural discrimination between song types in zfZF birds (0.25 ≤ all partial r ≤ 0.62, all P < 0.01 across AC regions) and lfLF birds (intermediate, deep, secondary: 0.43 ≤ r ≤ 0.68, P < 0.01). e, Same as d, but showing correlations in zfZF birds (0.23 ≤ all partial r ≤ 0.50, all P < 0.02) and zfBF birds (superficial, deep, secondary: 0.45 ≤ r ≤ 0.54, P < 0.001). f, Same as d, but showing correlations in lfLF birds (0.50 ≤ all partial r ≤ 0.63, all P < 0.001) and lfBF birds (intermediate, deep, secondary: 0.33 ≤ r ≤ 0.61, P < 0.05). For d-f, regressions included bird identity as a covariate. Sample sizes are the same as in a-c.
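
To illustrate what an unbounded d-prime looks like in practice, the sketch below computes a generic d-prime between single-trial responses to two songs and averages it over all song pairs. This is a deliberately simplified stand-in: it uses scalar responses per trial (for example spike counts), whereas the caption refers to single-trial spike trains and does not specify the computation, so it should not be read as the authors' method. All names and values are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def d_prime(x, y):
    """Generic d-prime: |difference of means| divided by the root-mean-square of the two SDs."""
    pooled_sd = sqrt((variance(x) + variance(y)) / 2.0)
    delta = abs(mean(x) - mean(y))
    if pooled_sd == 0:
        return float("inf") if delta > 0 else 0.0
    return delta / pooled_sd


def mean_pairwise_d_prime(responses_by_song):
    """Average d-prime over every pair of songs; each entry holds one value per trial."""
    values = [d_prime(a, b) for a, b in combinations(responses_by_song, 2)]
    return mean(values)


# Hypothetical single-trial spike counts for three songs of one species.
songs = [[8, 10, 12], [18, 20, 22], [28, 30, 32]]
dp_pair = d_prime(songs[0], songs[1])
dp_mean = mean_pairwise_d_prime(songs)
```

The metric has no upper bound because it grows without limit as trial-to-trial variability shrinks relative to the separation between stimuli.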

Supplementary Figure 9 Song-evoked activity patterns of intermediate-region neurons.

a, Spectrograms of ZF (top) and BF (bottom) song segments plotted above pPSTHs (mean ± 95% C.I.; zfZF, n = 149 neurons; zfBF, n = 73 neurons) and neurograms (z-scored single-neuron PSTHs; zfZF, n = 40 randomly selected neurons from two birds; zfBF, n = 31 and 33 neurons for birds 1 and 2, respectively). Colored lines above pPSTHs indicate sustained differences (≥10 ms) between groups, and bar graphs to the right show the number of segments in each ZF or BF stimulus that evoked a greater pPSTH in zfZF birds (orange) or zfBF birds (brown) (two-sided paired t-tests, n = 5 songs for each species, both P > 0.05). Traces to the right of neurograms show the selectivity of each respective neuron (dashed lines are t = ±1.96). b, As in a but for LF and BF songs and lfLF birds (gray; n = 164 neurons in pPSTHs; n = 40 neurons each in neurograms) and lfBF birds (light blue; n = 133 neurons in pPSTHs; n = 40 and 31 neurons in neurograms for birds 1 and 2, respectively). **P < 0.01. For both a and b, pPSTHs and neurograms were shifted in time by the average response latency of paired groups (zfZF and zfBF, 13 ms; lfLF and lfBF, 16 ms) to align with the stimuli.

Supplementary Figure 10 Song-evoked activity patterns of superficial-region neurons.

a, Spectrograms of ZF (top) and BF (bottom) song segments plotted above pPSTHs (mean ± 95% C.I.; zfZF, n = 181 neurons; zfBF, n = 80 neurons) and neurograms (z-scored single-neuron PSTHs; zfZF, n = 40 randomly selected neurons from two birds; zfBF, n = 4 and 40 neurons for birds 1 and 2, respectively). Colored lines above pPSTHs indicate sustained differences (≥10 ms) between groups, and bar graphs to the right show the number of segments in each ZF or BF stimulus that evoked a greater pPSTH in zfZF birds (orange) or zfBF birds (brown) (two-sided paired t-tests, n = 5 songs for each species, both P > 0.05). Traces to the right of neurograms show the selectivity of each respective neuron (dashed lines are t = ±1.96). b, As in a but for LF and BF songs and lfLF birds (gray; n = 48 neurons in pPSTHs; n = 17 and 31 neurons for birds 1 and 2, respectively) and lfBF birds (light blue; n = 23 neurons in pPSTHs; n = 13 and 9 neurons for birds 1 and 2, respectively). *P < 0.05. For both a and b, pPSTHs and neurograms were shifted in time by the average response latency of paired groups (zfZF and zfBF, 17 ms; lfLF and lfBF, 18 ms) to align with the stimuli.

Supplementary Figure 11 Song-evoked activity patterns of secondary-region neurons.

a, Spectrograms of ZF (top) and BF (bottom) song segments plotted above pPSTHs (mean ± 95% C.I.; zfZF, n = 197 neurons; zfBF, n = 250 neurons) and neurograms (z-scored individual PSTHs; zfZF, n = 40 randomly selected neurons from two birds; zfBF, n = 40 neurons each). Colored lines above pPSTHs indicate sustained differences (≥10 ms) between groups, and bar graphs to the right show the number of segments in each ZF or BF stimulus that evoked a greater pPSTH in zfZF birds (orange) or zfBF birds (brown) (two-sided paired t-tests, n = 5 songs for each species, *P < 0.05). Traces to the right of neurograms show the selectivity of each respective neuron (dashed lines are t = ±1.96). b, As in a but for LF and BF songs and lfLF birds (gray; n = 208 neurons in pPSTHs; n = 40 neurons each in neurograms) and lfBF birds (light blue; n = 190 neurons in pPSTHs; n = 40 neurons each). *P < 0.05. For both a and b, pPSTHs and neurograms were shifted in time by the average response latency of paired groups (zfZF and zfBF, 19 ms; lfLF and lfBF, 25 ms) to align with the stimuli.

Supplementary Figure 12 Population responses were temporally precise and reliable across syllable repetitions.

a, Top, spectrograms of different syllables from ZF and BF songs. Bottom, overlaid mean pPSTHs from the deep region of zfZF birds (orange, n = 270 neurons) and zfBF birds (brown, n = 136 neurons) show responses to different renditions of the same syllable type (aligned to onset). Bars along the top denote sustained differences (≥10 ms) between bird groups to a single syllable and are separated vertically for different syllable occurrences. b, Same as a, but showing LF and BF syllables and deep-region pPSTHs from lfLF birds (gray, n = 224 neurons) and lfBF birds (light blue, n = 75 neurons). c, The similarity of population responses to different utterances of a syllable type was measured as the mean of the absolute difference between pPSTHs. The differences within bird groups (filled boxplots) were smaller than the differences between groups (open) in nearly all cases (Tukey-Kramer post hoc tests, 15/16 P < 0.03 across all four AC regions per bird group). For zfZF-zfBF, n = 44 syllable types (ZF and BF combined) occurred multiple times in a stimulus; for lfLF-lfBF, n = 39 syllable types (LF and BF). The measure of center is the median, box limits show the 25th and 75th percentiles, whiskers extend up to 1.5× the interquartile range beyond the quartiles, and circles show outliers. ***P < 0.001.
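
The similarity measure in c (the mean of the absolute difference between pPSTHs) can be sketched in a few lines of Python. The function name and the toy traces are hypothetical, and the sketch assumes both pPSTHs are sampled on the same time bins and aligned to syllable onset.

```python
import numpy as np

def ppsth_mean_abs_difference(ppsth_a, ppsth_b):
    """Mean absolute difference between two equally sampled pPSTHs.

    Smaller values mean more similar population responses.
    """
    a = np.asarray(ppsth_a, dtype=float)
    b = np.asarray(ppsth_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("pPSTHs must have the same number of time bins")
    return float(np.mean(np.abs(a - b)))

# Toy example: two renditions of one syllable (arbitrary units).
rendition_1 = [0.0, 1.0, 2.0, 3.0]
rendition_2 = [0.0, 2.0, 2.0, 1.0]
toy_difference = ppsth_mean_abs_difference(rendition_1, rendition_2)
```

In this toy case the per-bin absolute differences are 0, 1, 0, and 2, so the measure is 0.75.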

Supplementary Figure 13 Frequency tuning does not predict group-level variation in song selectivity.

a, Frequency power spectra (median ± IQR) of the ZF (orange), LF (gray), and BF (blue) song stimuli (n = 5 songs each). b, Frequency response areas and curves (FRC, mean across levels) from two neurons with tuning features that could confer song selectivity. A neuron with a low best frequency (Bf) and/or narrow bandwidth (Bw) could be expected to respond selectively to tonal LF syllables, while a neuron with a high Bf and/or wide Bw could be expected to respond selectively to broadband ZF and BF syllables. c, Best frequency did not explain variation in song selectivity. The Bf distributions of pupil groups were not different in any of the three comparisons (ANOVAs with bird identity as a nested covariate, all P ≥ 0.28) and most partial correlations were not significant: top, ZF versus LF syllables (zfZF: P = 0.99, n = 778 neurons; lfLF: P = 0.90, n = 597 neurons); middle, ZF versus BF (zfZF: r = 0.08, P = 0.04, n = 751 neurons; zfBF: P = 0.70, n = 523 neurons); and bottom, LF versus BF (lfLF: P = 0.29, n = 589 neurons; lfBF: P = 0.63, n = 369 neurons). d, Tuning bandwidth did not explain group differences in song selectivity. While Bw was positively correlated with selectivity for ZF versus LF syllables (top, zfZF: partial r = 0.21, P = 5.7 × 10⁻⁹; lfLF: partial r = 0.09, P = 0.029) and negatively correlated with selectivity for LF versus BF syllables (bottom, lfLF: partial r = -0.20, P = 1.0 × 10⁻⁶; lfBF: partial r = -0.23, P = 9.7 × 10⁻⁶), there were no significant differences in the distributions of Bw between groups (ANOVAs with bird identity as a nested covariate, all P ≥ 0.11). Sample sizes are the same as in c. All regressions included bird identity as a categorical covariate.
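
One common way to obtain a partial correlation that controls for a categorical covariate such as bird identity is to remove each bird's mean from both variables and correlate the residuals. The Python sketch below illustrates that idea only; the function name and toy data are hypothetical, and the caption does not specify the exact procedure or software the authors used.

```python
import numpy as np

def partial_corr_given_group(x, y, groups):
    """Pearson correlation of x and y after removing per-group means.

    Subtracting each group's mean is equivalent to regressing out a
    dummy-coded categorical covariate from both variables.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    x_res = x.copy()
    y_res = y.copy()
    for g in np.unique(groups):
        mask = groups == g
        x_res[mask] -= x[mask].mean()
        y_res[mask] -= y[mask].mean()
    return float(np.corrcoef(x_res, y_res)[0, 1])

# Toy data: two birds with different offsets but the same within-bird slope.
toy_bandwidth = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
toy_selectivity = [2.0, 4.0, 6.0, 0.0, 2.0, 4.0]
toy_bird = ["bird1", "bird1", "bird1", "bird2", "bird2", "bird2"]
toy_partial_r = partial_corr_given_group(toy_bandwidth, toy_selectivity, toy_bird)
```

In the toy data the raw correlation is distorted by the offset between birds, whereas the within-bird relationship is perfectly linear, so the partial correlation is 1.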

Supplementary Figure 14 Modulation tuning predicts song selectivity in single neurons.

a, Action potential waveforms (mean ± SD) from a single neuron in each bird group (zfZF: n = 100,258 spikes; lfLF: n = 42,641 spikes; zfBF: n = 22,645 spikes; lfBF: n = 17,503 spikes). The zfZF and lfLF units are the same as shown in Fig. 2. b, Each neuron’s evoked spike rates to all syllables in five ZF, LF, or BF songs. Circles show the mean spike rates to each syllable across 10 trials (ZF: n = 13, 20, 17, 22, 22 syllables; LF: n = 21, 33, 22, 13, 14 syllables; BF: n = 30, 39, 15, 27, 28 syllables), solid black lines show the mean spike rates across all syllables per species, and dotted lines show spontaneous spike rates. c, Normalized modulation response areas (z-scored spike rates) plotted alongside a matrix of the difference between two species’ log-transformed song ripples (as in Fig. 4c). An index measuring the extent of song-tuning overlap was computed by multiplying the corresponding elements of the two matrices and then adding the products. Positive values indicate tuning for ripples that are mostly in the ‘positive’ song (e.g., ZF-LF ripples in top example), and negative values indicate tuning for ripples that are mostly in the ‘negative’ song. d, Correlations (± 95% CIs) between spike rate selectivity and song-tuning overlap were significant in all cases: zfZF (ZF-LF, partial r = 0.43, n = 735 neurons), lfLF (ZF-LF, partial r = 0.49, n = 507 neurons), zfZF (ZF-BF, partial r = 0.44, n = 735 neurons), zfBF (ZF-BF, partial r = 0.66, n = 497 neurons), lfLF (LF-BF, partial r = 0.50, n = 507 neurons), lfBF (LF-BF, partial r = 0.38, n = 365 neurons). Black arrows indicate the neurons shown in a-c. All regressions included bird identity as a categorical covariate. All P < 0.001.
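
The song-tuning overlap index in c (multiply corresponding elements of the two matrices, then add the products) can be written as a short Python sketch. The function name and the 2 × 2 toy matrices are hypothetical; real modulation response areas and ripple-difference matrices would share whatever modulation grid the analysis used.

```python
import numpy as np

def song_tuning_overlap(response_area, ripple_difference):
    """Sum of element-wise products of two equally shaped matrices.

    `response_area`: normalized (z-scored) modulation response area.
    `ripple_difference`: difference between two species' log-transformed
    song ripples, e.g. ZF minus LF.
    """
    response_area = np.asarray(response_area, dtype=float)
    ripple_difference = np.asarray(ripple_difference, dtype=float)
    if response_area.shape != ripple_difference.shape:
        raise ValueError("matrices must have the same shape")
    return float(np.sum(response_area * ripple_difference))

# Toy matrices (values are arbitrary, for illustration only).
toy_response_area = np.array([[1.0, -0.5],
                              [0.0, 2.0]])
toy_ripple_difference = np.array([[0.5, 0.5],
                                  [-1.0, 1.0]])
toy_overlap = song_tuning_overlap(toy_response_area, toy_ripple_difference)
```

Here the products are 0.5, -0.25, 0, and 2, giving an index of 2.25; the positive sign corresponds to tuning that favors ripples more prominent in the 'positive' song.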

Supplementary Figure 15 Neurons with similar song selectivity have similar modulation tuning regardless of species identity, tutoring experience, or brain region.

a, Mean normalized (z-scored) modulation response areas from each region of zfZF (top) and lfLF (bottom) birds. Plots along the left column show the average tuning of neurons that were selective for ZF over LF songs; plots along the right column show the average tuning of neurons that were selective for LF over ZF songs; and plots along the middle column show the average tuning of neurons that were not selective for one song type over the other. Sample sizes indicate number of neurons included in each plot. Black contour lines indicate the primary ripples composing each song type (see Fig. 4c). b, Same as a, but for zfZF and zfBF neurons selective for ZF (left) or BF (right) songs. c, Same as a, but for lfLF and lfBF neurons selective for LF (left) or BF (right) songs.

Supplementary information

Supplementary Figs. 1–15 and Supplementary Table 1.

Reporting Summary
