Hidden neural states underlie canary song syntax

Abstract

Coordinated skills such as speech or dance involve sequences of actions that follow syntactic rules in which transitions between elements depend on the identities and order of past actions. Canary songs consist of repeated syllables called phrases, and the ordering of these phrases follows long-range rules1 in which the choice of what to sing depends on the song structure many seconds prior. The neural substrates that support these long-range correlations are unknown. Here, using miniature head-mounted microscopes and cell-type-specific genetic tools, we observed neural activity in the premotor nucleus HVC2,3,4 as canaries explored various phrase sequences in their repertoire. We identified neurons that encode past transitions, extending over four phrases and spanning up to four seconds and forty syllables. These neurons preferentially encode past actions rather than future actions, can reflect more than one song history, and are active mostly during the rare phrases that involve history-dependent transitions in song. These findings demonstrate that the dynamics of HVC include ‘hidden states’ that are not reflected in ongoing behaviour but rather carry information about prior actions. These states provide a possible substrate for the control of syntax transitions governed by long-range rules.


Fig. 1: Long-range syntax rules in canary song.
Fig. 2: HVC PN activity reflects long-range phrase sequence information.
Fig. 3: Sequence-correlated HVC neurons reflect within-phrase timing.
Fig. 4: Sequence-correlated HVC neurons reflect preceding context up to four phrases apart and show enhanced activity during context-dependent transitions.

Data availability

Data can be found at figshare (https://figshare.com/) at https://doi.org/10.6084/m9.figshare.12006657. Source data are provided with this paper.

Code availability

All custom code written for this study is publicly available in GitHub repositories (https://github.com/gardner-lab/FinchScope; https://github.com/gardner-lab/video-capture; https://github.com/gardner-lab/FinchScope/tree/master/Analysis%20Pipeline/extractmedia; https://github.com/yardencsGitHub/BirdSongBout/tree/master/helpers/GUI; https://github.com/yardencsGitHub/tweetynet; and https://github.com/jmarkow/pst).

References

  1. Markowitz, J. E., Ivie, E., Kligler, L. & Gardner, T. J. Long-range order in canary song. PLOS Comput. Biol. 9, e1003052 (2013).
  2. Nottebohm, F., Stokes, T. M. & Leonard, C. M. Central control of song in the canary, Serinus canarius. J. Comp. Neurol. 165, 457–486 (1976).
  3. Hahnloser, R. H. R., Kozhevnikov, A. A. & Fee, M. S. An ultra-sparse code underlies the generation of neural sequences in a songbird. Nature 419, 65–70 (2002).
  4. Long, M. A. & Fee, M. S. Using temperature to analyse temporal dynamics in the songbird motor pathway. Nature 456, 189–194 (2008).
  5. Rokni, U., Richardson, A. G., Bizzi, E. & Seung, H. S. Motor learning with unstable neural representations. Neuron 54, 653–666 (2007).
  6. Todorov, E. Optimality principles in sensorimotor control. Nat. Neurosci. 7, 907–915 (2004).
  7. Wolpert, D. M. Computational approaches to motor control. Trends Cogn. Sci. 1, 209–216 (1997).
  8. Leonardo, A. Degenerate coding in neural systems. J. Comp. Physiol. A Neuroethol. Sens. Neural Behav. Physiol. 191, 995–1010 (2005).
  9. Jin, D. Z. & Kozhevnikov, A. A. A compact statistical model of the song syntax in Bengalese finch. PLOS Comput. Biol. 7, e1001108 (2011).
  10. Ohbayashi, M., Ohki, K. & Miyashita, Y. Conversion of working memory to motor sequence in the monkey premotor cortex. Science 301, 233–236 (2003).
  11. Goldman-Rakic, P. S. Cellular basis of working memory. Neuron 14, 477–485 (1995).
  12. Svoboda, K. & Li, N. Neural mechanisms of movement planning: motor cortex and beyond. Curr. Opin. Neurobiol. 49, 33–41 (2018).
  13. Thompson, J. A., Costabile, J. D. & Felsen, G. Mesencephalic representations of recent experience influence decision making. eLife 5, e16572 (2016).
  14. Pastalkova, E., Itskov, V., Amarasingham, A. & Buzsáki, G. Internally generated cell assembly sequences in the rat hippocampus. Science 321, 1322–1327 (2008).
  15. Churchland, M. M., Afshar, A. & Shenoy, K. V. A central source of movement variability. Neuron 52, 1085–1096 (2006).
  16. Mushiake, H., Saito, N., Sakamoto, K., Itoyama, Y. & Tanji, J. Activity in the lateral prefrontal cortex reflects multiple steps of future events in action plans. Neuron 50, 631–641 (2006).
  17. Shima, K. & Tanji, J. Neuronal activity in the supplementary and presupplementary motor areas for temporal organization of multiple movements. J. Neurophysiol. 84, 2148–2160 (2000).
  18. Fujimoto, H., Hasegawa, T. & Watanabe, D. Neural coding of syntactic structure in learned vocalizations in the songbird. J. Neurosci. 31, 10023–10033 (2011).
  19. Hamaguchi, K., Tanaka, M. & Mooney, R. A distributed recurrent network contributes to temporally precise vocalizations. Neuron 91, 680–693 (2016).
  20. Ashmore, R. C., Wild, J. M. & Schmidt, M. F. Brainstem and forebrain contributions to the generation of learned motor behaviors for song. J. Neurosci. 25, 8543–8554 (2005).
  21. Alonso, R. G., Trevisan, M. A., Amador, A., Goller, F. & Mindlin, G. B. A circular model for song motor control in Serinus canaria. Front. Comput. Neurosci. 9, 41 (2015).
  22. Goldberg, J. H. & Fee, M. S. Singing-related neural activity distinguishes four classes of putative striatal neurons in the songbird basal ganglia. J. Neurophysiol. 103, 2002–2014 (2010).
  23. Jin, D. Z. Generating variable birdsong syllable sequences with branching chain networks in avian premotor nucleus HVC. Phys. Rev. E 80, 051902 (2009).
  24. Hinton, G. et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. 29, 82–97 (2012).
  25. Chen, T.-W. et al. Ultrasensitive fluorescent proteins for imaging neuronal activity. Nature 499, 295–300 (2013).
  26. Bouchard, K. E. & Brainard, M. S. Auditory-induced neural dynamics in sensory–motor circuitry predict learned temporal and sequential statistics of birdsong. Proc. Natl Acad. Sci. USA 113, 9641–9646 (2016).
  27. Wittenbach, J. D., Bouchard, K. E., Brainard, M. S. & Jin, D. Z. An adapting auditory-motor feedback loop can contribute to generating vocal repetition. PLOS Comput. Biol. 11, e1004471 (2015).
  28. Dave, A. S., Yu, A. C. & Margoliash, D. Behavioral state modulation of auditory activity in a vocal motor system. Science 282, 2250–2254 (1998).
  29. Cardin, J. A. & Schmidt, M. F. Noradrenergic inputs mediate state dependence of auditory responses in the avian song system. J. Neurosci. 24, 7745–7753 (2004).
  30. Glaze, C. M. & Troyer, T. W. Development of temporal structure in zebra finch song. J. Neurophysiol. 109, 1025–1035 (2013).
  31. Castelino, C. B. & Schmidt, M. F. What birdsong can teach us about the central noradrenergic system. J. Chem. Neuroanat. 39, 96–111 (2010).
  32. Prather, J. F., Peters, S., Nowicki, S. & Mooney, R. Precise auditory–vocal mirroring in neurons for learned vocal communication. Nature 451, 305–310 (2008).
  33. Okubo, T. S., Mackevicius, E. L., Payne, H. L., Lynch, G. F. & Fee, M. S. Growth and splitting of neural sequences in songbird vocal development. Nature 528, 352–357 (2015).
  34. Zucker, R. S. & Regehr, W. G. Short-term synaptic plasticity. Annu. Rev. Physiol. 64, 355–405 (2002).
  35. Iacobucci, G. J. & Popescu, G. K. NMDA receptors: linking physiological output to biophysical operation. Nat. Rev. Neurosci. 18, 236–249 (2017).
  36. Nagel, K., Kim, G., McLendon, H. & Doupe, A. A bird brain’s view of auditory processing and perception. Hear. Res. 273, 123–133 (2011).
  37. Fiete, I. R., Senn, W., Wang, C. Z. H. & Hahnloser, R. H. R. Spike-time-dependent plasticity and heterosynaptic competition organize networks to produce long scale-free sequences of neural activity. Neuron 65, 563–576 (2010).
  38. Abeles, M. Corticonics: Neural Circuits of the Cerebral Cortex (Cambridge Univ. Press, 1991).
  39. Cannon, J., Kopell, N., Gardner, T. & Markowitz, J. Neural sequence generation using spatiotemporal patterns of inhibition. PLOS Comput. Biol. 11, e1004581 (2015).
  40. Hamaguchi, K. & Mooney, R. Recurrent interactions between the input and output of a songbird cortico-basal ganglia pathway are implicated in vocal sequence variability. J. Neurosci. 32, 11671–11687 (2012).
  41. Graves, A., Mohamed, A. & Hinton, G. Speech recognition with deep recurrent neural networks. 2013 IEEE Intl Conf. Acoustics, Speech and Signal Processing 6645–6649 (2013).
  42. Yamashita, Y. & Tani, J. Emergence of functional hierarchy in a multiple timescale neural network model: a humanoid robot experiment. PLOS Comput. Biol. 4, e1000220 (2008).
  43. Santoro, A. et al. in Advances in Neural Information Processing Systems 31 (eds Bengio, S. et al.) 7310–7321 (Curran Associates, 2018).
  44. Chorowski, J. K., Bahdanau, D., Serdyuk, D., Cho, K. & Bengio, Y. in Advances in Neural Information Processing Systems 28 (eds Cortes, C. et al.) 577–585 (Curran Associates, 2015).
  45. Stokes, T. M., Leonard, C. M. & Nottebohm, F. The telencephalon, diencephalon, and mesencephalon of the canary, Serinus canaria, in stereotaxic coordinates. J. Comp. Neurol. 156, 337–374 (1974).
  46. Liberti, W. A., III et al. Unstable neurons underlie a stable learned behavior. Nat. Neurosci. 19, 1665–1671 (2016).
  47. Wild, J. M., Williams, M. N., Howie, G. J. & Mooney, R. Calcium-binding proteins define interneurons in HVC of the zebra finch (Taeniopygia guttata). J. Comp. Neurol. 483, 76–90 (2005).
  48. Wohlgemuth, M. J., Sober, S. J. & Brainard, M. S. Linked control of syllable sequence and phonology in birdsong. J. Neurosci. 30, 12936–12949 (2010).
  49. Zhou, P. et al. Efficient and accurate extraction of in vivo calcium signals from microendoscopic video data. eLife 7, e28728 (2018).
  50. Howell, D. C. Statistical Methods for Psychology (Cengage Learning, 2009).


Acknowledgements

This study was supported by NIH grants R01NS089679, R01NS104925, R24NS098536 (T.J.G.) and R24HL123828, U01TR001810 (D.N.K.). We thank J. Markowitz, I. Davison, and J. Gavornik for discussions and comments on this manuscript, and Nvidia Corporation for a technology grant (Y.C.).

Author information

Contributions

Y.C. and T.J.G. conceived and designed the study. W.A.L.III designed miniaturized microscopes and tether commutators and consulted on surgical procedures. L.N.P. created the video acquisition software. D.C.L. and D.N.K. produced lentivirus. Y.C. and J.S. designed surgical procedures. Y.C., J.S., and D.S. performed animal surgeries. Y.C. and D.P.L. built the experimental setup. Y.C. and J.S. gathered the data. Y.C. and D.S. performed histology and immunohistochemistry. Y.C. designed and wrote the machine-learning audio segmentation and annotation algorithm. Y.C. analysed the data. Y.C., W.A.L.III, L.N.P., and T.J.G. wrote the manuscript.

Corresponding authors

Correspondence to Yarden Cohen or Timothy J. Gardner.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Jesse Goldberg and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Canary song annotation and sequence statistics.

a, Architecture of syllable segmentation and annotation machine learning algorithm. (i) A spectrogram is fed into the algorithm as a 2D matrix in segments of 1 s. (ii) Convolutional and max-pooling layers learn local spectral and temporal filters. (iii) Bidirectional recurrent LSTM layer learns temporal sequencing features. (iv) Projection onto syllable classes assigns a probability for each 2.7-ms time bin and syllable. b, After manual proofreading (see Methods), a support vector machine classifier was used to assess the pairwise confusion between all syllable classes of bird 1 (see Methods). The test set confusion matrix (right) and its histogram (left) show that in rare cases the error exceeded 1% and at most reached 6%. As the higher values occurred only in phrases with tens of syllables, this metric guarantees that most of the syllables in every phrase cannot be confused as belonging to another syllable class. Accordingly, the possibility of making a mistake in identifying a phrase type is negligible. c, Number of phrases per song for the three birds used in this study. d, Song durations for the three birds. e, Mean syllable durations for 85 syllable classes from three birds. Red arrow marks the duration below which all trill types have more than ten repetitions on average. f, Relation between phrase class mean duration (x axis) and standard deviation (y axis). Syllable classes (dots) of three birds are coloured according to bird number. Dashed line marks 450 ms (upper limit for the decay time constant of GCaMP6f). g, Range of mean number of syllables per phrase (y axis) for all syllable types with mean duration shorter than the x-axis value. Red line is the median, light grey marks the 25% and 75% quantiles and dark grey marks the 5% and 95% quantiles (blue line marks the number of syllable types contributing to these statistics). The red arrow matches the arrow in e. h, Cumulative histogram of trill phrase durations. 
i, All complex phrase transitions with second-order or higher dependence on song history context (for birds 1 and 2). For each phrase type that precedes a complex transition, the context dependence is visualized by a PST (see Methods). Transition outcome probabilities are marked by pie charts at the centre of each node. The song context (phrase sequence) that leads to the transition is marked by concentric circles, the innermost being the phrase type that preceded the transition. Nodes are connected to indicate the sequences in which they are added in the search for longer Markov chains that describe context dependence (for example, i–iii for first- to third-order Markov chains). Grey arrows indicate additional incoming links that are omitted for simplicity. Source Data
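The variable-order context dependence that the PSTs in panel i visualize can be illustrated with a toy frequency-count model. This is a hypothetical sketch, not the authors' PST implementation (which prunes uninformative contexts; see ref. 1 and the jmarkow/pst repository); the phrase labels and corpus below are invented:

```python
from collections import Counter, defaultdict

def transition_probs(songs, context_len):
    """Estimate P(next phrase | last `context_len` phrases) from a corpus
    of phrase-label sequences. A toy stand-in for PST construction."""
    counts = defaultdict(Counter)
    for song in songs:
        for i in range(context_len, len(song)):
            context = tuple(song[i - context_len:i])
            counts[context][song[i]] += 1
    return {ctx: {ph: n / sum(c.values()) for ph, n in c.items()}
            for ctx, c in counts.items()}

# Toy corpus: what follows phrase 'C' depends on the phrase sung two steps
# earlier, so a first-order Markov chain looks random but a second-order
# context resolves the transition (a 'complex' transition in the paper's terms).
songs = [list("ACD")] * 8 + [list("BCE")] * 8
first = transition_probs(songs, 1)
second = transition_probs(songs, 2)
print(first[("C",)])        # {'D': 0.5, 'E': 0.5}
print(second[("A", "C")])   # {'D': 1.0}
```

In the paper's actual procedure, contexts are extended only while they add predictive information, which yields the pruned trees drawn in panel i.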

Extended Data Fig. 2 Examples of canary song phrase sequences, rare inter-phrase gaps, and aberrant syllables.

a, Additional spectrograms of phrase sequences (colours above the spectrograms indicate phrase identity) that lead to a repeating pair of phrases (pink and yellow). b, Examples of flexible phrase sequencing comprising pitch changes (from bird 3). c, Examples of phrase transitions with a pitch change from bird 2. d–f, Phrase sequences showing changes in spectral and temporal parameters. d, Bird 1 changes from up sweep (purple) to down sweep (dark red) through intermediate phrases of intermediate acoustic structure. e, Bird 1 shows a change in inter-syllable gaps. f, Bird 2 shows changes in pitch sweep rate. g, Top and bottom sonograms compare the same phrase transitions where the inter-phrase gap varies. h, i, The top sonogram includes a rare vocalization at the beginning of the second phrase (highlighted) that, in i, resembles the onset of an orange phrase type.

Extended Data Fig. 3 An example in which the context-dependence of syllable acoustics before complex transitions is too small for clear distinction.

a, Same as Fig. 1b. A summary of all phrase sequences that contain a common transition reveals that the choice of what to sing after the pink phrase depends on the phrases that were produced earlier. Lines represent phrase identity and duration. Song sequences are stacked (vertical axis) sorted by the identity of the first phrase, the identity of the last phrase, and then the duration of the centre phrases. b, The discriminability (d′, x axis) measures the acoustic distance between pairs of syllable classes in units of the within-class standard deviation (see Methods). Bars show the histogram across all pairs of syllables identified by human observers (see Methods), corresponding to about 99% or more identification success (Extended Data Fig. 1b). The pink ticks mark the d′ values for six within-class comparisons of the main four contexts in a. The orange tick marks the d′ for another context comparison in a different syllable that precedes a complex transition for this bird. c, The pairwise comparison of distributions matching the pink ticks in b. Each inset shows overlays of two distributions marked by contours at the 0.1 and 0.5 values of the peak and coloured according to context in a. The distributions are projected onto the two leading principal components of the acoustic features (see Methods, in the space defined by eight acoustic features48). While some of these distributions are statistically distinct, they allow for only about 70% context identification success in the most distinct case.
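As a rough illustration of the d′ metric described in b, the sketch below computes the distance between two class means in units of the pooled within-class standard deviation. The one-dimensional form and the simulated samples are assumptions; the paper computes d′ over multi-dimensional acoustic feature distributions:

```python
import numpy as np

def d_prime(a, b):
    """Discriminability between two 1-D feature samples: distance between
    class means in units of the pooled within-class standard deviation.
    (A toy one-dimensional version of the paper's multi-feature metric.)"""
    a, b = np.asarray(a, float), np.asarray(b, float)
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return abs(a.mean() - b.mean()) / pooled_sd

rng = np.random.default_rng(0)
# Two syllable classes far apart in feature space vs two that overlap.
well_separated = d_prime(rng.normal(0, 1, 500), rng.normal(4, 1, 500))
overlapping = d_prime(rng.normal(0, 1, 500), rng.normal(0.5, 1, 500))
```

High d′ pairs are the ones a classifier (or a human observer) rarely confuses; the caption's point is that context-dependent within-class differences sit at the low end of this scale.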

Extended Data Fig. 4 Calcium indicator is expressed exclusively in HVC excitatory neurons and imaged in annotated ROIs.

a, Sagittal slice of HVC showing GCaMP-expressing PNs (experiment repeated in five birds with similar results). b, We observed no overlap between transduced GCaMP6f-expressing neurons and neurons stained for the inhibitory neuron markers calretinin (CR), calbindin, and parvalbumin (calretinin stain shown, staining experiment repeated six times for each marker with similar results). c–e, Examples of daily ROI annotation in three birds (1–3). Coloured circles mark different ROIs, manually annotated on maximum fluorescence projection images on an exemplary day (see Methods). f, Maximum fluorescence images (from bird 1; see Methods) revealing fluorescence sources, including sparsely active cells, in the imaging window across multiple days.

Extended Data Fig. 5 Syllable and phrase-sequence-correlated ROIs from three birds.

a, Sonograms above rasters from four ROIs from three birds. White ticks indicate phrase onsets. The fluorescent calcium indicator is able to resolve individual long syllables. b, Top, average maximum fluorescence images during the pink phrase in Fig. 2d (compare the two most common contexts in orthogonal colours (red and cyan)). Scale bar, 50 μm. Bottom, difference of the overlaid images. ROI outlined in green. c, (i) One-way ANOVA (F, P, η2 and its 95% CI) tests the effect of contexts (x axis, second preceding phrase type in n = 41 sequences) on the signal (y axis) during the target phrase (marked by star) in Fig. 2d. Lines, boxes, whiskers, and plus symbols show the median, first and third quartiles, full range, and outliers. (ii–iv) ANOVA tests carried out using the residuals from the signal after removing the cumulative linear dependence on the duration of the target phrase, the relative timing of onset and offset edges of two fixed phrases, and the absolute onset time of the target phrase in each rendition. Colours correspond to phrases in Fig. 2d. d, Fractions of daily annotated ROIs showing sequence correlation in all three birds. Each ROI can be counted only once per order. This estimate includes sparsely active ROIs. e–j, Activity during a target phrase (marked by Σ) is strongly related to non-adjacent phrase identities (empty lozenges in colour-coded phrase sequence). Songs are arranged by the phrase sequence context (left or right colour patches for past and future phrase types, respectively). White ticks indicate phrase onsets. Box plots and contrast images as defined in b, c. n = 31, 16, 23, 23, 16 and 30 songs contribute to e–j, respectively. e, f, Similar to main Fig. 2d, (Δf/f0)denoised from ROIs with second-order upstream sequence (colour coded) from two more birds. g, Third-order upstream relation. h, i, Second-order downstream relations. j, First-order downstream relation from another bird. Source Data
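The one-way ANOVA statistics reported throughout these captions (F and the effect size η2 = SS_between/SS_total) can be sketched directly from signal integrals grouped by song context. The grouping and toy numbers below are illustrative, and P values (which require the F distribution) are omitted to keep the sketch dependency-free:

```python
import numpy as np

def one_way_anova(groups):
    """F statistic and effect size eta^2 for a one-way ANOVA over lists of
    signal integrals grouped by song context (toy sketch of the reported
    statistics; the paper also reports P values and 95% CIs on eta^2)."""
    groups = [np.asarray(g, float) for g in groups]
    grand = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_b = len(groups) - 1
    df_w = sum(len(g) for g in groups) - len(groups)
    f_stat = (ss_between / df_b) / (ss_within / df_w)
    eta_sq = ss_between / (ss_between + ss_within)
    return f_stat, eta_sq

# Toy data: a context-sensitive ROI (signal larger after one preceding
# context) versus a context-insensitive one.
f1, e1 = one_way_anova([[5.0, 5.5, 6.0], [1.0, 1.2, 0.8], [1.1, 0.9, 1.0]])
f2, e2 = one_way_anova([[1.0, 1.2, 0.8], [1.3, 0.9, 1.1], [0.9, 1.2, 1.0]])
```

Here η2 is the fraction of signal variance accounted for by context, matching its use as the effect size in these panels.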

Extended Data Fig. 6 Durations and onset times of phrases also correlate with their sequence, but cannot fully account for HVC activity.

a, (Δf/f0)denoised signal traces (ROI 18, bird 3) during one phrase type (red) arranged by phrase duration. Coloured barcode annotates the final phrase in the sequence. b, The signal correlates with the red phrase’s duration (r (95% CI), P: two-sided Pearson’s test for n = 32 songs). Colours match barcode in a. c, Sonograms of two phrase sequences. d–g, ROI signals during n = 36 sequences containing the last two phrases in c have various relations to the duration of the middle (purple) phrase (middle; scatter plots as in b, dashed lines indicate significant correlations) and the identity of the first phrase (right; colours, one-way ANOVA (F, P, η2 (95% CI)) tests the effect on the signal Σ. Whiskers, boxes, and lines show full range, first and third quartiles, and medians, respectively). d, Signal correlation with phrase duration is completely entangled with the signal’s sequence preference and does not apply in separate preceding contexts (red, P > 0.5). e, Signal correlation with phrase duration is influenced by the signal’s sequence preference but also exists in the preferred sequence context separately (red). f, Signal duration correlation is observed within each single preceding context separately, but the correlation reduces across all songs. g, Similar to a, but the signal is in the second phrase, not the third. h, Distributions of one-way ANOVA P values (y axis; whiskers, boxes, and red lines show full range, first and third quartiles, and medians, respectively) relating phrase identity and signal for adjacent phrases (n = 279 independent first-order tests, left) and non-adjacent phrases (n = 119 independent second- or higher-order tests, right). Tests were also done on residuals of signals, after discounting the following variables: variance explained by the target phrase duration, the timing of all phrase edges in the test sequence, and the time-in-song (x axis, effects accumulated left to right by multivariate linear regression; see Methods). 
Coloured dashed lines mark P = 0.05 and 0.1. i, Effect size (η2 denotes fraction of variance accounted for by the signals’ context dependence) of past (red) and future (blue) one-way ANOVA tests for first-order (left, n = 279 tests) and second- or higher-order (right, n = 119) correlations. The difference in the mean value (μ) is tested using one-sided bootstrap shuffles (P values, see Methods). Source Data
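The residual control described in h (discounting variance explained by durations and timing before re-testing the context effect) can be sketched as an ordinary least-squares projection followed by analysis of the residuals. The covariates and toy data below are illustrative, not the paper's regressors:

```python
import numpy as np

def nuisance_residuals(signal, covariates):
    """Residuals of the signal after removing its linear dependence on
    nuisance covariates (e.g. phrase duration, time-in-song). A sketch of
    the multivariate-regression control; covariate choice is illustrative."""
    X = np.column_stack([np.ones(len(signal))] + list(covariates))
    beta, *_ = np.linalg.lstsq(X, signal, rcond=None)
    return signal - X @ beta

rng = np.random.default_rng(1)
duration = rng.uniform(0.5, 2.0, 100)            # nuisance: phrase duration
context = rng.integers(0, 2, 100).astype(float)  # preceding-context label
signal = 3.0 * duration + 1.5 * context + rng.normal(0, 0.1, 100)

resid = nuisance_residuals(signal, [duration])
# Duration no longer explains the residuals, but the context effect survives,
# which is the logic behind the residual ANOVA columns in panel h.
```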

Extended Data Fig. 7 Signal shape and onset time of sequence-correlated HVC neuron activity reflect within-phrase timing.

a, Simulation of calcium indicator (GCaMP6f) fluorescence corresponding to syllable-locked spike bursts in HVC PNs. Syllable-locked spike bursts are convolved with the indicator’s kernel (see Methods) to estimate the expected signal when the number of spikes per burst is constant (left), ramps up (middle), or ramps down (right) linearly with the syllable number. The simulation assumes one burst per syllable, with inter-burst spacing (x axis) matching long canary syllables (400–500 ms), medium-length syllables (100 ms) and short syllables (50 ms). b, Complementing Fig. 3a, average context-sensitive activity in phrases with long syllables reveals syllable-locked peaks aligned to phrase onsets (left) or offsets (right, same row order as left) that change in magnitude across the phrase. c, Signal shape and onset timing have properties of within-phrase timing codes. Example raw Δf/f0 signals (y axis, 0.1 marked by vertical bar) of four ROIs aligned to the onset of specific phrase types (green line). Sonograms show the repeating syllables. Red lines and blue box plots show the median, range, and quartiles of the phrase offset timing. The signal shapes resemble the expected fluorescence of the calcium indicator elicited by syllable-locked ramping (sketches, top three) or constant activity (bottom). d, Left, barcodes show the fraction of signal onsets found in the preceding transition, within the phrase, and in the following transition (T→P→T, see Methods). Rows correspond to the phrases in Fig. 3a. Right, rows show the average signal state occupancy estimated from HMMs fitted to the single-trial data used for Fig. 3a. The resulting traces are time-warped to fixed phrase edges (white lines). e, Single-trial data from Fig. 3a aligned to phrase onsets (left) and offsets (right) and averaged in real time. The resulting traces are ordered by peak location (separately in left and right rasters).
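A minimal sketch of the simulation in a, assuming a difference-of-exponentials indicator kernel with the ~450-ms GCaMP6f decay mentioned in Extended Data Fig. 1f. The rise time, burst times, and kernel form are assumptions standing in for the kernel described in the Methods:

```python
import numpy as np

def simulate_fluorescence(burst_times_s, spikes_per_burst=1.0, dt=0.001,
                          t_max=4.0, tau_decay=0.45, tau_rise=0.05):
    """Expected GCaMP6f-like fluorescence for syllable-locked bursts:
    a spike-count train convolved with a difference-of-exponentials kernel.
    spikes_per_burst may be a scalar (constant) or an array (ramping)."""
    times = np.asarray(burst_times_s, float)
    sizes = np.broadcast_to(np.asarray(spikes_per_burst, float), times.shape)
    t = np.arange(0.0, t_max, dt)
    spikes = np.zeros_like(t)
    for when, size in zip(times, sizes):
        spikes[int(round(when / dt))] += size
    kernel = np.exp(-t / tau_decay) - np.exp(-t / tau_rise)
    return t, np.convolve(spikes, kernel)[: len(t)]

# One burst per syllable: long (450 ms) vs short (50 ms) syllable spacing.
t, slow = simulate_fluorescence(np.arange(0.2, 3.0, 0.45))
t, fast = simulate_fluorescence(np.arange(0.2, 3.0, 0.05))

# Long syllables remain resolvable as separate fluorescence peaks; short
# syllables fuse into a large, slowly varying envelope.
n_peaks = sum(1 for i in range(1, len(slow) - 1)
              if slow[i - 1] < slow[i] > slow[i + 1])
```

This reproduces the caption's qualitative point: the indicator resolves individual long syllables but only the envelope of fast trills.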

Extended Data Fig. 8 Context-sensitive signals aggregate in complex transitions and preferentially encode past transitions.

a, Distribution of signal integrals (y axis; whiskers show full range, boxes show first and third quartiles, and lines show medians) for ROIs in Fig. 4a. Text label is colour coded by phrase type in i–iv. F numbers, P values, and η2 (95% CI) for one-way ANOVA relating history (x axis) and signal (y axis) in n = 15 song sequences. b, ROIs in a retain their song-context bias for songs that terminate at end of the third phrase rather than continuing. Box plots repeat the ANOVA tests in a for n = 16 songs in which the last phrase is replaced by the end of the song. c–f, Dark grey slices indicate the fraction of correlations that occur in complex behavioural transitions. c, d, Data from Fig. 4c separated into the two birds. e, f, The fraction in c, d expected by the null hypothesis of correlations distributing by the frequency of each phrase type among Nphrases phrases in the dataset. g, In sequence-correlated ROIs, multi-way ANOVA is used to separate the effects of the preceding and following phrase types on the signal (see Methods). Pie chart shows the percentage of sequence-correlated ROIs that were significantly influenced by the past, future, or both phrase identities among n = 336 significant ANOVA tests. h, Restricting analysis to complex transitions, more ROIs correlated with the preceding phrase type (blue) than with the following one (red). This is true in both naive signal values (left, n = 185 tests) and after we removed dependencies on phrase durations and time-in-song (right, n = 185). One-sided binomial z-test: *proportion difference 0.33 ± 0.09, Z = 6.45, P = 5.5 × 10−11; ‡proportion difference 0.19 ± 0.09, Z = 4.05, P = 2 × 10−5. i, Restricting the analysis to phrase types that are not in complex transitions (n = 136 ANOVA tests) reveals more ROIs correlated with the future phrase type, but the difference is not significant (left, right, n.a.: one-sided binomial z-test, P = 0.14, 0.11). j, Fig. 4a showed maximum projection images, calculated with denoised videos (see Methods). The algorithm CNMF-E49 involves estimating the source ROI shapes, de-convolving spike times and estimating the background noise. Here, recreating the maximum projection images with the original fluorescence videos shows the background as well, but the preceding-context-sensitive neurons remain the same. Namely, the same ROI footprints annotated in i–iv show the colour bias (cyan or red) that indicates coding of the past phrase with the same colour. Source Data

Extended Data Fig. 9 ROIs that reflect several preceding song contexts.

a, b, ROIs that are active in multiple preceding contexts. (Δf/f0)denoised traces are aligned to a specific phrase onset, arranged by identity of the preceding phrase (colour barcode). White ticks indicate phrase onsets. Box plot shows distributions of (Δf/f0)denoised integrals (y axis, summation in the phrase marked by star) for various song contexts (x axis). F number, P value, and effect size (η2 (95% CI)) show the significance of separation by song context (one-way ANOVA). Asterisks mark contexts that lead to larger mean activity compared to another context (Tukey’s multiple comparisons; n = 41 songs and P = 0.01, 7.5 × 10−6, 5.6 × 10−5 in a; n = 19, P = 8.8 × 10−7, 8.15 × 10−8 in b). Average maximum projection images (see Methods) during the aligned phrase compare the song contexts that lead to significantly higher activity with the other contexts in orthogonal colours (cyan and red for high and low activity, respectively). Scale bar, 50 μm. c–e, Neurons with similar context preference to those in a and b on adjacent days. Tukey’s multiple comparisons: n = 44, P = 0.001, 4.08 × 10−6, 1.3 × 10−6 in c; n = 45, P = 0.0016, 2.85 × 10−6 in d; n = 30, P = 0.0002, 0.0001 in e. f, Fraction of ROIs with selectivity for one context (purple) or multiple contexts (red) identified using Tukey’s post hoc multiple comparisons (see Methods). Grey slices (n.a.) mark context-sensitive ROIs for which the post hoc analysis did not isolate a specific context with a larger mean signal. Top (bottom) pie shows selectivity for first (second) preceding phrases. Source Data

Extended Data Fig. 10 HVC neurons can be tuned to complementary preceding contexts.

a, Four jointly recorded ROIs exhibit complementary context selectivity. Colour bars indicate phrase identities preceding and following a fixed phrase (pink). For each ROI (rasters), (Δf/f0)denoised traces are aligned to the onset of the pink phrase (x axis) arranged by the identity of the preceding phrase, by the identity of the following phrase, and finally by the duration of the pink phrase. b, For the example in a, normalized mutual information between the identity of past (P) and future (F) phrase types is significantly smaller than the information held by the network states about the past and future contexts (left bars; N is the activity of the four ROIs). Dots, bars, and red lines mark bootstrap assessment shuffles, their means, and the 95% level of the mean in shuffled data (see Methods). *Difference is 0.09 ± 0.03, Z = 4.3, P = 7.3 × 10−6; **difference is 0.26 ± 0.02, Z = 8.9, P < 1 × 10−15, bootstrapped one-sided z-test. c, Signal integrals from the four ROIs in a are plotted for each song (dots, n = 54 songs) on the three most informative principal components. Dots are coloured by the identity of the preceding phrase. Clustering accuracy measures the ‘leave-one-out’ label prediction for each preceding phrase (true positive), calculated by assigning each dot to the nearest centroid (L2). Dashed line marks chance level. d, As in c but for the first following phrase. Source Data
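The normalized mutual information comparison in b can be sketched from label counts. The normalization by sqrt(H(x)H(y)) and the toy sequences below are assumptions; the paper's exact estimator and bootstrap procedure are described in its Methods:

```python
import numpy as np
from collections import Counter

def normalized_mi(x, y):
    """Mutual information between two label sequences, normalized by
    sqrt(H(x) * H(y)). The normalization choice is an assumption made
    for this sketch."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = sum(c / n * np.log2((c / n) / ((px[a] / n) * (py[b] / n)))
             for (a, b), c in pxy.items())
    hx = -sum(c / n * np.log2(c / n) for c in px.values())
    hy = -sum(c / n * np.log2(c / n) for c in py.values())
    return mi / np.sqrt(hx * hy)

# Toy labels: a future fully determined by the past vs only partly coupled.
past = ['A', 'A', 'B', 'B', 'A', 'B', 'A', 'B']
future = ['C', 'C', 'D', 'D', 'C', 'D', 'D', 'C']
coupled = normalized_mi(past, ['C' if p == 'A' else 'D' for p in past])
partial = normalized_mi(past, future)
```

In the paper's analysis the analogous comparison shows that the joint ROI activity carries more information about past and future contexts than the past and future labels carry about each other.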

Supplementary information

Supplementary Information

Supplementary Note 1 - Supplementary analysis of overlapping ROIs. This note estimates the number of independent sources of fluorescence signal among the regions of interest (ROIs) defined in the main text, and shows that unifying overlapping ROIs as one source persisting across days does not change the findings in the main manuscript. Supplementary Note 2 - Non-parametric statistical analysis for results in the main manuscript. This note repeats the manuscript’s analyses with a non-parametric one-way ANOVA (Kruskal–Wallis test). These analyses recapitulated all the findings in the manuscript.
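The Kruskal–Wallis test mentioned in Supplementary Note 2 replaces group means with pooled ranks. A minimal sketch of its H statistic in pure Python (using average ranks for ties, and omitting the tie-correction factor applied by full implementations):

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie-correction factor).

    groups: list of lists of values, one list per song context
    (hypothetical values, not the paper's data).
    """
    pooled = sorted(v for g in groups for v in g)
    # Map each value to its average 1-based rank (handles ties)
    ranks, i = {}, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1..j
        i = j
    n = len(pooled)
    return 12 / (n * (n + 1)) * sum(
        sum(ranks[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)


# Two fully separated hypothetical groups
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6]])  # ~3.857
```

Because H depends only on ranks, the test is robust to the non-Gaussian distributions typical of fluorescence signal integrals, which is why it serves as the non-parametric check here.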

Reporting Summary

Video 1: Confocal imaging of GCaMP and calbindin #1.

Video frames show stacks of confocal microscopy sections (3 µm thick) that were used to test the specificity of GCaMP expression to excitatory neurons (Methods). GCaMP is stained in green and calbindin markers are stained in blue.

Video 2: Confocal imaging of GCaMP and calbindin #2.

Video frames show stacks of confocal microscopy sections (3 µm thick) that were used to test the specificity of GCaMP expression to excitatory neurons (Methods). GCaMP is stained in green and calbindin markers are stained in blue.

Video 3: Confocal imaging of GCaMP and calretinin #1.

Video frames show stacks of confocal microscopy sections (3 µm thick) that were used to test the specificity of GCaMP expression to excitatory neurons (Methods). GCaMP is stained in green and calretinin markers are stained in blue.

Video 4: Confocal imaging of GCaMP and calretinin #2.

Video frames show stacks of confocal microscopy sections (3 µm thick) that were used to test the specificity of GCaMP expression to excitatory neurons (Methods). GCaMP is stained in green and calretinin markers are stained in blue.

Video 5: Confocal imaging of GCaMP and parvalbumin #1.

Video frames show stacks of confocal microscopy sections (3 µm thick) that were used to test the specificity of GCaMP expression to excitatory neurons (Methods). GCaMP is stained in green and parvalbumin markers are stained in blue.

Video 6: Confocal imaging of GCaMP and parvalbumin #2.

Video frames show stacks of confocal microscopy sections (3 µm thick) that were used to test the specificity of GCaMP expression to excitatory neurons (Methods). GCaMP is stained in green and parvalbumin markers are stained in blue.

Video 7: Confocal imaging of GCaMP and parvalbumin #3.

Video frames show stacks of confocal microscopy sections (3 µm thick) that were used to test the specificity of GCaMP expression to excitatory neurons (Methods). GCaMP is stained in green and parvalbumin markers are stained in blue.

Video 8: Denoised videos used to create Fig. 4a.

Output of the CNMF-E algorithm that was used to denoise the fluorescence videos and visualize the context-dependent neurons in Fig. 4a.



Cite this article

Cohen, Y., Shen, J., Semu, D. et al. Hidden neural states underlie canary song syntax. Nature 582, 539–544 (2020). https://doi.org/10.1038/s41586-020-2397-3
