## Abstract

Changes in behaviour resulting from environmental influences, development and learning^{1,2,3,4,5} are commonly quantified on the basis of a few hand-picked features^{2,3,4,6,7} (for example, the average pitch of acoustic vocalizations^{3}), assuming discrete classes of behaviours (such as distinct vocal syllables)^{2,3,8,9,10}. However, such methods generalize poorly across different behaviours and model systems and may miss important components of change. Here we present a more-general account of behavioural change that is based on nearest-neighbour statistics^{11,12,13}, and apply it to song development in a songbird, the zebra finch^{3}. First, we introduce the concept of ‘repertoire dating’, whereby each rendition of a behaviour (for example, each vocalization) is assigned a repertoire time, reflecting when similar renditions were typical in the behavioural repertoire. Repertoire time isolates the components of vocal variability that are congruent with long-term changes due to vocal learning and development, and stratifies the behavioural repertoire into ‘regressions’, ‘anticipations’ and ‘typical renditions’. Second, we obtain a holistic, yet low-dimensional, description of vocal change in terms of a stratified ‘behavioural trajectory’, revealing numerous previously unrecognized components of behavioural change on fast and slow timescales, as well as distinct patterns of overnight consolidation^{1,2,4,14,15} across the behavioural repertoire. We find that diurnal changes in regressions undergo only weak consolidation, whereas anticipations and typical renditions consolidate fully. Because of its generality, our nonparametric description of how behaviour evolves relative to itself, rather than to a potentially arbitrary, experimenter-defined goal^{2,3,14,16}, appears well suited for comparing learning and change across behaviours and species^{17,18}, as well as biological and artificial systems^{5}.

## Data availability

The data that support the findings of this study are available from the corresponding authors upon reasonable request.

## Code availability

We provide source code for our nearest-neighbour-based analyses (https://github.com/skollmor/repertoireDating). We also provide a data-browser/data analysis graphical user interface that we used to perform some analyses and to explore and visualize data (https://github.com/skollmor/dspace).

## References

1. Brashers-Krug, T., Shadmehr, R. & Bizzi, E. Consolidation in human motor memory. *Nature* **382**, 252–255 (1996).
2. Derégnaucourt, S., Mitra, P. P., Fehér, O., Pytte, C. & Tchernichovski, O. How sleep affects the developmental learning of bird song. *Nature* **433**, 710–716 (2005).
3. Tchernichovski, O., Mitra, P. P., Lints, T. & Nottebohm, F. Dynamics of the vocal imitation process: how a zebra finch learns its song. *Science* **291**, 2564–2569 (2001).
4. Andalman, A. S. & Fee, M. S. A basal ganglia–forebrain circuit in the songbird biases motor output to avoid vocal errors. *Proc. Natl Acad. Sci. USA* **106**, 12518–12523 (2009).
5. Arulkumaran, K., Deisenroth, M. P., Brundage, M. & Bharath, A. A. Deep reinforcement learning: a brief survey. *IEEE Signal Process. Mag.* **34**, 26–38 (2017).
6. Ingram, J. N., Flanagan, J. R. & Wolpert, D. M. Context-dependent decay of motor memories during skill acquisition. *Curr. Biol.* **23**, 1107–1112 (2013).
7. Klaus, A. et al. The spatiotemporal organization of the striatum encodes action space. *Neuron* **95**, 1171–1180 (2017).
8. Han, S., Taralova, E., Dupre, C. & Yuste, R. Comprehensive machine learning analysis of *Hydra* behavior reveals a stable basal behavioral repertoire. *eLife* **7**, e32605 (2018).
9. Egnor, S. E. R. & Branson, K. Computational analysis of behavior. *Annu. Rev. Neurosci.* **39**, 217–236 (2016).
10. Wiltschko, A. B. et al. Mapping sub-second structure in mouse behavior. *Neuron* **88**, 1121–1135 (2015).
11. van der Maaten, L. Accelerating *t*-SNE using tree-based algorithms. *J. Mach. Learn. Res.* **15**, 3221–3245 (2014).
12. Chen, H. & Friedman, J. J. H. A new graph-based two-sample test for multivariate and object data. *J. Am. Stat. Assoc.* **1459**, 1–41 (2016).
13. Hawks, M. *Graph-Theoretic Statistical Methods for Detecting and Localizing Distributional Change in Multivariate Data*. PhD Thesis, Naval Postgraduate School, Monterey, California (2015).
14. Shank, S. S. & Margoliash, D. Sleep and sensorimotor integration during early vocal learning in a songbird. *Nature* **458**, 73–77 (2009).
15. Fenn, K. M., Nusbaum, H. C. & Margoliash, D. Consolidation during sleep of perceptual learning of spoken language. *Nature* **425**, 614–616 (2003).
16. Tchernichovski, O., Nottebohm, F., Ho, C. E., Pesaran, B. & Mitra, P. P. A procedure for an automated measurement of song similarity. *Anim. Behav.* **59**, 1167–1176 (2000).
17. Anderson, D. J. J. & Perona, P. Toward a science of computational ethology. *Neuron* **84**, 18–31 (2014).
18. Krakauer, J. W. & Shadmehr, R. Consolidation of motor memory. *Trends Neurosci.* **29**, 58–64 (2006).
19. Brainard, M. S. & Doupe, A. J. What songbirds teach us about learning. *Nature* **417**, 351–358 (2002).
20. Catchpole, C. K. & Slater, P. J. B. *Bird Song: Biological Themes and Variations* (Cambridge Univ. Press, 2003).
21. Dhawale, A. K., Smith, M. A. & Ölveczky, B. P. The role of variability in motor learning. *Annu. Rev. Neurosci.* **40**, 479–498 (2017).
22. Lipkind, D. et al. Songbirds work around computational complexity by learning song vocabulary independently of sequence. *Nat. Commun.* **8**, 1247 (2017).
23. Korman, M. et al. Daytime sleep condenses the time course of motor memory consolidation. *Nat. Neurosci.* **10**, 1206–1213 (2007).
24. Fischer, S., Hallschmid, M., Elsner, A. L. & Born, J. Sleep forms memory for finger skills. *Proc. Natl Acad. Sci. USA* **99**, 11987–11991 (2002).
25. Kruskal, J. B. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. *Psychometrika* **29**, 1–27 (1964).
26. Fehér, O., Wang, H., Saar, S., Mitra, P. P. & Tchernichovski, O. De novo establishment of wild-type song culture in the zebra finch. *Nature* **459**, 564–568 (2009).
27. Adam, I. & Elemans, C. P. H. Vocal motor performance in birdsong requires brain–body interaction. *eNeuro* **6**, ENEURO.0053-19.2019 (2019).
28. Walker, M. P., Brakefield, T., Hobson, J. A. & Stickgold, R. Dissociable stages of human memory consolidation and reconsolidation. *Nature* **425**, 616–620 (2003).
29. Vogelstein, J. T. et al. Discovery of brainwide neural-behavioral maps via multiscale unsupervised structure learning. *Science* **344**, 386–392 (2014).
30. Fahad, A. et al. A survey of clustering algorithms for big data: taxonomy and empirical analysis. *IEEE Trans. Emerg. Top. Comput.* **2**, 267–279 (2014).

## Acknowledgements

We thank J. Herbst and Z. Huang for performing the tutoring experiments; and A. Zai, S. Surace, A. Huber, I. Calangiu and K. Martin for discussing the manuscript. This work was supported by grants from the Simons Foundation (SCGB 328189 and SCGB 543013 to V.M.) and the Swiss National Science Foundation (SNSF PP00P3_157539 to V.M. and SNSF 31003A_182638 to R. H.). All experimental procedures were approved by the Veterinary Office of the Canton of Zurich.

## Author information

### Contributions

S.K. conceived of the approach. S.K. and V.M. performed analyses. S.K., V.M. and R.H.R.H. wrote the paper. R.H.R.H. conceived and supervised collection of the behavioural data.

### Corresponding authors

Correspondence to Sepp Kollmorgen or Valerio Mante.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

## Additional information

**Publisher’s note** Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Extended data figures and tables

### Extended Data Fig. 1 Clustering of juvenile and adult zebra finch song.

**a**, Vocal development in male zebra finches. Tutoring by an adult male started at around day 46 (post-hatch) and lasted 10–20 days. **b**, Time course of the acoustic feature frequency modulation (FM), for syllable b in the example bird (compare with Fig. 1b). **c**, Normalized mean silhouette values for 2–10 clusters for vocalizations from the seven days shown in **d**. High values indicate evidence for the respective cluster count. Normalized mean silhouette coefficients are based on 20 repetitions of *k*-means clustering of random subsets of 1,000 68-ms onset-aligned spectrogram segments from a single day (as in **d**), projected onto the first five principal components. **d**, *t*-SNE visualizations of vocalizations produced on a given day post-hatch for the example bird (bird 4, the same bird as in Fig. 1a–c). A separate embedding was computed for each day, and the embedding’s initial condition was based on the previous day. Note the gradual emergence of clusters, each corresponding to a distinct syllable type (for example, syllables i, a, b, c in Fig. 2a). **e**, Average fraction of neighbours from a different cluster, as a function of neighbourhood size. These data are analogous to those from **c**, **d** but for vocalizations from day 90 (12,854 data points), when clusters are fully developed. For a wide range of neighbourhood sizes, the neighbours of a data point mostly belong to the same cluster or syllable type. For a neighbourhood size of 100, the average fraction of out-of-cluster neighbours from the same day is 0.0089. Thus, for an appropriately chosen neighbourhood size, nearest-neighbour methods respect clustering structure in the data by construction, and sidestep having to explicitly identify clusters in the data. In most analyses, we computed nearest neighbours for data from all days, meaning that clustering structure is respected even for neighbourhood sizes that are slightly larger than those suggested in **e**.

### Extended Data Fig. 2 Properties of large-scale embeddings.

**a**, Three auditory features computed on renditions of syllable b. This panel uses the same embedding as in Fig. 2a, but with different colours. **b**, Across-day change in vocalizations. This is a magnified cutout from the bottom left region of the dashed outline of Fig. 2a. The colours differ from Fig. 2a, and points from days 50–56 only are shown. **c**, Within-day change in vocalizations. Points from **b** are shown separately for three individual days and coloured according to production time within the day (early to late). Vocalizations change within a day: early vocalizations (dark green) are more similar to vocalizations from previous days (dark green points in **b**); late vocalizations (light blue) are more similar to vocalizations from future days (light blue in **b**). **d**, *t*-SNE visualizations for dense recordings from three birds (analogous to Fig. 2a). **e**–**g**, Illustration of a fictitious behaviour that undergoes distinct phases of abrupt change, no change and gradual change, and the identification of these phases on the basis of nearest-neighbour graphs. **e**, A low-dimensional representation of the behaviour. Each point corresponds to a behavioural rendition (for example, a syllable rendition) and is coloured according to production time. Similar renditions (for example, syllable renditions with similar spectrograms) appear near each other in this representation. The dotted ellipses mark three subsets of points corresponding to: (1) a phase of abrupt change; (2) a phase of no change; and (3) a phase of gradual change. **f**, Nearest-neighbour graphs for the three subsets of points in **e**. Points are replotted from **e** with different symbols, indicating whether their production times fall within the first half (squares) or second half (crosses) of the corresponding subset. Edges connect each point to its five nearest neighbours. 
The edge colour marks neighbouring pairs of points falling into the same (black) or different (red) halves. Relative counts of within- and across-half edges differ according to the nature of the underlying behavioural change (histograms of edge counts). If an abrupt change in behaviour occurs between the first and second half, nearest neighbours of points in one half will all be points from the same half, and none from the other half (discontinuity). When behaviour is stationary, the neighbourhoods are maximally mixed: that is, every point has about an equal number of neighbours from the two halves. Phases of gradual change result in intermediate levels of mixing. **g**, Mixing matrix for the simulated data in **e**, analogous to Fig. 2e. Each location in the matrix corresponds to a pair of production times. Strong mixing (white) indicates a large number of nearest-neighbour edges across the two corresponding production times (as in **f**; stationary) and thus similar behaviour at the two times. Weak mixing (black) indicates a small number of such edges (as in **f**; discontinuity), and thus dissimilar behaviour. Note that such statistics on the composition of local neighbourhoods can be computed for any kind of behaviour and are invariant with respect to transformations of the data that preserve nearest neighbours, such as scaling, translation and rotation. These properties make nearest-neighbour approaches highly general.
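
The edge-counting idea in panels **e**–**g** can be illustrated with a minimal sketch. It is not the authors' implementation: the toy two-dimensional data, the helper names (`knn_indices`, `across_half_fraction`, `simulate`) and the drift and jump sizes are all assumptions chosen for illustration. Only the neighbourhood size of five and the split into a first and second half follow the legend.

```python
import math
import random


def knn_indices(points, k):
    """Return, for each point, the indices of its k nearest neighbours (Euclidean)."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def across_half_fraction(points, k=5):
    """Fraction of nearest-neighbour edges that link the first and second half.

    Points are assumed to be ordered by production time, so the first half of
    the list is the earlier half of the phase.
    """
    half = len(points) // 2
    labels = [0 if i < half else 1 for i in range(len(points))]
    edges = 0
    across = 0
    for i, nbrs in enumerate(knn_indices(points, k)):
        for j in nbrs:
            edges += 1
            across += labels[i] != labels[j]
    return across / edges


def simulate(n, drift, jump, rng):
    """Toy behaviour: unit Gaussian scatter plus a gradual drift and/or a jump."""
    pts = []
    for i in range(n):
        t = i / (n - 1)
        offset = drift * t + (jump if i >= n // 2 else 0.0)
        pts.append((rng.gauss(offset, 1.0), rng.gauss(0.0, 1.0)))
    return pts


rng = random.Random(0)
abrupt = across_half_fraction(simulate(200, drift=0.0, jump=50.0, rng=rng))
stationary = across_half_fraction(simulate(200, drift=0.0, jump=0.0, rng=rng))
gradual = across_half_fraction(simulate(200, drift=6.0, jump=0.0, rng=rng))
print(abrupt, stationary, gradual)
```

In this toy setting the abrupt phase yields no across-half edges, the stationary phase yields roughly half, and the gradual phase falls in between. Because only the rank order of distances enters the computation, rescaling, translating or rotating the points leaves the fractions unchanged.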

### Extended Data Fig. 3 Repertoire dating and the direction of slow change.

**a**, The four nearest neighbours for example vocalizations (bird 4, syllable b, from Fig. 1a). Production times of nearest neighbours (numbers) need not equal that of the corresponding example rendition. **b**, Neighbourhood production times for three renditions from day 70 (analogous to Fig. 2a, inset). Rendition 2 is ‘typical’ for day 70 (most neighbours lie in the same or adjacent days); renditions 1 and 3 are a ‘regression’ and an ‘anticipation’ (with neighbours predominantly produced in the past or future, respectively). **c**, All renditions from day 70 (a subset of the points in Fig. 2a). Colours correspond to repertoire time (50th percentile in Fig. 2d). Anticipations (repertoire times greater than 70) and regressions (repertoire times less than 70) occur at locations corresponding to vocalizations typical of later and earlier development (compare with Fig. 2a). Numbers 1–3 mark the approximate locations of the example renditions in **b**. **d**, Mixing matrices for additional birds (analogous to Fig. 2e, using the same birds as in Extended Data Fig. 2d). Bird 3 produced only a very few vocalizations (mostly calls) before tutoring onset (black arrows). The mixing matrices consistently show a period of gradual change starting after tutor onset and lasting several weeks. This gradual change typically slows down (resulting in larger mixing values far from the diagonal) at the end of the developmental period considered here (day 90 post-hatch; later periods are in Extended Data Fig. 6). Grey values correspond to the base-2 logarithm of the mixing ratio (LMR), that is, histograms over the pooled neighbourhood times (Fig. 2c) normalized by a null hypothesis obtained from a random distribution of production times (see Supplementary Methods). 
For example, an LMR value of 5 implies that renditions from the corresponding pair of production times are 2^{5} = 32 times more mixed at the level of local neighbourhoods than would be expected by chance (that is, under a random distribution of production times across renditions). **e**, As in **d**, bird 2, but after shuffling production times among all data points. Effects under this null hypothesis are small (the maximal observed mixing ratio is 2^{0.06} or approximately 1.042). Similar, small effects under the null hypothesis are obtained for the other mixing matrices discussed throughout the text. **f**–**h**, Properties of the behavioural trajectory inferred from the mixing matrix in Fig. 2f. **f**, Pairwise distances between points along the inferred behavioural trajectory (*x* axis), plotted against measured disparities (*y* axis). Disparities are obtained by rescaling and inverting the similarities in Fig. 2f (see Supplementary Methods). The points on the trajectory are inferred with ten-dimensional non-metric MDS on the measured disparities. Importantly, the pairwise distances between inferred points faithfully represent the corresponding, measured disparities (all points lie close to the diagonal; MDS stress = 0.0002). **g**, **h**, Structure of low-dimensional projections of the behavioural trajectory. We applied principal-component analysis to the ten-dimensional arrangement of points inferred with MDS and retained an increasing number of dimensions (number of dimensions indicated by greyscale). For example, the projection onto the first two principal components is shown in Fig. 2h (MDS dimension 2 in **g**, **h**). The first two principal components explain 75% of the variance in the full ten-dimensional trajectory. **g**, Measured (true) disparity (thick grey curve) and distances along the inferred trajectories (points and thin curves) as a function of the day gap (*δ*) between points.
For any choice of projection dimensionality and *δ*, we computed the Euclidean distances between any two points separated by *δ* and averaged across pairs of points. The measured (true) disparities increase rapidly between subsequent and nearby days, but only slowly between far apart days (thick grey curve). Low-dimensional projections of the trajectory (for example, MDS dimension 2) underestimate the initial increase in disparities. **h**, Angle between the reconstructed direction of across-day change for inferred behavioural trajectories, as a function of the day gap between points. Same conventions and legend as in **g**. For the one- and two-dimensional trajectories, the direction of across-day change varies little or not at all from day to day (see inset; the arrow indicates the angle of across-day change). On the other hand, the direction of across-day change along the full, ten-dimensional behavioural trajectory is almost orthogonal for subsequent days. Data shown in **g**, **h** suggest that the full behavioural trajectory is more ‘rugged’ than indicated by the two-dimensional projection in Fig. 2h. This structure is consistent with the finding that across-day change includes a large component that is orthogonal to the directions of slow change and of within-day change (Fig. 3j). Note that **a** shows 200-ms spectrogram segments, whereas **b**–**h** are based on 68-ms segments (as are most of the analyses).
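
The repertoire-dating step described in panels **a**–**c** can be sketched as follows. This is a toy illustration under stated assumptions, not the released code: the repertoire time of a rendition is taken to be the median production day of its nearest neighbours (matching the "50th percentile" mentioned in **c**), and the one-dimensional drifting feature, the neighbourhood size of 10 and the function names `repertoire_times` and `percentile` are hypothetical.

```python
import math
import random
import statistics


def repertoire_times(features, days, k=10):
    """Median production day of each rendition's k nearest neighbours."""
    times = []
    for i, x in enumerate(features):
        nearest = sorted(
            (math.dist(x, y), days[j]) for j, y in enumerate(features) if j != i
        )[:k]
        times.append(statistics.median(day for _, day in nearest))
    return times


def percentile(values, q):
    """Nearest-rank percentile (q between 0 and 100) of a list of numbers."""
    ordered = sorted(values)
    rank = max(0, math.ceil(q / 100 * len(ordered)) - 1)
    return ordered[rank]


# Toy data: one acoustic feature that drifts by one unit per day, with
# rendition-to-rendition variability much larger than the daily change.
rng = random.Random(1)
days, features = [], []
for day in range(60, 81):
    for _ in range(30):
        days.append(day)
        features.append((day + rng.gauss(0.0, 3.0),))

rt = repertoire_times(features, days)

# Repertoire-dating percentiles for the renditions produced on day 70.
day70 = [t for t, d in zip(rt, days) if d == 70]
p5, p50, p95 = (percentile(day70, q) for q in (5, 50, 95))
print(p5, p50, p95)
```

In this sketch the 5th percentile falls before day 70 (regressions), the 95th percentile after it (anticipations), and the 50th percentile close to the production day (typical renditions).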

### Extended Data Fig. 4 Models of the alignment between components of change.

**a**–**d**, Validation of repertoire dating. We simulated individual behavioural renditions as points in a high-dimensional space, drawn from a time-dependent probability distribution that changes both within and across days (see Supplementary Methods), and verified that repertoire dating can successfully recover the underlying structure of the models. The main parameters determining the relative alignment of the DiSC with the directions of within-day and across-day change are: *k*_{a}, the amount of within-day change along the DiSC; *g*_{n}, the amount of within-day change orthogonal to the DiSC; and *b*_{n}, the amount of across-day change orthogonal to the DiSC. These parameters are expressed relative to the amount of across-day change along the DiSC (thick black arrow in **a**, **b**). The two models shown (**a**, **c**) are characterized by different amounts of overnight consolidation of within-day changes along the DiSC. **a**, Model 1. Within-day change is aligned with the DiSC (*g*_{n} = 0) and is large (*k*_{a} = 5). The component of across-day change orthogonal to the DiSC is as large as the component of across-day change along it. In this scenario, overnight consolidation of within-day changes along the DiSC is weak for typical renditions (20% of change is consolidated, corresponding to a consolidation index of −0.8 in Fig. 1c, d). **b**, Repertoire dating percentiles for model 1, analogous to Fig. 3b. The time course of the 50th repertoire-dating percentile (typical renditions, red) closely reproduces the dynamics of change along the DiSC implied by **a**: within-day change along the DiSC is large (the red line extends over about five days) and consolidation is weak (the starting point on day *k* + 1 relative to day *k* moves by about 20% of the overall within-day range). **c**, Model 2. Within-day change has a large component orthogonal to the DiSC, whereas across-day change is aligned with the DiSC. 
In this scenario, overnight consolidation of within-day changes along the DiSC is strong (80% of the change is consolidated; consolidation index −0.2) for typical renditions. **d**, Repertoire dating percentiles for model 2, analogous to **b**. The time course of the 50th repertoire dating percentile (typical renditions, red) closely reproduces the dynamics of change along the DiSC implied by **c**. In **b**, **d**, differences between anticipations (95th percentile) and regressions (5th percentile) correctly reflect the underlying model parameters (see Supplementary Methods). **e**–**j**, Validation of stratified behavioural trajectories. We generated three sets of stratified behavioural trajectories that differ with respect to the alignment of within-day and across-day change with the DiSC. We built each set of trajectories by arranging 50 points (five strata per day, five production time periods per day, on two consecutive days; same conventions as Fig. 3f, g) within a four-dimensional space. We then generated simulated stratified mixing matrices (**e**–**g**, replotted from Fig. 3f) by computing pairwise distances between all points, and transforming distances into similarities. We visualize the behavioural trajectories (**h**–**j**) with the same two-dimensional projections as in Fig. 3h–j, with the same scale along all dimensions. In all models, overnight consolidation along the DiSC is perfect (strong consolidation) for all strata. **e**, Model 1: within-day change and across-day change occur only along the DiSC. For each stratum (that is, each of the five 10-by-10 squares along the diagonal), similarity decreases smoothly with time, reflecting the gradual progression of the trajectory along the DiSC within and across days. **f**, Model 2: within-day change has a large component that is not aligned with the DiSC. **g**, Model 3: both within-day and across-day change have large components that are not aligned with the DiSC. 
The misaligned component of across-day change reduces the similarity between day *k* and day *k* + 1 compared with model 2, resulting in smaller values in the 5-by-5 squares comparing points from day *k* and day *k* + 1. **h**, Behavioural trajectories for model 1: the two-dimensional projection containing the DiSC (top) explains all the variance in the trajectories. **i**, Behavioural trajectories for model 2: similar to **h**, but points from different periods during the day are also displaced along an orthogonal direction of within-day change (middle). **j**, Behavioural trajectories for model 3: similar to **i**, but points from adjacent days are also displaced along an orthogonal direction of across-day change (bottom). Note that the models in **e**–**j** are implemented differently to the models in **a**–**d** (see Supplementary Methods).

### Extended Data Fig. 5 Repertoire dating control analyses.

**a**–**c**, Within-bout effects, analogous to Fig. 3c, d. **a**, Within-bout effects computed only from renditions that fall into short bouts (the bout length is less than the median). **b**, Analogous to **a**, but computed only from renditions that fall into long bouts (the bout length is more than the median). The changes in the behavioural repertoire observed within a bout are qualitatively similar for short and long bouts (compare **a**, **b**; within-bout effects are most pronounced after day 70). In particular, the song becomes more regressive shortly before the end of a bout (5th percentile, bottom curves). This suggests that the analogous effect in Fig. 3c, d occurs at the end of a bout, rather than at a fixed time after the beginning of a bout. **c**, Analogous to Fig. 3c, d but computed over the entire dataset without prior clustering into syllables. The changes in behavioural repertoire differ in several respects from those in Fig. 3c, d, which were computed on individual syllables and then averaged across syllables (see Supplementary Methods). Here, the increase in regressions at the bout end is less pronounced. Moreover, large within-bout changes also occur for anticipations early in development. Both differences may reflect changes in the relative frequency of renditions from each syllable (for example, introductory notes) sung throughout a bout. Such changes in frequency can affect the results in **c**, which were computed on the unclustered data, but not those in **a**, **b**. **d**, Within-day effects, analogous to Fig. 3a, b, but computed for individual syllables, and then averaged across syllables and animals. The changes in behavioural repertoire are qualitatively similar to those in Fig. 3a, b, which were computed using the unclustered data. This similarity implies that the dynamics along the direction of slow change in Fig. 3 cannot be explained by changes during the day in the relative frequency of renditions from each syllable. 
**e**, Analogous to Fig. 3a, b but computed after shuffling production times among all data points. Within-day changes of the percentile curves are small under this null hypothesis. The maximal span of within-day fluctuations is 0.2 days, compared with 3.71 for the unshuffled data in Fig. 3b. The total repertoire spread (5th to 95th percentiles) is around 40 days, compared with around 23 days for unshuffled data. The 50th percentile curve is flat, implying that the shuffled data do not undergo a systematic drift over time (that is, do not describe a DiSC). The vertical separation between percentiles, then, reflects the range of production times in the data, not the spread along the DiSC. The time course of the 5th and 95th repertoire dating percentiles should thus be interpreted as the progression of regressions and anticipations along the DiSC only over the range of repertoire times covered by typical renditions (that is, approximately the vertical range of the 50th repertoire dating percentile). **f**, Analogous to Fig. 3e but for different distance metrics (Euclidean; correlation; Euclidean after time warping) and feature representations (32 acoustic features; 1 acoustic feature (entropy variance)). See also Extended Data Fig. 9.

### Extended Data Fig. 6 Behavioural change in adult versus juvenile birds.

**a**–**d**, Comparison of within-day repertoire dating results during and after the end of development (averaged over three birds). Top, juvenile birds; bottom, same birds but as adults. **a**, Large-scale embeddings analogous to Fig. 2a. **b**, Repertoire dating percentiles, analogous to Fig. 3a, b. **c**, Stratified mixing matrix, analogous to Fig. 3g. **d**, Stratified behavioural trajectories, analogous to Fig. 3h–k. **e**, Shift and span values for the 50th percentile, for juvenile and adult birds. Points indicate individual birds. Song in adult birds is not static, but the time course of change differs from that observed in juveniles. First, change in adults is substantially less than in juveniles (see the slope of the 50th percentile in the top versus bottom parts of **b**). Second, the relation of fast (within-day) and slow (across-day) change differs in juveniles versus adults. In juveniles, vocalizations move along the DiSC (*y* axes in **b**; slow local axis in **d**) within each day and the repertoire time of typical renditions increases by about one day from morning to evening (50th percentile; span is approximately one day) and is maintained through the next morning (shift is approximately 0 days). In adults, typical renditions do not show within-day progress along the DiSC (the span is approximately 0 days) but change overnight across days (the shift is greater than 0 days). In adults, the regressive tail of the repertoire in particular moves towards smaller values during the day (**b**, bottom right; 5th percentile), whereas in juveniles it consistently moves towards larger values (**b**, top). In both juvenile and adult birds, within-day change has a strong component that is misaligned with the DiSC (within-day axis in **d**).
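
The span and shift values in **e** can be read as simple differences along a repertoire-dating percentile curve, as in the sketch below. The definitions used here are assumptions inferred from the legend text: span is the change from the first to the last period of a day, and shift is the change from the last period of one day to the first period of the next. The ratio shift/span reproduces the consolidation-index values quoted in the Extended Data Fig. 4 legend (for example, −0.8 when 20% of the change is consolidated), but treating that ratio as the definition of the index is also an assumption. All function names and numbers are hypothetical.

```python
import math


def span_and_shift(percentile_by_period):
    """Within-day span and overnight shift of a percentile curve.

    percentile_by_period[d][p] is the repertoire time (in days) of the chosen
    percentile on day d during production period p.
    """
    spans, shifts = [], []
    for d, today in enumerate(percentile_by_period):
        spans.append(today[-1] - today[0])
        if d + 1 < len(percentile_by_period):
            shifts.append(percentile_by_period[d + 1][0] - today[-1])
    return spans, shifts


def consolidation_index(span, shift):
    """Assumed index: overnight shift relative to the within-day span."""
    return shift / span if span != 0 else math.nan


# Juvenile-like pattern: about one day of progress per day, kept overnight.
juvenile = [[60.0, 60.5, 61.0], [61.0, 61.5, 62.0]]
juv_spans, juv_shifts = span_and_shift(juvenile)

# Adult-like pattern: no within-day progress, small overnight change.
adult = [[100.0, 100.0, 100.0], [100.25, 100.25, 100.25]]
adult_spans, adult_shifts = span_and_shift(adult)

print(juv_spans, juv_shifts, adult_spans, adult_shifts)
print(consolidation_index(span=5.0, shift=-4.0))
```

With these toy curves the juvenile-like case gives a span of one day and a shift of zero, and the adult-like case gives a span of zero and a positive shift, mirroring the qualitative contrast described in the legend.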

### Extended Data Fig. 7 Local linear analysis.

**a**–**e**, We validated the structure of change inferred with nearest-neighbour statistics (Fig. 3) with an approach based on linear regression in the high-dimensional spectrogram space (see Supplementary Methods). Unlike for the case of nearest-neighbour-based statistics, here each rendition must first be assigned to a cluster (that is, a syllable; compare with Fig. 2a) and each cluster is analysed separately. **a**, Illustration of the linearization scheme. First, we infer the (local) DiSC on days *k* and *k* + 1 (grey arrow) as the vector of linear-regression coefficients relating production day to variability of renditions from days *k* − 1 and *k* + 2. Second, we infer the direction of within-day change (green arrow) as the linear-regression coefficients relating the period within a day to variability of renditions from days *k* and *k* + 1, orthogonalized to the DiSC. Third, we infer the direction of across-day change (orange arrow) as the linear-regression coefficients relating production day to variability of renditions from days *k* and *k* + 1, orthogonalized to the DiSC and within-day change. All three sets of coefficients, and the corresponding directions in spectrogram space, typically vary across days, syllables and birds. The progression of song along the DiSC and along the (orthogonalized) directions of within-day and across-day change are obtained by projecting renditions on day *k* and *k* + 1 onto the corresponding directions. **b**, Example rendition of syllable b as in Fig. 1 (top, encapsulated by red lines) and inferred coefficients (directions in spectrogram space; bottom) for day *k* = 57. Bright and dark shades of grey mark spectrogram bins for which power increases or decreases, respectively, over the corresponding timescales in **a**. **c**, Dependency of cross-validated regression quality (fraction of variance explained; *y* axis) on the regularization constant (*λ*) for the estimation of the DiSC.
One regularization constant was chosen for each syllable and direction by minimizing the leave-one-out cross-validation error on the training set. **d**, Progression of syllable b along the directions of change shown in **b**, during days 57 and 58. Renditions from each day are binned into ten consecutive periods on the basis of production time within the day (analogous to the ten periods in Fig. 3a, b; curves and error bars represent means and 95% bootstrapped confidence intervals). For simplicity of visualization, the time elapsed (*x* axis) during the night between days *k* and *k* + 1 is not shown to scale. The position along the DiSC for the morning of day *k* + 1 is close to that for the evening of day *k*, indicating overall strong consolidation (left). The position along the direction of within-day change is reset overnight, implying that the underlying changes are not consolidated (middle). The position along the direction of across-day change jumps overnight, consistent with offline learning (right). We note that strong consolidation, weak consolidation and offline learning have all been reported previously, albeit in different behaviours and species^{2,4,14,15,23,24}. The charts in **d** show that these different patterns of change can occur in the very same syllable along distinct spectral features (see also Fig. 1h and Extended Data Fig. 8). By considering features with different projections onto these directions, a wide range of consolidation patterns can be uncovered (see also Fig. 1h). **e**, As for **d**, but averaged across all four-day windows during days 60–69 and over all syllables and birds (same five birds as in Figs. 2, 3). The resulting averages include contributions from the entire behavioural repertoire, including regressions, typical renditions and anticipations.
The two right-most panels show concurrent progression along the DiSC and the direction of within-day or across-day change, combining data from the first and second, or first and third, panels in **e**. These representations are analogous to, and in qualitative agreement with, the behavioural trajectories in Fig. 3h–k (typical). **f**, Analogous to **e**, but computed on vocalizations represented by 32 acoustic features instead of spectrograms. Directions as in **e** can be retrieved, but progression along the DiSC appears noisier, suggesting that the 32 acoustic features do not fully capture the slow spectral changes occurring over development in particular (see also Extended Data Fig. 9). **g**, **h**, Contribution of individual acoustic features to the directions of slow, within-day and across-day change. As in **f**, the directions are computed in the space of 32 acoustic features. **g**, Distribution of coefficients in the retrieved orthonormalized directions. Thick and thin black bars represent means and 95% confidence intervals; crosses show outliers; thin vertical lines represent medians. **h**, Means (solid lines) and medians (dotted lines) of the signed (left) or unsigned (right) distributions in **g**. Most coefficients are small and variable, indicating that the alignment between any of the 32 acoustic features and the inferred directions of change is weak and highly variable over time, syllables and birds.
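
The linearization scheme in **a** can be illustrated with a minimal sketch, which is not the authors' implementation (the exact procedure is given in the Supplementary Methods). It assumes ridge regression in closed form, Gram–Schmidt orthogonalization and an explicit leave-one-out loop; all function names, the synthetic data and the candidate values of the regularization constant are hypothetical, and for brevity the constant selected for the DiSC is reused for the other two directions.

```python
import numpy as np

def ridge_direction(X, y, lam):
    """Ridge coefficients relating a target y (e.g. production day) to renditions X."""
    Xc = X - X.mean(axis=0)                      # centre each feature (spectrogram bin)
    yc = y - y.mean()
    gram = Xc.T @ Xc + lam * np.eye(X.shape[1])  # regularized Gram matrix
    return np.linalg.solve(gram, Xc.T @ yc)

def loo_error(X, y, lam):
    """Leave-one-out mean squared prediction error for a given lam (explicit loop)."""
    n = len(y)
    sq_errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        w = ridge_direction(X[keep], y[keep], lam)
        pred = (X[i] - X[keep].mean(axis=0)) @ w + y[keep].mean()
        sq_errs[i] = (y[i] - pred) ** 2
    return float(sq_errs.mean())

def orthogonalize(v, basis):
    """Remove the components of v along orthonormal basis vectors, then normalise."""
    for b in basis:
        v = v - (v @ b) * b
    return v / np.linalg.norm(v)

# Synthetic stand-in data (assumption): rows are renditions, columns are features.
rng = np.random.default_rng(0)
n, d = 80, 12
day_outer = np.repeat([0.0, 1.0], n // 2)        # days k-1 and k+2
X_outer = rng.normal(size=(n, d))
X_outer[:, 0] += 2.0 * day_outer                 # slow drift along feature 0
day_inner = np.repeat([0.0, 1.0], n // 2)        # days k and k+1
period_inner = np.tile(np.arange(10.0), n // 10) # ten periods within a day
X_inner = rng.normal(size=(n, d))
X_inner[:, 1] += 0.2 * period_inner              # within-day drift along feature 1
X_inner[:, 2] += 1.0 * day_inner                 # across-day shift along feature 2

# Choose the regularization constant that minimizes the leave-one-out error.
lams = [0.1, 1.0, 10.0, 100.0]
best_lam = min(lams, key=lambda lam: loo_error(X_outer, day_outer, lam))

# Three directions: DiSC, then within-day and across-day change, each
# orthogonalized to the directions inferred before it.
disc = orthogonalize(ridge_direction(X_outer, day_outer, best_lam), [])
within = orthogonalize(ridge_direction(X_inner, period_inner, best_lam), [disc])
across = orthogonalize(ridge_direction(X_inner, day_inner, best_lam), [disc, within])

# Progression of each rendition along the three directions.
proj = (X_inner - X_inner.mean(axis=0)) @ np.stack([disc, within, across]).T
```

Averaging each column of `proj` within production-time periods would give curves of the kind shown in **d**.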

### Extended Data Fig. 8 Behavioural variability and stratification in an example syllable.

**a**, Songs of an example bird for three days during development. Only those spectrogram segments that belong to a particular syllable and location in the motif (68-ms window of interest; red dotted lines) are analysed in the subsequent panels. **b**, Developmental changes over the course of weeks. Renditions are binned by production day, and averaged. The most apparent changes are an increase in pitch and the later successive appearance of additional spectral lines at low frequencies. **c**, Within-day and across-day changes for days 60–69. Renditions are binned into five production-time periods spanning a day and averaged within bins. On many days, the changes within a day do not appear to recapitulate the changes occurring across days (for example, days 60 and 65; within-day progression does not smoothly transition between the vocalizations on preceding and subsequent days; see **b**). The averages also reveal occasional overnight ‘jumps’ in the properties of the vocalizations (see, for example, the vertical black arrows). **d**, Comparison of within-day change and change on longer timescales. Renditions within each period and day were split into strata according to their repertoire times (for example, the quintiles in Fig. 2d), resulting in 25 averages, one for each combination of stratum and period within the day. Only the upper part of the spectrogram is shown (red rectangle in **c**). The progression along strata (*x* axis) emphasizes the large extent of motor variability along the DiSC existing within a single day (day 62). **e**, Same averages as in **d**, but with *x* and *y* axes swapped. In particular for regressive renditions (quintile 1), change within day 62 (*x* axis) does not recapitulate developmental changes occurring over months (*x* axis in **d**). **f**, Repertoire dating based on repertoire time (as in Extended Data Fig. 3c). 
Each point corresponds to a production-time period and the average of all repertoire times of renditions in that period. Error bars show bootstrapped 95% confidence intervals. The change in repertoire time, which is computed without using a low-dimensional parametrization of vocalizations, captures the movement along the DiSC.
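
The summary in **f** (one mean repertoire time per production-time period, with a bootstrapped 95% confidence interval) can be sketched as follows. This is an illustration only: repertoire times are taken as given, the periods are assumed to contain equal numbers of renditions, a percentile bootstrap is assumed, and the function names and synthetic values are hypothetical.

```python
import numpy as np

def bootstrap_ci_mean(values, n_boot=2000, alpha=0.05, seed=0):
    """Mean of values with a percentile-bootstrap (1 - alpha) confidence interval."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample renditions with replacement and recompute the mean each time.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(values.mean()), float(lo), float(hi)

def period_summary(production_time, repertoire_time, n_periods=10):
    """Split renditions into consecutive equal-count periods and summarize each."""
    order = np.argsort(production_time)
    return [bootstrap_ci_mean(repertoire_time[b])
            for b in np.array_split(order, n_periods)]

# Synthetic stand-in values (assumption): hours within a day, repertoire time in days.
rng = np.random.default_rng(1)
production_time = rng.uniform(0.0, 12.0, size=500)
repertoire_time = 62.0 + 0.05 * production_time + rng.normal(0.0, 2.0, size=500)
summary = period_summary(production_time, repertoire_time)
```

Each entry of `summary` is a `(mean, lower, upper)` triple for one period.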

### Extended Data Fig. 9 Behavioural change based on alternative distance metrics and features.

To demonstrate the robustness of the proposed nearest-neighbour statistics, we verified that the inferred time course of behavioural change is reproducible using a number of different distance metrics (used to define nearest neighbours) and parameterizations of vocalizations. **a**–**d**, We recomputed the main analyses using a Pearson’s correlation metric on 68-ms onset-aligned spectrogram segments (first row); and the Euclidean distance on onset-to-offset spectrogram segments that were linearly time-warped to a duration of 100 ms (second row). For comparison, the main analyses in the text were based on Euclidean distance on 68-ms onset-aligned spectrogram segments (for example, Figs. 2c–f, 3). **a**, *t*-SNE visualization based on the corresponding distance metrics and sound representation for the example bird, analogous to Fig. 2a. **b**, Repertoire dating averaged over birds, analogous to Fig. 3a, b. **c**, Stratified mixing matrices averaged over birds, analogous to Fig. 3g. The mixing values are highly correlated across distance metrics: Euclidean (main text) versus correlation, variance explained = 92%; Euclidean (main text) versus time-warped Euclidean, 93%. **d**, Stratified behavioural trajectories based on **c**, analogous to Fig. 3h–k. The results in **a**–**d** are consistent with those in Fig. 3, showing that our findings are robust with respect to the exact definition of nearest neighbours. Moreover, the overall structure of the behavioural trajectory appears to depend only minimally on changes in tempo and spectrogram magnitude (first row: Pearson’s correlation is invariant to changes in overall magnitude of vocalizations; second row: time-warped Euclidean distance is invariant to changes in tempo). 
**e**–**h**, We recomputed all main analyses with four additional parameterizations of vocalizations: time-dependent normalized acoustic feature traces for 16 acoustic features within 68-ms windows after syllable onset (first row); means and variances of the same 16 acoustic features over entire syllables (second row); means and variances of 8 of the 16 acoustic features (third row); and a one-dimensional parametrization consisting solely of entropy variance computed over entire syllables (fourth row). Feature means and variances were *z*-scored across all syllables. For all of these parameterizations we defined nearest neighbours with the Euclidean distance. **e**, Embedding using *t*-SNE based on the corresponding parameterization and metric. For entropy variance alone, the embedding appears locally one dimensional (for visibility, data points are larger than for the other parameterizations). Entropy variance maps mostly smoothly onto this one-dimensional manifold (data not shown). **f**, Repertoire dating averaged over birds, analogous to Fig. 3a, b. Repertoire dating based on entropy variance alone fails to reproduce most of the results in Fig. 3 obtained with spectrogram segments. The percentile curves are almost flat, indicating that renditions cannot be reliably assigned to their production times on the basis of entropy variance alone. In this case, vertical separation between percentiles cannot be interpreted as spread along the DiSC (see Extended Data Fig. 5e). For entropy variance alone, span is greater than zero across all percentiles, but consolidation is consistently close to zero. **g**, Stratified mixing matrix averaged over birds, analogous to Fig. 3g. The match with the mixing matrix in Fig. 
3g decreases as the dimensionality of the parameterization is reduced (spectrogram versus time-dependent feature traces: variance explained = 93%; spectrogram versus 16 acoustic feature means and variances, 91%; spectrogram versus 8 acoustic feature means and variances, 84%; spectrogram versus entropy variance, 54%). **h**, Stratified behavioural trajectories based on **g**, as in Fig. 3h–k. The inferred behavioural trajectories are similar across the first three song parameterizations. However, these alternative parameterizations result in more vertical separation between percentiles in **f**, suggesting that they capture the direction of slow change less well (compare with Fig. 3a and Extended Data Fig. 5e). Parameterizations of reduced dimensionality also result in progressively less defined syllable clusters in the embeddings (**e**, top to bottom). These observations suggest that a parameterization based on the full spectrogram is better suited to capture the different directions of change explored during development (see also Extended Data Fig. 7). Note that for entropy variance (bottom row), the projections onto the local direction of slow change are highly magnified compared with the projections in the top panels.
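
The invariances of the alternative metrics in **a**–**d** can be checked on toy data. The sketch below is an illustration under stated assumptions, not the authors' code: the correlation metric is taken to be one minus Pearson's correlation of the flattened spectrograms, time-warping is implemented as linear interpolation along the time axis, and the function names, array sizes and bin counts are hypothetical.

```python
import numpy as np

def euclidean_dist(a, b):
    """Euclidean distance between two spectrogram segments of equal shape."""
    return float(np.linalg.norm(a.ravel() - b.ravel()))

def correlation_dist(a, b):
    """One minus Pearson's correlation between two flattened segments."""
    return float(1.0 - np.corrcoef(a.ravel(), b.ravel())[0, 1])

def time_warp(spec, n_out):
    """Linearly resample a (frequency x time) spectrogram to n_out time bins."""
    n_in = spec.shape[1]
    t_in = np.linspace(0.0, 1.0, n_in)
    t_out = np.linspace(0.0, 1.0, n_out)
    return np.vstack([np.interp(t_out, t_in, row) for row in spec])

# A change in overall magnitude leaves the correlation distance near zero,
# whereas the Euclidean distance grows.
rng = np.random.default_rng(1)
spec = rng.random((16, 34))
louder = 3.0 * spec
d_corr = correlation_dist(spec, louder)
d_euc = euclidean_dist(spec, louder)

# The same pattern (linear in time) produced at two tempi becomes identical
# after both renditions are warped to a common duration.
freq_gain = np.linspace(1.0, 2.0, 16)[:, None]
fast = freq_gain * np.linspace(0.0, 1.0, 40)[None, :]
slow = freq_gain * np.linspace(0.0, 1.0, 60)[None, :]
warped_fast = time_warp(fast, 50)
warped_slow = time_warp(slow, 50)
```

The tempo example uses a pattern that is linear in time so that linear interpolation is exact; for real vocalizations, warped renditions would match only approximately.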

### Extended Data Fig. 10 Behavioural change based on random spectrogram segments.

We recomputed all main-text analyses with a random segmentation of behaviour that does not require alignment to syllable onsets. This segmentation scheme can be applied to behaviour that does not fall into temporally discrete elements. Here each data point corresponds to a randomly chosen 68-ms spectrogram snippet drawn from a period of singing. Not all song was sampled, as we used 1,000,000 non-overlapping segments for each bird (see Supplementary Methods). **a**, Vocalizations of the example bird (as in Fig. 1a) from day 76, with example segments used for the analysis (at the top). **b**, *t*-SNE visualization for random segments from the example bird, based on nearest neighbours defined with respect to the Euclidean distance (left) and average spectrograms for different locations in the (*t*-SNE) embedding (right; analogous to Fig. 2b). Clusters corresponding to individual syllables are elongated compared with Fig. 2a. Variation along one direction within the cluster tends to account for production time (colour bar), while variation along another direction tends to reflect the timing of segments relative to syllable onsets. **c**, Embedding from Fig. 2a (bottom) and embedding of random 68-ms segments (top). Points in both embeddings are coloured according to cluster identity defined on onset-aligned spectrogram segments covering entire syllables (bottom). The colour of each point corresponding to a random snippet (top) corresponds to the cluster identity of the surrounding syllable. Some clusters in the embedding based on random segments contain points assigned to two different syllables (for example, black versus green colours). **d**, Repertoire dating averaged over birds, analogous to Fig. 3a, b. **e**, Stratified mixing matrix averaged over birds, analogous to Fig. 3g. The mixing values are highly correlated with those in Fig. 3g (variance explained = 89%). **f**, Stratified behavioural trajectories based on **e**, as in Fig. 3h–k. 
The results in **d**–**f** largely reproduce the corresponding findings obtained with onset-aligned 68-ms spectrogram segments (Fig. 3) as well as with other song parameterizations (Extended Data Fig. 9). Nonetheless, the overall effect sizes are reduced, probably because of the additional variability introduced by the random position of segments relative to syllable onsets. In **d**, the vertical separation between the 5th and 95th percentiles is increased and the slope of the 50th percentile is reduced compared with the main-text analyses (Fig. 3a), suggesting a noisier representation of the direction of slow change (see Extended Data Fig. 5e) compared with onset-aligned 68-ms segments (Fig. 3).
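
One way to draw random, non-overlapping snippets from a stretch of singing is sketched below. The sampling procedure actually used is described in the Supplementary Methods; this version is an assumption, as are the function names, the synthetic spectrogram and the choice of 17 time bins to represent a 68-ms window (which presumes a 4-ms spectrogram step).

```python
import numpy as np

def random_nonoverlapping_starts(n_bins, width, n_segments, rng):
    """Start indices of n_segments random, non-overlapping windows of a given width."""
    slack = n_bins - n_segments * width          # bins left over once windows are placed
    if slack < 0:
        raise ValueError("not enough signal for the requested number of segments")
    # Sorted random offsets within the slack; adding i * width to the i-th offset
    # guarantees that consecutive windows are at least `width` bins apart.
    gaps = np.sort(rng.integers(0, slack + 1, size=n_segments))
    return gaps + width * np.arange(n_segments)

def extract_segments(spec, starts, width):
    """Cut (frequency x width) snippets out of a (frequency x time) spectrogram."""
    return np.stack([spec[:, s:s + width] for s in starts])

# Synthetic stand-in spectrogram (assumption): 32 frequency bins, 2,000 time bins.
rng = np.random.default_rng(2)
spec = rng.random((32, 2000))
width = 17                                       # hypothetical bins per 68-ms window
starts = random_nonoverlapping_starts(spec.shape[1], width, 50, rng)
segments = extract_segments(spec, starts, width)
```

Because the snippets are not aligned to syllable onsets, the same scheme can be applied to behaviour without discrete elements.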

## Supplementary information

### Supplementary Information

This file contains Supplementary Sections 1-9, including Supplementary Text, Supplementary Data, Supplementary Tables and Supplementary References.

## About this article

### Cite this article

Kollmorgen, S., Hahnloser, R.H.R. & Mante, V. Nearest neighbours reveal fast and slow components of motor learning. *Nature* **577**, 526–530 (2020). https://doi.org/10.1038/s41586-019-1892-x
