## Results

Rats moved unconstrained in a rectangular arena and conducted movements in different behavioral categories (i.e., stepping, turning, drinking, grooming, and rearing) while searching for water drops, which a robot arm positioned under mesh occasionally delivered (Fig. 1a). We recorded neuronal activities using electrodes that covered the sensorimotor cortex over both hemispheres (Fig. 1b). Two cameras videotaped the behavior of the rats for simultaneous 3D tracking. Recording sessions (n = 106 in total) were distributed over three months and varied between 30 and 60 min (μ = 36.1 min, σ = 5.2 min). In total, we identified 3723 single-units (μ = 35, σ = 21 across sessions) that we used for further analysis: 734, 896, and 231 in the left M2, M1, and S1, respectively, and 435, 796, and 631 in the right M2, M1, and S1, respectively24.

We focused on step-like behavior to extract behavioral components from the movements. To extract the steps, we binarized the movements of the paws into swing (moving) and stance (not moving) according to a horizontal velocity threshold (0.03 mm/ms). With each paw, rats performed one step per second on average (μ = 1.22, σ = 0.29). The average percentage of time spent in the stance phase across rats was 71%, σ = 17%.

### The strongest paw coupling in contralateral S1

Since classical methods such as peristimulus time histograms (PSTHs) are not applicable to behavior without a trial structure, we computed spike-triggered averages to investigate the relationship between neuronal activity and unconstrained movements25. We defined the spike-triggered average paw swing–stance status (STAPSSS) as a rough measure of the coupling of individual neurons to paw movements. For each neuron and each paw, we calculated the STAPSSS by averaging the swing–stance status in the period ±1 s around the spikes (Fig. 1c). For statistical control, we randomly shifted the spike train 1000 times to calculate 1000 control STAPSSS waveforms. Bootstrapping via temporal shifting preserves autocorrelations in time series and, therefore, helps to exclude false positives arising solely from autocorrelations in paw movements. We considered the STAPSSSs to be significant if their standard deviation over time exceeded the 0.99 quantile standard deviation of the control STAPSSS waveforms. Only neurons that spiked more systematically than expected by chance in relation to movement parameters could pass this test. Significantly coupled neurons were characterized by clear peaks in the STAPSSS (Fig. 1d and Supplementary Fig. 1). In total, 54% (2029/3723) of all neurons were significantly coupled to at least one paw. These were 47% (546/1169) of all neurons in M2, compared to 53% (906/1692) in M1 and 67% (577/862) in S1.

To take into account the strength of coupling, we defined a continuous measure for paw coupling as the ratio of the STAPSSS’s standard deviation and the control standard deviation (>1 for significant neurons). Using this quotient as a dependent variable, we calculated three-way ANOVAs (with hemisphere, area, and rat as factors) for all four paws separately (detailed results in Supplementary Table 2). In summary, for all four paws, we found a stronger coupling on the contralateral side (p = 0.04), which suggests lateralization during locomotion. The coupling increased from anterior to posterior areas (p < 1e − 11). For all four paws, the highest mean coupling was localized in contralateral S1 (Fig. 1e). In three out of the four paws, the interaction between the area and hemisphere was also significant, that is, the differences between the contralateral and ipsilateral hemisphere increased from anterior to posterior areas (p = 0.02). To further investigate the difference in magnitude between contralateral and ipsilateral paw coupling, we defined contralateral bias as the ratio between the coupling of the contralateral and ipsilateral paws: b = cr/cl for left-hemispheric neurons and b = cl/cr for right-hemispheric neurons (b ≈ 1 for non-biased neurons), with bias denoted as b, coupling as c, the right paw as r, and the left paw as l. We calculated this bias separately for the front and hind paws. A two-way ANOVA on the contralateral bias of individual neurons revealed a significant effect of the brain area for the front paws (F2,3715 = 44.66, p < 1e − 19) and the hind paws (F2,3715 = 54.56, p < 1e − 23). This confirmed that single neurons had a larger contralateral bias from anterior to posterior areas for the front and hind paws (Supplementary Fig. 2).

To further investigate the temporal relationship between neuronal activity in different motor areas and paw movements, we quantified the offset between each movement peak (the STAPSSS peak) and spike. We found that for all four paws, the offset for neurons in S1 tended to be more negative (i.e., the spike followed movement) than that of neurons in M1 and M2 (Supplementary Fig. 3). This effect was more pronounced for the hind paws according to an unpaired Mann–Whitney U-test between offsets from S1 and M1/M2 (front left paw Mann–Whitney U = 1157042, n1 = 862, n2 = 2861, p = 0.002 two-sided; front right paw U = 1155623.5, p = 0.002; hind left paw U = 1056184.5, p < 1e − 10; hind right paw U = 1070083, p < 1e − 8). The finding that neurons in S1 tended to spike after movements, whereas neurons in M1 and M2 spiked in a closer temporal relationship to paw movements, aligns well with the idea of S1 reacting to sensory input and M1 and M2 being more involved in movement generation.

### Single-unit activity allows for the decoding of paw movements within sessions

Owing to the strong paw coupling, we hypothesized that it is possible to decode the paw movements of freely moving rats from neuronal activity. To test this hypothesis, we applied feed-forward neural networks to decode the swing–stance status of the right front paw posed as a two-class classification problem. For each time point, we fed in the spike trains ±400 ms of all units in time bins of 10 ms. Deep neural networks were trained and evaluated separately for each recording session. We chose this approach because single-neuron activity does not generalize over sessions, in contrast to our population-level decoding approach in the following section. The mean per-class decoding accuracies were well above-chance level (μ = 71%, σ = 10%; chance level 50%). While there was no significant correlation between accuracy and train set sizes (Spearman’s ρ = 0.17, p = 0.07), we found a significant correlation between the accuracy and percentage of coupled neurons according to our STAPSSS analysis per session (Spearman’s ρ = 0.63, p < 1e − 12, Fig. 1f). This confirms that STAPSSS is a reliable measure of the correlation between neuronal activity and movement.

### The structure of population activities allows the decoding of behavior

Owing to the promising decoding results of paw movements, we sought to determine whether population activity during unconstrained movements also contained information on more complex behavior. We used LEMs5,22 to reveal and visualize the structures in the population activities. LEM is a non-linear dimensionality-reduction method for extracting low-dimensional manifolds in high-dimensional data using spectral techniques. Owing to redundancy in the code of the cortex, organized in subpopulations of correlated units, we expected in fact to capture most of the population variance in a space with a lower dimension than the recording space. We applied LEM to the neighborhood graphs of neuronal activity vectors to visualize structures and relationships among population activities at different time points of a recording session in a low-dimensional space. Most of the resulting projections showed a clear saddle-like shape when visualized in three dimensions (Fig. 2a, 52/95, roughly 55% of session structures had a similar shape, as classified visually), although there were also random-like population structures that differed from this general trend (14/95, about 15% of the sessions, Supplementary Fig. 4; the remaining 35% had intermediate levels of structuredness). In random-like population structures, the time points were uniformly distributed in a sphere. Sessions with a clear saddle-like shape were characterized by a larger number of neurons that were significantly coupled to at least one paw compared to sessions with an intermediate or low level of structuredness (μ = 23.6, σ = 14.7 vs. μ = 16.6, σ = 12.3 neurons, Mann–Whitney U = 797, n1 = 52, n2 = 43, p = 0.008 two-sided). To ensure that the saddle-like structures were not a simple artifact of the dimensionality-reduction method, we also performed time-shuffled, neuron-shuffled, and time-shifted control reductions5. These did not lead to any apparent structure (Supplementary Fig. 5).

To investigate the relationship between population structures and the corresponding behavior, we proceeded by manually labeling sessions in 500 ms snippets into six behavioral classes (stepping/paw movement, turning/head movement, drinking, grooming, rearing, and resting). These six classes included complex behaviors, recruiting the full body of the rat and the full sensorimotor cortex (Supplementary Fig. 6). We included all sessions with clear saddle-like shapes and with at least five significantly coupled neurons, which resulted in a total of 48 sessions (13 for Rat A, 16 for Rat B, 7 for Rat C, 6 for Rat D, 3 for Rat E, 3 for Rat F).

While each session contained at least some samples of each behavior, the occurrences of behaviors still differed considerably across sessions and rats (Supplementary Fig. 7). In contrast, the distributions of behaviors across the neural structures revealed clear similarities across rats, which was surprising assuming a sampling of approximately 0.005% of all neurons (ratio between the number of recorded cells and the estimated total number of cells, approximated for the area covered by the implanted electrodes by assuming a cortical thickness of 2mm and a density of 90k neurons per mm326) on average in only roughly overlapping recording sites (cf. Fig. 1b). For example, the second eigenvector (here: first dimension), the so-called Fiedler vector, clearly represented the difference between movement and rest (Fig. 2b left column). For some animals, a clear distinction between more paw-related (paw movement, rearing) and head-related behavior (head movement, drinking) was observable in the third and fourth eigenvector (here: second/third dimension, Fig. 2b right column). Although the position of a population vector in the LEM space is univocally defined by the instantaneous activity of all its composing units, and is relatively little affected by the activity of a single-unit, there is a relationship between the overall structure emerging in the LEM space when observing the totality of recorded data and the firing of neurons with high behavioral selectivity. While population vectors cluster in space due to the similarity between neuronal representations during a specific behavior, single-units with high selectivity for such a behavior will fire more strongly at that behavior’s cluster (Supplementary Fig. 8). Moreover, the distance in the LEM space between population vectors corresponding to two behaviors will increase with the number of units within the population that change their firing rate between the two behaviors (Supplementary Fig. 9a, b).

To quantify the separation between the neuronal representations of the six identified behaviors in the LEM space, we trained a neural network based on the first 10 dimensions of the population vectors. We chose 10 dimensions because we found the mean dimensionality in the LEM space to be μ = 8.6, σ = 1.1 (see Methods). By choosing a slightly higher value than the mean dimensionality, we added a small safety margin to ensure the inclusion of all relevant dimensions. The neural network correctly classified behaviors more frequently than by chance (mean per-class accuracy μ = 47.1%, σ = 9.6%; chance level 16.66%, Fig. 2c). The accuracies were correlated to the number of significantly coupled neurons (n = 48, Pearson’s ρ = 0.59, p < 1e − 5, Fig. 2d), the total number of units (Pearson’s ρ = 0.54, p < 1e − 4, Supplementary Fig. 10a), and the signal-to-noise ratio (SNR) averaged over units (Pearson’s ρ = 0.49, p < 0.001, Supplementary Fig. 10b). Common classification mistakes consisted of confusing rearing or turning with stepping, as well as turning with drinking or resting (Fig. 2e). At the single-unit level, these behavioral classes shared, in fact, the highest number of selective units (Supplementary Fig. 9c, d) in each of the three recorded regions (Supplementary Fig. 9e). Moreover, we observed the lowest accuracy for Rat D, Rat E, and Rat F. These rats had a low mean SNR (Rats E–F, Supplementary Fig. 10b) or no electrode coverage of posterior areas (Rats D and F, cf. Fig. 1b). This last aspect made us hypothesize that more posterior regions are primarily involved in the encoding of behavioral classes. To test this hypothesis, we investigated the influence of the different sensorimotor areas on neural population structures. Thus, we conducted dimensionality reductions with equal numbers of neurons (i.e., 20 randomly chosen units) from M2, M1, or S1 as input. With this subset, we trained artificial neural networks to decode the behavioral classes with the neural activity in a given area reduced to five dimensions as input. The decoding accuracies for M1 were significantly better than those for M2 (Wilcoxon signed-rank test W = 155, n = 41, p < 1e − 5 two-sided) and those for S1 (W = 335, n = 42, p = 0.009). In total, the accuracies were highest in M1 for 28 out of the 48 sessions, compared to 15 for S1 and 5 for M2 (Fig. 2f, accuracies μ = 25.8%, σ = 4.9% in M2, μ = 28.6%, σ = 5.3% in M1, μ = 26.7%, σ = 5.4% in S1). The low relevance of anterior sensorimotor regions is in line with the STAPSSS results, as well as with the lower decoding accuracies in Rat D and Rat F.

### A cross-session polytope comparison reveals similarities in the average encoding of behavioral classes across animals

The visual similarity between neural population structures in three dimensions (cf. Fig. 2a, b) led us to wonder whether correspondences between the full dimensional structures could be quantified. Such similarities become more apparent when reducing the extended manifolds to polytopes—high-dimensional polyhedra—with vertexes defined by the average population vectors associated with the six behavioral classes (Fig. 3a). This encouraged us to systematically investigate whether population activities during unconstrained movements contained structures that were conserved across recording sessions or even across different animals. We excluded Rat F from all of the following analyses because of low recording quality, which may reflect the long delay between implantation and measurements compared to the other rats (see Supplementary Table 1).

Polytopes are useful tools to facilitate the visualization of the complex structure associated with the neuronal representations of the six identified behaviors and their reciprocal distance. To test whether such distances were preserved across sessions and animals above-chance level, we first tested whether behaviors associated with similar population vectors in one session corresponded to behaviors with similar population vectors in other sessions. For example, if in one session the population vectors during turn and step are similar to each other but dissimilar from those during rest, we wondered whether the same relationship can be found in other sessions as well. More formally, for each pair of sessions v and w and each behavioral class i, we ranked the remaining classes by the Euclidean distance between their average population vector and the average population vector of class i. If v and w had the same polytope structure, the rank associated with each behavior would be identical. We quantified the similarity between ranks across sessions with the statistic $${s}_{i}^{vw}$$, defined as the number of concordant ranks, and compared its distribution with that obtained from bootstrapping (see Methods for details). We found significant similarities across sessions, both when computing distances in the high-dimensional recording space (Kolmogorov–Smirnov test, p < 1e − 39) and in the reduced LEM space (p < 1e − 73) (Supplementary Fig. 11a, b). This finding confirms that the obtained result was not an artifact of the dimensionality-reduction procedure. Moreover, to ensure that such significance did not depend exclusively on the enhanced distance between the “rest” class and any other classes, we repeated the analysis with “rest” excluded from the accounted classes (p < 1e − 21, Supplementary Fig. 11c).

We performed a second test to compare the overall conservation of relative distances among the population vectors associated with different behaviors at the single-session level. We captured the differences between the cross-behavioral distance matrices of two sessions with the Jeffries–Matusita metric and compared them with the bootstrap distribution obtained by shuffling the behavioral labels (see Methods for details). This was done for each pair of sessions within and across animals, both in the high-dimensional recording space and in the LEM space, and by excluding the “rest” class from the test (Supplementary Fig. 11d, Fig. 3b, and Supplementary Fig. 11e, respectively). In all of these cases, the similarity between the polytopes of different sessions and animals was above chance for most session pairs, and non-significance often occurred for animals with a low SNR (cf. Supplementary Fig. 10b). Finally, we wondered whether the observed similarity between polytopes might have been induced by the sheer presence of somatotopy along the sensorimotor cortex. In fact, in the case of behaviors selectively engaging different parts of the sensorimotor cortex, somatotopy would generate a coarse-grained covariance structure between somatotopic regions that is similar across animals. However, the spontaneous behaviors considered here are complex and engage the full sensorimotor cortex (cf. Supplementary Fig. 6). This makes such a scenario unlikely. Nevertheless, to rule out this hypothesis, we divided the recorded units according to their somatotopic region (Supplementary Fig. 11e) and repeated the analysis from Supplementary Fig. 11d while shuffling the identities of the units recorded in the same somatotopic region across behaviors (i.e., the identities of the neurons recorded in the same somatotopic region were shuffled when building each of the average population vectors representing the six considered behaviors). This destroyed the covariance of neuronal activity across behaviors at the single-unit level, but maintained both the covariance of neuronal activity across behaviors at the somatotopic-region level, and the average firing rate of the different somatotopic regions. With the shuffled data, no session pair comparison reached an average p-value below 0.05 (Supplementary Fig. 11g), demonstrating that the obtained results on polytope similarity across animals were not the byproduct of somatotopy.

### Cross-subject and cross-session decoding

Polytopes capture the distance between the average neuronal representations of the different behaviors but neglect their shape and extension on the manifold. Encouraged by the similarities observed among the polytope structures of different recording sessions, we decided to perform a stronger test and attempted cross-subject decoding. Cross-subject decoding requires not only an agreement between the average representation of behaviors but also accounts for the variability in neuronal representations associated with each behavior. While it is impossible to find a direct correspondence at the single-neuron level across animals, similarities in lower-dimensional population structures can be used for cross-subject and cross-session decoding (Fig. 4a). For the decoding analysis, we divided the six behavioral classes (Fig. 4b) into two disjointed sets: one “alignment set”, which was used to align the neural structures, and one “decoding set”, which was used for training and testing a classifier. Thereby we ensured that no class was used for aligning the structures and classification in the neural space at the same time. The mean neural vectors (four dimensions) corresponding to the behavioral classes in the alignment set were used to compute a Procrustes transformation between two sessions to align the population activity structures27,28 (Fig. 4c). Procrustes transformations involve translation, scaling, reflection, and rotation and thus preserve the shape of a set of points. For decoding, we trained a classifier on samples from the decoding set of one session for a single rat using the activity in the dimensionality-reduced neural space as input. Then, we tested the generalization on another session of the same rat (cross-session decoding) or another rat (cross-subject decoding) (Fig. 4c). Notably, the samples of the decoding set of the two tested sessions were not used for computing the Procrustes transformation. In the first experiment, the alignment set consisted of four behavioral classes, with two other classes remaining for the decoding set. This resulted in a total of 15 possible splits into two sets. Classifiers trained on highly decodable sessions also successfully generalized to other sessions from the same or other rats (Fig. 5a–c and Supplementary Fig. 14a–b). In the generalization matrix (Fig. 5a), 13.88% of the generalization results (275 out of 45*44 = 1980) had a mean per-class accuracy higher than 60%, and 59 higher than 65%. In the set of sessions with the highest 10% signal-to-noise ratio (SNR > 4.33), the mean per-class accuracy in the generalization task was μ = 59.7%, σ = 5.4. The best-performing sessions included sessions of Rats A, B, and C with sufficient recording quality and a sufficiently high number of units for a robust estimation of the underlying population structures (Fig. 5d). Additionally, the correlation between within-session and between-session accuracies was high (Fig. 5c, n = 45, Pearson’s ρ = 0.68, p < 1e − 6). We defined the “generalization accuracy” of a session as the average test accuracy across all sessions (mean value per row of Fig. 5a first matrix). These generalization accuracies were correlated to the total number of units (Pearson’s ρ = 0.38, p < 0.01), with a higher number of units leading to a better estimation of the population structure. The generalization accuracies were also correlated to the session length (Pearson’s ρ = 0.37, p < 0.05) since the number of samples used for LEM (which included only time points with sufficient activity) varied across sessions and rats. Finally, the recording quality—namely, the SNR averaged over units—was correlated with generalization (Pearson’s ρ = 0.38, p < 0.01). Particularly, Rats A and B, which performed best in the generalization, had both a high SNR and a high total number of units (Fig. 5d).

### Decoding is robust to methodological and class-selection changes

To evaluate whether our results depended on a specific dimensionality-reduction method, we repeated the analysis using Isomap, another non-linear dimensionality-reduction method, and principal component analysis (PCA). Also Isomap revealed neuronal structures that were comparable across subjects (Supplementary Figs. 12 and Fig. 5a second matrix). In contrast, linear methods such as PCA were not powerful enough to extract these neuronal structures, as they search for projections in the directions of largest variance (which can be biased by outliers) and are not optimized to maintain the local neighborhood structure of the high-dimensional neural space (Fig. 5a third matrix); the LEM results were significantly better than those from data reduced using PCA (Wilcoxon signed-rank test W = 167978303, n = 30375, p = 0 two-sided). These results highlight that the activity of the sensorimotor cortex during spontaneous behavior evolves over a non-linear manifold embedded in the original high-dimensional neural space.

For a more systematic test of the relationship between the number of units and generalization, we took all sessions with a generalization accuracy of at least 55% (a total of 19 sessions from Rats A, B, and C) and conducted an ablation study with LEM reductions on the reduced number of units (20, 40, 60, and 80 units removed per session). We then repeated the generalization experiment on the aligned LEM structures. The accuracies steadily decreased with fewer units (Supplementary Fig. 13), confirming the high relevance of the number of units for a robust estimate of the population structure.

To determine which sensorimotor areas were most relevant for generalization, we again took the 19 best sessions and conducted LEM reductions after removing M1, S1, or M2. Additionally, for a fair comparison, we removed a random portion of all of the other units for underrepresented areas such that the number of units after the removal of M1, M2, and S1 neurons remained constant across sessions. Generalization accuracies on aligned LEM structures decreased considerably after the removal of M1 (accuracies μ = 53.9%, σ = 7.5%); these values were slightly but significantly lower than after the removal of units from S1 (accuracies μ = 55.9%, σ = 7.7%, Wilcoxon W = 5497776, n = 5415, p < 1e − 56), M2 (accuracies μ = 55.0%, σ = 8.3%, W = 6341443, p < 1e − 17), or of the same number of units distributed over all areas (accuracies μ = 55.6%, σ = 7.8%, W = 5659147, p < 1e − 47).

In a second experiment, we used only three classes in the alignment set and the three remaining in the decoding set to test the generalization under more difficult conditions, resulting in 20 possible splits of the six classes in total. The general pattern of the generalization matrix stayed the same (Supplementary Fig. 14c–d). To verify that the classifiers did not only learn to discriminate the simplest difference—the difference between rest and movement—we conducted another experiment without the class “rest”. Although the accuracies were lower in this setting, the general pattern remained the same (Supplementary Fig. 14e–f). To assess the relevance of the alignment of neural structures, we also tested the generalization on neural structures without explicit alignment as a control. In most cases, the accuracies on aligned structures were much higher than those on unaligned structures (Supplementary Fig. 15).

To further explore our results, we performed control classification experiments in which we examined the generalization on shuffled data (cf. Supplementary Fig. 5). Accuracies of shuffled data were significantly lower than those computed in the original LEM space (Wilcoxon W = 116677979, n = 30375, p = 0 when comparing with neuron-shuffled, W = 115385202, p = 0 with time-shuffled, and W = 123943794, p = 0 with time-shifted data) and did not exceed the chance level (Supplementary Fig. 16). Furthermore, we computed LEM reductions in the non-binarized neural space and repeated the generalization experiment. Also in this case we could find significant generalization for multiple sessions, but with accuracies lower than when the analyses were performed on binarized spikes (Supplementary Fig. 16d, W = 186661468, p < 1e − 180). This is probably because the binarization acts as a form of normalization that reduces the impact of overall firing rate differences between neurons, and could suggest that similarities are best captured by the differentiation between sub-groups of units that are co-active during the different behaviors than by exact single-unit firing rate modulations.

## Discussion

In this study, we investigated single-neuron activity as well as population activity patterns in the rats’ sensorimotor cortex during unconstrained and self-paced behavior. The behavior was as closely related as possible to naturally occurring behavior, as it was based on foraging, but it was still performed in a limited arena to allow for reliable movement tracking. The first analyses were sanity checks to validate our approach of studying freely moving animals without a clear trial structure. Based on the chosen measure, STAPSSS, 54% of all neurons were significantly coupled to paw movements. This fraction of coupled neurons is in the range of previously reported numbers. For example, 60% of neurons in the hindlimb motor cortex reacted to different locomotion scenarios29, and 44% in M1 were body-coupled in freely moving rats25. Our multi-side recording approach allowed us to comprehensively test for differences in neuronal activity across the entire sensorimotor cortex. Previous research has found that the laterality of forelimb representations increases from M2 to M1 in a pedal task for head-restrained rats30. Here, we extended this laterality gradient to more posterior regions—in particular, S1. As we targeted the output layer of the cortex (layer V), we putatively biased our recordings toward pyramidal tract neurons, which have been described as being predominately involved in laterality30.

While the above-described findings refer to the general features of the sensorimotor cortex, the main finding of our study was based on conserved neuronal population structures. Experimental, computational, and theoretical work has identified a rich structure within the coordinated activity of interconnected neural populations in movement control, decision-making, and memory tasks. These findings are conceptualized within the framework of neural population dynamics, which can reveal general motifs2. Recurrent neural networks (RNNs) can be applied to neural data to reveal structural and geometric properties31. Multiple tasks can then be represented in different RNN models. In these networks, some clusters of units have been identified as specialized for subsets of tasks1. Alternatively, methods such as PCA and its variants dPCA and jPCA have been applied to identify the stability of motifs across modalities such as arm and speech control3, as well as within and across brain areas4. In contrast to the previously described studies, we focused on the existence of conserved neuronal structures across animals without any clear instructed task line but with several behavioral classes. These two points differentiate our study from previous publications in the field. We investigated population activity patterns, which are commonly assumed to reside on low-dimensional manifolds in the full neural state space14,16,32,33,34. In contrast to the (globally) linear method like PCA that most studies have used4,12,15,16,17,18,20,35, we assumed the preservation of local neighborhood relations in the data. Therefore, we employed LEM5,22 to reveal the presumed preserved low-dimensional structures. Remarkably, neuronal population activity during unconstrained behavior contained similar structures across animals and sessions, already visible in the first three dimensions. Furthermore, the distribution of different behaviors across low-dimensional neural structures was systematic, which we confirmed with our above chance, within-subject decoding results. The allocation of different behaviors on the population structures revealed strong similarities across rats. Particularly, movement and rest could be clearly visually distinguished in the first dimension. This is in line with results on clear separations in the neural state space for output-potent and output-null (e.g., preparatory) neural activity12,20.

To support our main claim that low-dimensional neural manifolds are comparable across sessions and animals even in the case of unconstrained behavior, we first showed with our polytope analysis that the relative positions of the neural representations associated with different behavioral classes were conserved across animals and sessions above chance. The analyses of the polytope structures compared the distance between the average neuronal representations of behaviors, neglecting their precise spatial extension on the manifold as determined by the variability in neuronal representation of each behavior. Conversely, cross-subject decoding was also affected by such variability and, therefore, tested an even stronger degree of similarity. Ultimately, we evaluated the performance of a classifier trained on the neuronal activity of one subject to predict the behavior of another based on its own neuronal activity; this provided us with a proxy to experimentally test and quantify the degree of universality of neural representations across subjects. Since the neuronal state space of different subjects cannot be directly compared (given the difference in number and identity of the recorded neurons), applying dimensionality reduction and alignment was necessary to achieve this goal. A simple, supervised, shape-preserving alignment procedure—namely, a Procrustes transformation between mean population vectors for different behavioral classes in the dimensionality-reduced neural space—sufficed for successful cross-subject generalization in a decoding task with distinct but related behavioral classes. Our procedure was applicable to sessions with sufficient recording quality (indicated by a high SNR of the recorded units) and enough units for robust population estimation. Further, the generalization accuracies of the sessions were closely related to their within-session accuracies. Generalization was considerably worse for population structure estimates based on fewer units. In line with the within-session decoding results, we also found that generalization significantly decreased after the removal of M1, which indicated consistent population responses especially in this area. The low relevance of the anterior motor cortex to information regarding behavioral categories is in line with our STAPSSS results. Nevertheless, in contrast to the encoding of paw movements, our results on the population decoding of the higher-level behavioral categories hinted at major contributions from M1, not only S1. Thus, our results close a gap in a previous study that investigated postural and behavioral encoding in the posterior parietal cortex and M236. While we mostly used LEM as a dimensionality-reduction method given its solid theoretical basis, we also showed that another non-linear dimensionality-reduction method, Isomap, can be used to reveal neural structures that are comparable across subjects.

A shared structure in neuronal activity across subjects has been shown mostly in fMRI studies, where even between-subject classification has been demonstrated27,37,38,39,40,41. While these works have focused on watching movies (an activity that can be conducted similarly for different subjects), EEG and EMG cross-subject decoding has been shown for hand movements42,43. In rodents, related work has shown the cross-subject decoding of odor sequences in the orbitofrontal cortex44 and place-cell activity in the hippocampus45. Here, the presentation of external stimuli and the presence of a precise trial structure might facilitate the detection of an operational manifold common to all animals. In contrast to these studies, we showed cross-subject classification in a more complex case where rats roamed freely without training or a trial structure in the underlying task. Therefore, our main finding of shared neural structures is consistent with recent findings but also extends them to more complex, less constrained behavior. To our knowledge, this is the first time that the conservation of neural structures across animals and for distinct, spontaneous behavioral classes has been shown. This finding implies that conserved neuronal structures occur without training. Therefore, the neuronal computations underlying these structures might be similarly realized across individuals, either from birth or during development.

The similarity of neural population structures in the sensorimotor cortex is of great relevance for the development of BCI systems, which are built to aid physically disabled people and, thus, often target the sensorimotor cortex46. Essentially, BCI systems aim to map neuronal activity to targeted movements by first transforming the noisy and non-stable recorded single-unit neuronal activity (which strongly varies across a longer recording time span) to a, supposedly, more stable lower-dimensional space, and then using a linear or non-linear decoder mapping to movement classes or trajectories. Gallego et al. showed that in monkey sensorimotor cortex, it is possible to find an alignment between low-dimensional manifolds during cursor tasks, even when they are reconstructed from recordings interspaced by long periods of time35. Following these lines, the topic of the alignment of low-dimensional neural manifolds across time for the long-term stability of BCI systems has recently gained focus, fostering the development of new analytical techniques47,48,49,50. Once the reconstructed manifolds are aligned to a common frame, the decoder mapping between neuronal activity and movement classes can be fixed in time, which diminishes patients’ discomfort caused by the constant re-training of the BCI system. While progress has been made in the alignment of manifolds across time for a single subject, the cross-subject generalization of neural motor patterns has not been shown so far. If the neuronal manifolds of the sensorimotor cortex were subject-specific, BCI systems would need to learn the mapping from the neuronal manifold to the movement space from scratch for each subject. For deep artificial neural networks, which, today, are often used as the most powerful non-linear mapping methods, the training phase is computationally expensive, often requiring hours of GPU training51; in contrast, once a network is trained, it can be used in the prediction phase with low computational effort. In this manuscript, we demonstrate that the neuronal manifolds of the sensorimotor cortex associated with spontaneous behavior are not only stable in time but are also conserved across animals. This implies that BCI training time can be substantially reduced by pre-using data from other subjects, in which case only a short fine-tuning phase would be necessary. Our work provides a proof-of-principle for the cross-subject generalization of complex self-paced behaviors that involve the whole body, with recordings of a few single-units (in the range of 100–200 neurons per session) from only partially overlapping areas of the sensorimotor cortex provided as data. While it has been hypothesized that cross-individual decoding might not be possible with increasing task complexity5, our results indicate that even during unconstrained behavior, the relationships among neural activity patterns are conserved across different animals. This conservation of population-level neural phenomena provides a foundation for cross-subject decoding, even in the difficult case of unconstrained behavior.

## Methods

### Animal surgery

We implanted six male Long Evans rats at the age of eight weeks with 22 tungsten electrodes (200 to 600 kOhm impedance, polyimide insulation, WHS Sondermetalle, Grünsfeld, Germany) at a 1.2 mm implantation depth in each hemisphere (implantation: January 2017 for Rat F, April 2017 for Rats A–E). Recordings were taken in June-August 2017 (see Supplementary Table 1). The relatively long implantation time (perfusion in November 2017), as well as the amount of electrode wires, led to tissue growth around the electrodes and changes in the original implantation depth. We, thus, cannot provide histological pictures. Electrode locations spanned from –2 to +5 mm in the anterior/posterior direction and from 1 to 4 mm in the lateral/medial direction. This resulted in three medial–lateral rows of six electrodes each, plus one row of four electrodes (see Fig. 1b). More information regarding the procedure can be found in ref. 52. The Regierungspräsidium Freiburg, Abteilung Landwirtschaft, Ländlicher Raum, Veterinär- und Lebensmittelwesen approved all animal procedures.

The rats were kept water-restricted for the time course of the experiments (free access to water for two days per week). For the experiments, the rats moved unconstrained on a mesh of 30 × 40 cm in a closed arena. Every 10 to 30 s, a waterspout pseudo-randomly positioned by two servo motors released a drop of water onto the mesh, which the animals could find and consume. To prevent the rats from merely following the movements of the waterspout, we included dummy movements that were not followed by a release of water. Even experienced animals were not able to predict the position of water drops without an active search, and the animals did not find all water drops throughout a session. This task has been previously described52. Here, we only used part of the dataset discussed in ref. 52; in particular, we only included sessions with a minimum duration of 30 min.

### Data acquisition and the preprocessing of extracellular recordings

Extracellular signals were recorded at 30 kHz and band-pass filtered, amplified, and digitized using a head stage (Intan Technologies, Los Angeles, California) situated at the head of the animal. Spike sorting was conducted on high-pass filtered signals (cut-off at 300 Hz) separately for each electrode. Spikes were defined as amplitude threshold crossings of four times the standard deviation of the signals. For each spike, we extracted the window of −0.5 to 2 ms around the peak amplitude (resulting in 76 values per spike). Spike sorting consisted of two phases for each unit. First, a seed spike was estimated. This was accomplished by calculating the spike neighborhoods (spikes within the average noise level, half a millisecond before the spike, across all units) for 500 randomly chosen spikes. The spike with the most neighbors was chosen as the seed spike. Second, we optimized the spike waveform through an iterative procedure. This was done by alternating the calculation of a new noise level for the neighboring spikes, the update of the neighborhood (spikes within the new noise level), and the update of the average waveform. This iterative procedure ended when the neighborhood assignments remained constant. The algorithm proceeded with the remaining spikes by choosing a new seed spike. For our single-unit analysis, we only kept single-units according to the distribution of inter-spike intervals. single-units with a firing rate lower than 0.1 Hz were not included in the analysis. Two cameras (Stingray, F033C IRF CSM, Allied Vision Technologies) positioned below the mesh tracked the movements of the colored paws. The videos were taken with a frame rate of 80 Hz and smoothed with a Gaussian filter before analysis.

### Single-unit STAPSSS analysis

Paw movements were labeled as “swing” for horizontal velocities higher than 0.3 mm per 10 ms (the bin size we used for our analysis) and “stance” otherwise. Spikes were also binned with a bin size of 10 ms. For each neuron and each paw, we defined the spike-triggered average paw swing–stance status (STAPSSS) as the behavioral average over all windows ±1 s around the spikes. We normalized each STAPSSS waveform by the mean. We defined the paw coupling of a neuron as the ratio of the standard deviation of the STAPSSS waveform to the statistical control standard deviation. The latter was defined as the 0.99 quantile standard deviation of a distribution constructed out of the standard deviations of the STAPSSS waveforms of 1000 randomly shifted spike trains. If a neuron was not related to a paw’s movement, its STAPSSS waveform would be flat and its standard deviation would not exceed the control standard deviation. We defined the contralateral bias as the ratio of contralateral to ipsilateral paw coupling. Statistical analyses were done using the anovan, multcompare, and ttest Matlab functions. The ANOVA tests always included the rat’s ID as an additional factor. The analyses of this manuscript were performed with Matlab 2019a, Python 3.6, pandas 0.24.1, matplotlib 3.0.0, numpy 1.14.5, scikit-learn 0.20.0, scipy 1.1.0, and tensorflow 1.10.0.

### Decoding from spike trains

We used fully connected neural networks with three hidden layers of 500 units each for decoding. The networks’ inputs were the Gaussian-smoothed (σ = 20 ms) binned spikes in ±400 ms, resulting in 81 input bins for each neuron. In contrast to the STAPSSS analysis, where only single-units were considered, we used all units as input for decoding. Each session was split into training, validation, and test sets (70/15/15%). Two of the 106 sessions were excluded from decoding because of insufficient data. Training was conducted with the Adam optimizer53, batch size 64, and an initial learning rate of 0.0001. A dropout rate of 75%, L2 regularization (λ = 1e − 4), and early stopping were applied to prevent overfitting. To deal with class imbalance, we used weighted cross-entropy loss to put more weight on the less frequent class (swing). The reported accuracies were mean per-class accuracies. The decoding accuracies of the deep neural network were significantly better than a baseline linear classifier (two-sided paired t-test, t = 6.55, p < 1e − 8). For the baseline, we used a logistic regression with three-fold cross-validation of the L2 regularization strength on the concatenated training and validation sets. The test sets for each session were the same as for the artificial neural network. Class weights were adjusted to be inversely proportional to class frequencies, as for the artificial neural network. The artificial neural network was implemented in Tensorflow. For the linear baseline, we used Python’s scikit-learn function LogisticRegressionCV.

### Dimensionality reduction

We used LEM5,22, an unsupervised non-linear dimensionality-reduction method, to investigate the low-dimensional structure of population activity. For each session, spike counts were binned in 100 ms bins and then binarized (1 for at least one spike per bin, 0 for no spikes). Single and multi units were used. Only time points with at least 15 active units were retained. Since we restricted further analysis to sessions with at least 5000 valid time points, we considered only 95 of the 106 sessions. For each session, we constructed an unweighted, mutual kNN graph based on the Hamming distance on the columns of the n × t matrix (n units, t time points). Our code for LEM was built on recent work5. Two iterations of the LEM algorithm were performed. However, in contrast to Rubin et al., we used the Hamming distance in the first iteration and reduced to 20 dimensions. In the first iteration, we used 0.5% of the time points as neighbors; in the second, this parameter was set to 7.5%. Furthermore, we applied a random walk normalized graph Laplacian instead of the symmetric normalized graph Laplacian, as proposed in a previous study54. In detail, we constructed the unnormalized graph Laplacian as L = D − W, with D as the diagonal degree matrix and W as the adjacency matrix of the kNN graph. Solving the generalized eigenvalue problem Lv = λDv corresponded to finding the first eigenvectors of the random walk normalized graph Laplacian Ln = D−1L54. Since the eigenvector corresponding to the smallest eigenvalue (zero) is constant, we discarded the first dimension of the LEM for all analyses and decoding studies. The other LEM eigenvectors (i.e., dimensions) were ordered by eigenvalue magnitude—that is, the “splitability” of the time points in different clusters (i.e., the dimensions that best divided the time points into clusters came first). For the LEM reductions on units from different sensorimotor areas, we randomly chose 20 units from each area as input (if fewer than 20 units for an area were available, the analysis was omitted). We chose to reduce to six dimensions in the LEM space, leaving us with five dimensions for decoding with deep neural networks (as mentioned above, the first dimension of the LEM must be discarded). For the ablation study on sessions with 20, 40, 60, or 80 units removed, we reduced to 20 dimensions in the first two and 10 dimensions in the second two cases (in these latter cases, we did not have enough neurons left to retain high dimensionality in the LEM space). For the study on LEM reductions after the removal of sensorimotor areas, we removed $${n}_{{{{{{\mathrm{max}}}}}}}=\max$$(#M1, #M2, #S1 units) from each area for each session. For underrepresented areas, we additionally discarded nmax − narea randomly chosen units. As before, given the lower number of neurons, we reduced to 10 dimensions. To investigate the dimensionality of the LEM space using the method of ref. 5, we computed the average number of neighbors of all time points in the 20-dimensional LEM space in circles with increasing radii. The dimensionalities were then obtained as the slope of a line around the steepest point in a log–log plot of neighbors against radii.

For the dimensionality reduction with Isomap, we used Landmark–Isomap55, which is more efficient for very large datasets. We set the number of neighbors to 0.5%, as for the LEM, and used 10% of the time points as landmarks. PCA reductions where computed on non-binarized spikes.

### Behavioral labeling

We used the freely available tool MuViLab for the behavioral labeling of the videos. Two human annotators who were blinded to the neural data manually labeled the 48 sessions divided into 500 ms snippets. The 48 sessions were chosen based on them having a clear saddle-like shape and at least five significantly coupled units: Rat A—13 sessions recorded between 2017/06/08 and 2017/08/03, Rat B—16 sessions recorded between 2017/06/01 and 2017/08/21, Rat C—seven sessions between 2017/06/01 and 2017/06/29, Rat D—six sessions between 2017/06/08 and 2017/07/11, Rat E—three sessions between 2017/06/08 and 2017/06/22, and Rat F—three sessions between 2017/06/07 and 2017/06/30. The criteria for the behavioral classes were as follows: Step—the rat moved at least one paw but did not drink or rear at the same time; turn—the rat moved its head; drink—the rat drank from the spout or collected water drops from the mesh with its mouth; groom—the rat performed typical grooming movements; rear—the rat stood on its hind paws; rest—the rat showed no obvious movements. In rare cases, samples were excluded from labeling when the behavior of the rat was not visible because it was located near the borders of the arena. Examples of the different behaviors can be found at https://gin.g-node.org/optophysiology/Conserved_structures_cortex.

### Single-unit behavioral coding

To establish the single-unit coding of a specific behavior or stimulus, it is common practice to compare the average firing rate of the unit prior to the event (baseline) and after it (response). In the case of self-initiated behaviors, however, it is difficult to unambiguously identify temporal windows that can be associated with a baseline or response. Thus, we tested whether a unit increased its firing rate during each of the six behavioral categories and compared this rate to the unit’s firing during the remainder of the recorded time. The test was performed using a Wilcoxon rank-sum test with Benjamini–Hochberg correction for multiple comparisons and α = 0.05.

In a second analysis, we aimed to compare the diversity in single-unit firing rates during two behaviors with the distance in the LEM space of the population vectors associated with such behaviors. To obtain the number of single-units that changed their firing rates during different behaviors, we divided the spike counts (500 ms binning) of each unit according to the six behavioral classes and performed a Kruskal–Wallis test. When the main effect was significant, we performed a post-hoc analysis to selectively compare the unit firing rates during each pair of behaviors. Significance was fixed at 0.05. Since the final aim of this analysis was to compare the average number of units that changed rates with the distance in the LEM space of the population vectors associated with different behaviors, we did not want the unequal sample size of the behavioral classes to affect the significance of the post-hoc tests. Therefore, before performing the Kruskal–Wallis test, we randomly selected an equal number of samples (equal to the sample size of the smallest class) from all behavioral classes for each unit. We then repeated the test 100 times and computed the average number (first across the 100 samplings and then across the session’s units) of significant post-hoc tests obtained for each class comparison and each session. Supplementary Fig. 9a displays their average across sessions.

### Similarities among behavioral representations across sessions

To investigate whether the relative positions of the neural representations associated with different behavioral classes were conserved across sessions and animals, we computed the Euclidean distance between the average behavioral population vectors of a session, then tested whether these distances were more similar to those observed in other sessions than to what would be expected by randomly shuffling the behavioral state labels. This was performed by first comparing the ranked distances between the polytope vertexes and then comparing the actual distance values. For each session v, we computed the Euclidean distance $${D}_{ij}^{v}$$ between the average population vector pi and pj of all pairs of behavioral classes i and j. Then, for each behavioral class i, we ranked the remaining classes j according to their distance $${D}_{ij}^{v}$$ from i. For each pair of sessions v and w and each class i, we accounted for the similarities in ranked distances by defining the statistic $${s}_{i}^{vw}$$ as the number of classes matching the same rank in the two sessions. For the six behavioral classes, $${s}_{i}^{vw}$$ ranged between a maximum value of 5 (perfect match) to a minimum value of 0 (no match). The distribution of $${s}_{i}^{vw}$$ across all sessions was compared with a bootstrap distribution in which the same statistic, sboot, was computed over two random permutations of the numbers from 1 to 5. With six possible classes, there are 5! = 120 possible permutations of the remaining five classes, giving $$\left(\begin{array}{c}5!\\ 2\end{array}\right)+5!=7260$$ unordered pairs of random permutations. We thus used the Kolmogorov–Smirnov test to compare the distribution between the observed $${s}_{i}^{vw}$$ (n = 990 session pairs) and bootstrapped sboot (n = 7200) similarities.

The analysis described in the previous paragraph tested whether the distances between the pairs of behaviors (polytope vertexes) had a similar order (e.g., from the closest to the furthest) for different sessions or animals. To compare the actual distance values, we computed the matrix of pairwise Euclidean distances Dv between the average class population vectors pi in the LEM space. Then, for each other session w, we performed a Procrustes transformation to rescale the behavioral population vectors of w with those of v and computed the distance matrix Dw on the rescaled vectors. The Procrustes transformation did not affect the relative distance between vertexes but prevented differences in scale between the polytopes of different sessions from obscuring the quantity of interest. To quantify whether the set of relative distances between behavioral classes was, to some extent, maintained across sessions, we computed the difference between Dv and Dw as the Jeffries–Matusita distance $${d}_{JM}({{{{{{{{\bf{D}}}}}}}}}^{v},{{{{{{{{\bf{D}}}}}}}}}^{w})=\sqrt{{\sum }_{i,j}{(\sqrt{{D}_{i,j}^{v}}-\sqrt{{D}_{i,j}^{w}})}^{2}}$$, where i and j were indexes running over the six classes, and compared this difference with what we would obtain by chance. We employed the Jeffries–Matusita metric because it reduces the effect of outliers, but similar results were found with a Euclidean metric as well. The distribution dJM(Dv, Dw) obtained with the original distance matrices was tested against the bootstrap distribution $${{d}_{JM}({{{{{{{{\bf{D}}}}}}}}}^{v},{{{{{{{{\bf{D}}}}}}}}}_{b}^{w})}_{1...n{{{{{\mathrm{bootstraps}}}}}}}$$ obtained by randomly permuting the behavioral labels associated with the population vectors of the session w. For each session pair (v, w), we then compared dJM(Dv, Dw) with those obtained on the bootstrapped $${{{{{{{{\bf{D}}}}}}}}}_{boot}^{w}$$ and computed a p-value for the H0 of dJM(Dv, Dw) that was obtained by chance. The bootstrap sample included all possible class label permutations (n = 720). Figure 3b and Supplementary Fig. 11d show the significance of the comparison of each session pair when computed on all six behavioral classes, and Supplementary Fig. 11e shows the same but with the “rest” class excluded. Finally, to ensure that the observed similarity between the polytopes of different animals was not the byproduct of somatotopy but reflected the generalization of the behavior-specific covariance structure at a finer scale, we repeated the analysis described in the previous paragraph, shuffling the identities of the units recorded within the same somatotopic region. In particular, for each session, we first divided the recorded units according to their somatotopic region (adapted from refs. 56,57) as indicated in Supplementary Fig. 11f. Then, to compute Dv, we composed the average population vectors pi associated with each behavior, this time by randomly shuffling the identities of the units recorded in the same somatotopic region separately for each behavior. For example, after shuffling, if the third entry of the vector pstep corresponded to a unit x recorded in the motor region for the control of forelimbs, the third entry of prear was a unit recorded in the same region but possibly with different identity y. In this way, we removed the relation between the fine coding information of each behavior within somatotopic regions, while maintaining the region-specific average firing rate and the relation of the average region information between behaviors. We repeated the analysis for 500 random permutations of unit dispositions on the pi vectors. As in Supplementary Fig. 11d, for each permutation, we obtained a p-value following the same bootstrap procedure described above. Supplementary Fig. 11g shows pair comparisons with average p-values below (white) or above (black) 0.05.

### Population-level decoding

We trained one deep neural network per session to classify the six behavioral classes given the 10-dimensional neural data in seven bins with 100 ms each as input. The data was min–max normalized (min and max were only calculated on training sets). The deep network architecture and training were almost identical to the network used for the decoding task above. However, we used only 200 units per layer and a dropout rate of 25%, and we chose a cross-validation strategy to deal with unbalanced classes. In the latter step, the available data was split into four parts of equal size. Four runs were conducted per session, using two parts as the training set, one as the validation set for early stopping, and the fourth as a test set. The final test results were calculated as the mean over all four test sets and runs. As for the decoding of the swing–stance status, we used weighted cross-entropy loss (more weight on less frequent classes) to deal with the class imbalance. All accuracies that we report were mean-per-class accuracies (balanced accuracies) to ensure that more frequent classes did not bias the results. While we used 10 dimensions for this behavioral decoding task—in line with the estimated dimensionality—only five dimensions remained for the area-specific dimensionality-reduced data since the lower number of neurons did not allow for a reduction in a higher-dimensional LEM space. For the supervised alignment procedure, we always restricted the analysis to four neural dimensions to avoid underdetermination (that is, the remaining dimensions provided by LEM were not used—no completely new dimensionality reduction was computed). We used Matlab’s Procrustes function to find a transformation between class means. Proper transformation was important because of the sign ambiguity of eigenvectors, which might otherwise have led to different orientations of the neural structures. Before alignment, both neural structures were normalized to the 0–1 range. An SVM with a Gaussian kernel (Matlab fitcecoc) was used as classifier. Training was conducted with an equalized number of samples per class (i.e., the class with the fewest samples determined the number of samples taken from each class) and default parameters (kernel size 1). For the SVM classification, we did not use four-fold cross-validation as we did for the classification of neural networks (see above). Instead, we performed 20 repetitions with different samplings of the training set (Monte Carlo cross-validation).

### Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.