## Main

The idea of a CAN has become one of the most influential concepts in theoretical systems neuroscience13,14,15. A CAN is a network in which recurrent synaptic connectivity constrains the joint activity of cells to a continuous low-dimensional repertoire of possible coactivation patterns in the presence of a wide range of external inputs. Few systems are more suitable for analysis of CAN dynamics than the spatial mapping circuits of the rodent brain, owing to the continuous, low-dimensional nature of space, and the availability and interpretability of data from these circuits1,2,3,4,5,6. In medial entorhinal cortex (MEC) and surrounding areas, head direction cells16 encode orientation whereas grid cells2 encode position. CAN models conceptualize the neural representations of these variables as spanning periodic one- or two-dimensional (1D or 2D) continua on a ring17,18,19 or a torus1,8,9,10,11, respectively. In this scheme, activity within the neural network stabilizes as a localized bump when cells are ordered according to their preferred firing directions or locations in physical space. The activity bump may be smoothly translated along the network continuum by speed and direction inputs, or by external sensory cues.

In agreement with CAN models1,8,9,10,11, head direction cells16,20,21 and modules of grid cells4,5,6,7 maintain fixed correlation structures. In head direction cells, cell samples of a few dozen have been sufficient to demonstrate that the network activity traverses a ring22,23,24, but for grid cells, the number of possible locations in the two-dimensional state space has been too large for the topology of the manifold to be uncovered. Here we take advantage of recently developed high-site-count Neuropixels silicon probes25,26 to determine in many hundreds of simultaneously recorded grid cells whether, as predicted by two-dimensional CAN models8,9,10,11, the population activity in an individual grid-cell module resides on a toroidal manifold, independently of behavioural tasks and states and decoupled from the position of the animal in physical space. We focused on individual modules because (i) these are the unit networks of CAN models1,8,9,10; and (ii) topological analysis of multi-module representations would require even larger numbers of cells27.

## Visualization of toroidal manifold

We recorded extracellular spikes of a total of 7,671 single units in layers II and III of the MEC–parasubiculum region in freely moving rats with unilateral or bilateral implants (total of 4 recordings, in 2 rats with bilateral single-shank probes and 1 rat with a unilateral 4-shank probe; from 546 to 2,571 cells per recording; Extended Data Fig. 1). During recordings, the rats were engaged in foraging behaviour in a square open-field (OF) enclosure or on an elevated track, or they slept in a small resting box. Using a clustering-based approach, we identified six grid modules across all rats (4 recording sessions, from 140 to 544 grid cells per session; 7.8% to 25.6% of total number of cells; Extended Data Fig. 2a–d, g, h). Each grid module cluster contained a mixture of nondirectional (‘pure’) grid cells and conjunctive grid × direction cells28, from 66 to 189 grid cells per module (total pure and conjunctive grid cells; Extended Data Fig. 2g). We initially limited our analyses to the subset of pure grid cells because (i) the expected toroidal topology might be distorted by additional directional modulation; and (ii) detection of topology in conjunctive cells may require a larger number of cells than recorded here27.

To visually inspect the structure of the population activity of grid cells for signatures of toroidal topology, we constructed a three dimensional (3D) embedding of the n-dimensional population activity of a module of n = 149 pure grid cells (Fig. 1a). For this, we applied a two-stage dimensionality reduction procedure on the matrix of firing rates. First, to improve robustness to noise, we conducted a principal component analysis (PCA). We retained the first six principal components, which explained a particularly large fraction of the variance for all grid modules in the OF condition (with a similar tendency seen during sleep; Extended Data Fig. 4a). Next, we applied uniform manifold approximation and projection (UMAP) to reduce the six principal components into a 3D visualization. This visualization revealed a torus-like structure (Fig. 1b, Supplementary Video 1). Movement of the rat in the OF was accompanied by similarly continuous movement of the population activity across the toroidal manifold (Fig. 1b). When the activity of individual cells was plotted with reference to the 3D population representation, spikes for each cell were localized within a single patch of the population state space (Fig. 1c). The offsets between the firing locations of individual cells in the arena corresponded with the relative firing locations of the cells in the toroidal state space.

## Quantification of toroidal topology

Although the UMAP projection allowed a toroidal point cloud to be visualized, the method does not lend itself to straightforward quantification of the topology of the state space or comparison of representations across experiments. We therefore turned to the framework of persistent cohomology, a toolset from topological data analysis in which the structure of neural data can be classified by identifying holes of varying dimensionality in topological spaces assigned to point clouds of the cells’ firing rates22,23. In applying this toolset, we replace each point of the point cloud by a ball of common radius. The union of balls results in a topological space in which the number of holes of different dimensions can be counted. By increasing this radius from zero until all the balls intersect, we observe the lifetime of each hole—the range of radii from when the hole first appears until it disappears (see Extended Data Fig. 3C). The lifetimes of the holes are depicted as bars and the totality of bars referred to as the barcode. For a torus, the barcode must display four bars of substantial length: a 0D hole (a single component connecting all points); two 1D holes (describing circular features); and a 2D hole (a cavity; Extended Data Fig. 3B).

Persistent cohomology analyses allowed us to classify the shape of the six-dimensional representation that serves as an intermediate step in UMAP (Extended Data Fig. 3A). We constructed barcodes for each of the six individual modules of grid cells recorded in the OF arena (three modules from rat ‘R’, 2 from rat ‘Q’ and 1 from rat ‘S’, henceforth named R1, R2, R3, Q1, Q2 and S1). The barcodes showed clear indications of toroidal characteristics. For all six modules, we detected four long-lived bars representing a single 0D hole, two 1D holes and a 2D hole. Their lifetimes were significantly longer than the lifetime of any bar obtained in 1,000 shuffles of the data in which spike times were randomly rotated (Fig. 1e, f, Extended Data Fig. 6Aa; P < 0.001). The findings suggest that network dynamics during OF foraging resides on a low-dimensional manifold with the same barcode as a torus. We noted the appearance of additional short bars in the barcodes for all modules, but these are expected for toroidal point clouds27, as we confirmed with simulated data from several CAN models10,11 and point clouds sampled from idealized tori, which in each case exhibited similar features (see Extended Data Fig. 7).

## Tori persist despite grid distortions

The appearance of a torus in the point cloud, and the mapping of the activity of individual grid cells onto the torus (Fig. 1c), are consistent with a relationship between position in 2D physical space and position in the dimensionality-reduced neural state space. However, in many environments, this relationship may not be isometric, as the grid pattern is distorted by geometrical features of the environment, such as walls and corners29,30,31 or discrete landmarks and reward locations32,33. We thus asked whether such geometric features could similarly distort the toroidal organization of network activity in the point cloud. We tested rats on an elevated running track shaped like a wagon wheel with four radial spokes (‘wagon-wheel track’ (WW); Fig. 1d, f). Spatial autocorrelation analyses confirmed that the strict periodicity of the grid pattern was compromised in this task (Extended Data Fig. 2e, f). Despite these distortions of the grid pattern in individual cells, toroidal tuning was maintained in the transformed population activity (Fig. 1d). The persistent cohomology analysis continued to identify one 0D hole, two 1D holes and one 2D hole with lifetimes that substantially exceeded those of shuffled data (Fig. 1f, Extended Data Fig. 6Ab). We also determined how the neural population activity mapped onto the torus by calculating angular coordinates from each of the two 1D holes identified by the barcode (‘cohomological decoding’; Extended Data Fig. 5). The two angular coordinates defined directions intersecting at 60°, identifiable as a twisted torus (Fig. 2a). Consistent with CAN models, the vast majority of grid cells were tuned to a single location on the torus in each module and across environments, independent of geometry and local landmarks (Fig. 2b, Extended Data Fig. 4f, Supplementary Information).

To test how faithfully location in the environment is mapped onto the toroidal representation, we next asked whether grid-cell activity is predicted better by the cells’ tuning to the inferred torus than by their tuning to physical space. For five out of six grid modules in OF and four out of six in WW, the information content conveyed about position, in bits per spike, was higher for position on the torus than for position in physical space (Fig. 2c; R2, R3, Q1, Q2: all P < 0.001, W > 1,932 in OF and WW; R1: P < 0.001, W = 4,010 in OF, P = 0.586, W = 2,129 in WW; S1: P = 1.000 in OF and WW, W = 620 in OF, W = 129 in WW; Wilcoxon signed-rank test). We verified this difference by comparing the cross-validated prediction of two Poisson generalized linear model (GLM)-based encoding models of each cell’s activity that included toroidal position (decoded as above) and 2D spatial position. For both environments (OF and WW), the toroidal covariate was closer to a perfectly fitted model of the data than was the physical position covariate in five out of six grid-cell modules (Fig. 2d; R1, R2, R3, Q1, Q2: P < 0.001, W > 2,045 in OF and WW; S1: P < 0.001, W = 1,941 in OF, P = 1.000, W = 727 in WW; Wilcoxon signed-rank test). Together, these differences point to toroidal structure as the primary feature of the population activity of grid cells, superior to that of the 2D coordinates of the animal’s current position in the physical environment.

If grid cells operate on a toroidal manifold determined by intrinsic network features, this manifold may be expressed universally across environments, independently of sensory inputs. We tested this proposition by assessing, on the inferred tori, whether the locations of firing fields of different grid cells were maintained between OF and WW (Fig. 2b, Supplementary Information). To compare the toroidal parametrizations, we aligned the axes of the toroidal coordinates (Extended Data Fig. 5b). First, we compared, for each cell, the distance between the centres of mass of the toroidal rate maps in OF and WW (Fig. 2e, f, Extended Data Fig. 6Ba). This distance was substantially shorter (mean ± s.e.m. of mean distances for all modules: 31.5 ± 6.3 degrees) than that of control data in which the order of the rate maps in one environment was shuffled (135.8 ± 1.7 degrees; maximum possible distance √2∙180 ≈ 254.6 degrees; data versus shuffled: P < 0.001 in all modules). Second, we calculated the pairwise Pearson correlations of binned toroidal rate maps across the two environments (Fig. 2g, Extended Data Fig. 6Ba). Consistent with the centre-of-mass comparison, the correlations between OF and WW were higher in observed data (mean ± s.e.m. of mean r values for all modules: 0.79 ± 0.07) than in shuffled data (r = 0.01 ± 0.01; P < 0.001 for all modules). Very similar results were obtained when applying the toroidal parametrization from the same environment (either OF or WW) to activity from both environments (Fig. 2f, g, 16.0 ± 3.4 degrees; r = 0.95 ± 0.02; P < 0.001 for all modules and both mappings). Together, these findings suggest that physical space is mapped onto the same internal low-dimensional manifold irrespective of the specific environment.

## Toroidal topology persists during sleep

If population activity is mapped onto the same toroidal manifold independently of sensory inputs, the toroidal topology should also be maintained during sleep. To test this idea, the rats rested in a high-walled, opaque box placed in the centre of the OF or WW track. Periods of rapid-eye-movement (REM) sleep and slow-wave sleep (SWS) were identified on the basis of the low-frequency rhythmic content of the aggregated multi-unit activity in combination with prolonged behavioural immobility (Extended Data Fig. 9).

Persistent cohomology analysis of the sleep population activity suggested toroidal topology in five of the six grid modules during REM and four out of six modules during SWS (modules R2, R3, Q1 and Q2 for both sleep stages and module R1 only in REM; Fig. 3a, Extended Data Fig. 6Ac, d). In the remaining module (S1), there were no long-lived bars in dimensions 1 or 2 (Extended Data Fig. 6Ac, d), indicating an absence of toroidal structure during sleep, perhaps because of an insufficient number of cells in this module (72 cells; Extended Data Fig. 4e). The barcode results were supported by the toroidal mapping, which revealed sharply tuned firing fields on the REM and SWS tori (99.3 ± 1.6% and 99.1 ± 1.8%, respectively, of the grid cells in each module had higher information content than shuffled data, and in 95.3 ± 7.2% and 98.6 ± 2.4% of cells the toroidal tuning explained the activity better than a null model that assumes a constant firing rate; Fig. 3b, Extended Data Figs. 6C, 10, Supplementary Information). In addition, the spatial arrangements of toroidal firing locations of different cells were maintained between wake, REM and SWS states (Fig. 3c, Extended Data Fig. 6Bb, c). For between-condition pairs of rate maps, the mean distance (±s.e.m.) between the peak firing locations (OF versus REM 31.5 ± 15.4 degrees, OF versus SWS 29.8 ± 14.3 degrees) was well below the distribution of shuffled distances (Fig. 3d, Extended Data Fig. 6Bb, c; 135.8 ± 2.3 degrees in both REM and SWS, P < 0.001 for all 5 and 4 modules, respectively). Similarly, the mean correlations of pairs of toroidal rate maps (REM versus OF r = 0.80 ± 0.15, SWS versus OF r = 0.83 ± 0.12) were substantially larger than in shuffled versions of the data (Fig. 3e, Extended Data Fig. 6Bb, c; r = 0.01 ± 0.01 in both REM and SWS, P < 0.001 for all 5 and 4 modules, respectively). Thus, the toroidal structure is maintained in both sleep conditions, despite the lack of external spatial inputs.

## Classes of grid cells

We next investigated why toroidal structure was not visible during REM in module S1 and during SWS in modules R1 and S1 (Fig. 4a, Extended Data Fig. 6Ad). Previous studies of medial entorhinal spiking activity have described cell populations with distinct burst-firing and theta-modulation characteristics34,35,36; therefore, we asked whether a lack of toroidal structure was due to heterogeneity in the composition of the module. We quantified each cell’s temporal modulation characteristics using the spike train temporal autocorrelogram from the OF session, and by applying clustering to the matrix of autocorrelograms we obtained three cell classes (Fig. 4b). Each class was distributed across multiple modules (Fig. 4d). Within each module, cells from the three classes showed overlapping grid spacing and orientation properties (Extended Data Fig. 8a). We named the classes ‘bursty’ (B), ‘non-bursty’ (N) and ‘theta-modulated’ (T), following the most prominent autocorrelogram feature of each class (Fig. 4e). We also examined the spike waveforms of the cells, and found that each class showed a characteristic spike width (Fig. 4f, g), suggesting that they differ in morphology or biophysical properties.

The firing rates of the cells during SWS exhibited marked correlation structure within—but not between—classes (Extended Data Fig. 8b). Even though our classification strategy was not influenced by the cells’ directional tuning, class T contained 80% of all conjunctive grid cells and only 11% of all pure grid cells, supporting the idea that conjunctive grid cells are a distinct population. Accordingly, in modules R1 and S1, which contained the largest numbers of T cells, pairwise correlations of T cells’ spike trains were more strongly related to head-direction tuning than to toroidal tuning (Extended Data Fig. 8c). When we subdivided module R1 into the three classes (Fig. 4b), we found that during SWS toroidal topology was detectable only in B cells (Fig. 4c). By decoding toroidal position from B cells, we were able to recover the selectivity of each cell with respect to toroidal position in module R1 (Fig. 4h). The toroidal tuning locations were preserved between OF and SWS in each cell class in R1 (Extended Data Fig. 8d, B: distance of 26.4 ± 6.1 degrees and correlation of r = 0.85 ± 0.02, T: 43.6 ± 3.9 degrees and r = 0.74 ± 0.02, N: 29.9 ± 3.5 degrees and r = 0.80 ± 0.02; mean values of shuffled versions of each class were between 135.4 ± 5.2 and 136.4 ± 6.2 degrees, and between r = 0.00 ± 0.07 and r = 0.02 ± 0.03; comparison between observed and shuffled P < 0.001 for all 3 classes and both measures). However, in R1 as well as all other modules, toroidal spatial information and explained deviance were highest for B cells and lower for N and T cells in OF, REM and SWS (Extended Data Fig. 8e) (information content: P < 10−56, H > 255; Kruskal–Wallis test; P < 10−9, Z > 6.4; Dunn test with Bonferroni correction; explained deviance: P < 10−20, H > 96; Kruskal–Wallis test; P < 10−12, Z > 7.4; Dunn test with Bonferroni correction, for OF, REM and SWS). Collectively, these results show that the B cell population (containing the majority of our grid cells) represents the torus most robustly across behavioural conditions. The weaker toroidal representation in T cells may partly be an effect of the higher dimensionality of the code carried by conjunctive grid × direction cells. Indeed, running cohomology analysis on T cells from modules S1 and R1 (which contained the most T cells) revealed a circular feature that corresponded to the animal’s head direction (Extended Data Fig. 8f, g).

## Discussion

Our findings, from many hundreds of simultaneously recorded grid cells, show that population activity in grid cells invariably spans a manifold with toroidal topology, with movement on the torus matching the animal’s trajectory in the environment. The toroidal representation was most stably encoded by the bursting subclass of grid cells. Toroidal topology was not simply inherited from the encoded variable, as 2D space is not characterized by toroidal topology, as opposed to pitch and azimuth of head orientation, which in bats together span a torus and thus naturally map onto a toroidal neural code37. Using cohomological decoding, we were able to demonstrate, in each environment and in both sleep and awake states, that the toroidal coordinates of individual grid cells in individual grid modules were maintained, independently of external sensory inputs or environment-induced deformations of hexagonal symmetry in the rate maps29,30,31,32,33. The uniform and consistent toroidal structure of the manifold suggests that distortions in grid patterns occur in the mapping between physical space and the toroidal grid code rather than in the grid code itself.

The invariance of the toroidal manifold across environments and brain states is informative about the mechanisms that underlie grid-cell activity. Although toroidal topology can be generated by both CAN1,8,9,10 and feedforward12 mechanisms, the persistence of an invariant toroidal manifold under conditions that give rise to changes in the correlation structure of place-cell activity in the hippocampus6,7 is predicted only by CAN models. While the findings do not exclude co-existing feedforward mechanisms12,38, they point to intrinsic network connectivity as the mechanism that underlies the rigid toroidal dynamics of the grid-cell system. What kind of network architecture keeps the activity on a toroidal manifold—whether it is geometrically organized1,8,9,10 or acquired from random connectivity by synaptic weight adjustments through learning39,40,41—remains to be determined, as does the mode of connectivity with other CANs in the entorhinal–hippocampal system22,23.

## Methods

### Rats

The data were collected from three experimentally naive male Long Evans rats (Rats Q, R and S, 300–500 g at time of implantation). The rats were group-housed with three to eight of their male littermates before surgery and were singly housed in large Plexiglas cages (45 × 44 × 30 cm) thereafter. They were kept on a 12-h light–12-h dark schedule, with strict control of humidity and temperature. All procedures were performed in accordance with the Norwegian Animal Welfare Act and the European Convention for the Protection of Vertebrate Animals used for Experimental and Other Scientific Purposes. Protocols were approved by the Norwegian Food Safety Authority (FOTS ID 18011 and 18013).

### Electrode implantation and surgery

The rats were implanted with Neuropixels silicon probes25,26 targeting the MEC–parasubiculum (PaS) region. Two rats were implanted bilaterally with prototype Neuropixels ‘phase 3A’ single-shank probes and with one probe targeting MEC–PaS in each hemisphere; the third rat was implanted with a prototype Neuropixels 2.0 multi-shank probe in the left hemisphere. Probes were inserted at an angle of 25° from posterior to anterior in the sagittal plane. Implantation coordinates were AP 0.05–0.3 mm anterior to the sinus and 4.2–4.7 mm lateral to the midline. The probes were inserted to a depth of 4,200–6,000 µm. The implant was secured with dental cement. The detailed implantation procedure has been described elsewhere6,26. After surgery, the rats were left to recover for approximately 3 h before beginning recording. Postoperative analgesia (meloxicam and buprenorphine) was administered during the surgical recovery period.

### Recording procedures

The details of the Neuropixels hardware system and the procedures for freely moving recordings have been described previously. In brief, electrophysiological signals were amplified with a gain of 500 (for phase 3A probes) or 80 (for 2.0 probes), low-pass-filtered at 300 Hz (phase 3A) or 0.5 Hz (2.0), high-pass-filtered at 10 kHz, and then digitized at 30 kHz (all steps performed by the probe’s on-board circuitry). The digitized signals were multiplexed by an implant-mounted ‘headstage’ circuit board and were transmitted along a lightweight 5-m tether cable, made using either micro-coaxial (phase 3A) or twisted pair (2.0) wiring.

Three-dimensional motion capture (OptiTrack Flex 13 cameras and Motive recording software) was used to track the rat’s head position and orientation, by attaching a set of five retroreflective markers to implant during recordings. The 3D marker positions were projected onto the horizontal plane to yield the rat’s 2D position and head direction. An Arduino microcontroller was used to generate digital pulses, which were sent to the Neuropixels acquisition system (via direct TTL input) and the OptiTrack system (via infra-red LEDs), to permit precise temporal alignment of the recorded data streams.

### Behavioural procedures

Data were obtained from four recording sessions performed within the first 72 h after recovery from surgery. The recordings were performed while the rats engaged in three behavioural paradigms, each in a different arena within the same room. Abundant distal visual and sonic cues were available to the rat. On each day of recording, the rat remained continuously connected to the recording apparatus across the various behavioural sessions that were performed. Occasionally it was necessary to remove twists that had accumulated in the Neuropixels tether cable. In such cases, the ongoing behavioural task was paused while the experimenter gently turned the rat to remove the twists. During pre-surgical training, the rats were food-restricted, maintaining their weight at a minimum of 90% of their free-feeding body weight. Food was generally removed 12–18 h before each training session. Food restriction was not used at the time of recording.

Rats foraged for randomly scattered food crumbs (corn puffs) in a square open-field (OF) arena of size 1.5 × 1.5 m, with black flooring and enclosed by walls of height 50 cm. A large white cue card was affixed to one of the arena walls (height same as the wall; width 41 cm; horizontal placement at the middle of the wall). At the time of the surgery, each rat was highly familiar with the environment and task (10–20 training sessions lasting at least 20 min each).

The wagon-wheel (WW) track task was designed to function as a 1D version of the 2D OF foraging task. The track’s geometry comprised an elevated circular track with two perpendicular cross-linking arms spanning the circle’s diameter. The track was 10 cm wide and was bounded on both sides by a 1-cm-high lip. Each section of the track was fitted with a reward point, placed halfway between the two nearest junctions, in the centre of the section. Each reward point consisted of an elevated well that could be remotely filled with chocolate milk via attached tubing. To encourage foraging behaviour, a pseudorandom subset of the wells (between one and four of the eight wells) was filled at a given time, and the rat was allowed to explore the full maze freely and continuously. Wells were refilled as necessary when the rat consumed rewards. Each rat was trained to high performance on the foraging task before the surgery (collecting at least 30 rewards within a 30-minute session). Training to this level of performance took 5–10 half-hour sessions.

### Natural sleep

For sleep sessions, the rat was placed in a black acrylic ‘sleep box’ with a 40 × 40-cm square base and 80-cm-high walls. The black coating of walls was transparent to infrared, which allowed the 3D motion capture to track the rat through the walls. The bottom of the sleep box was lined with towels, and the rat had free access to water. During recording sessions in the sleep box, the main room light was switched on and pink noise was played through the computer speakers to attenuate disturbing background sounds. Sleep sessions typically lasted 2–3 h, but were aborted prematurely if the rat seemed highly alert and unlikely to sleep.

### Spike sorting and single-unit selection

Spike sorting was performed with KiloSort 2.526. In brief, the algorithm consists of three principal stages: (1) a raw-data alignment procedure that detects and corrects for shifts in the vertical position of the Neuropixels probe shank relative to the surrounding tissue; (2) an iterative template-matching procedure that uses low-rank, variable-amplitude waveform templates to extract and classify single-unit spikes; and (3) a curation procedure which detects appropriate template merging and splitting operations based on spike train auto- and cross-correlograms. Some customizations were made to the standard KiloSort 2.5 method to improve its performance on recordings from the MEC–PaS region, where there is a particularly high spatiotemporal overlap of spike waveforms owing to the high density of cells. Therefore, the maximum number of spikes extracted per batch in step 1 above was increased, as was the number of template-matching iterations in step 2. To improve the separation between cells with very similar-looking waveforms, the upper limit on template similarity was raised from 0.9 to 0.975 in step 2 and to 1.0 on step 3, while supervising manually all merge and split operations from step 3, using a custom-made GUI running in MATLAB. The manual supervision ensured that Kilosort 2.5 did not automatically merge pairs of units with a dip in the cross-correlogram, which in our data was often due to out-of-phase spatial tuning. The merge and split operations were repeated several times to ensure the best separation between single units.

Single units were discarded if more than 1% of their interspike interval distribution consisted of intervals less than 2 ms. In additions, units were excluded if they had a mean spike rate of less than 0.05 Hz or greater than 10 Hz (calculated across the full recording duration).

### Single-unit spike waveforms

During spike sorting, Kilosort assigned each unit with a 2 ms spike waveform template on each recording channel. To calculate a representative single waveform for each unit, the peak-to-peak amplitude of the template was calculated on every channel, and the templates from the three highest-amplitude channels were averaged to generate the representative spike waveform. To calculate spike width, a unit’s representative waveform was finely interpolated (from 61 to 1,000 points) using a cubic spline function. Spike width was defined as the time difference between the waveform’s negative peak (to which the waveform was aligned by Kilosort), and the following positive peak.

### Spatial position and direction tuning

During awake foraging sessions in the OF arena or wagon-wheel track, only time epochs in which the rat was moving at a speed above 2.5 cm s−1 were used for spatial or toroidal analyses. To generate 2D rate maps for the OF arena, position estimates were binned into a square grid of 3 × 3-cm bins. The spike rate in each position bin was calculated as the number of spikes recorded in the bin, divided by the time the rat spent in the bin. To interpolate the values of unvisited bins, two auxiliary matrices were used, M1 and M2, setting visited bins equal to the value of the original rate map in M1 and to 1 in M2, and setting unvisited bins to zero in both. One iteration of the image-processing ‘closing’ operation was then performed (binary dilation followed by erosion, filling out a subset of the non-visited bins) on M2, using a disk-shaped structuring element, first padding the matrix border by one bin. Both matrices were then spatially smoothed with a Gaussian kernel of smoothing width 2.75 bins. Finally, the rate map was obtained by dividing M1 by M2. Rate-map spatial autocorrelograms and grid scores were calculated as described previously28. The selectivity of each cell’s position tuning was quantified by computing its spatial information content42, measured in bits per spike (see ‘Information content’).

Head-direction tuning curves were calculated by binning the head-direction estimates into 6° bins. The spike rate in each angular bin was calculated as the number of spikes recorded in the bin divided by the time that the rat spent in the bin. The resultant tuning curve was smoothed with a Gaussian kernel with σ = 2 bins, with the ends of the tuning curve wrapped together. The selectivity of head-direction tuning was quantified using the mean vector length (MVL) of the tuning curve. This was calculated according to:

$${\rm{M}}{\rm{V}}{\rm{L}}=\,\frac{|{\sum }_{j=1}^{M}{{\bf{f}}}_{j}\exp (i{{\boldsymbol{\alpha }}}_{j})|}{{\sum }_{j=1}^{M}{{\bf{f}}}_{j}},$$

where vector f represents the tuning curve values (firing rates), vector α represents the corresponding angles, M is the number of tuning curve values, and |∙| represents the absolute value of the enclosed term.

### Grid module classification

A novel method was implemented to detect populations of cells corresponding to grid modules by finding clusters of cells that expressed similar spatially periodic activity in the open field (Extended Data Fig. 2). Contrary to previous clustering-based methods for grid modules3, this approach makes no assumptions about the specific geometry of the grid pattern, thus making it less susceptible to the detrimental effects of geometric distortions such as ellipticity3,30.

For each MEC–PaS cell in a given recording, a coarse-resolution rate map of the OF session was constructed, using a grid of 10 × 10-cm bins, with no smoothing across bins. The 2D autocorrelogram of this rate map was calculated, and the central peak was removed by excluding all bins located less than 30 cm from the autocorrelogram centre. Bins located more than 100 cm from the autocorrelogram centre were also excluded. The autocorrelograms for all cells were subsequently converted into column vectors, z-standardized, then concatenated to form a matrix with spatial bins as rows and cells as columns. The nonlinear dimensionality reduction algorithm UMAP43,44 was then applied to this matrix, yielding a two-dimensional point cloud in which each data point represented the autocorrelogram of one cell (Extended Data Fig. 2a–d; UMAP hyperparameters: 'metric'=‘manhattan’, ‘n_neighbors’=5, ‘min_dist’=0.05, ‘init’=‘spectral’). In the resultant 2D point cloud, cells with small absolute differences between their autocorrelogram values were located near to one another. The point cloud was partitioned into clusters using the DBSCAN clustering algorithm (MATLAB function ‘dbscan’, minimum 30 points per cluster, eta = 0.6–1.0). In every recording, the largest cluster was mainly composed of cells that either lacked strong spatial selectivity or were spatially selective but without clear periodicity. All remaining clusters contained cells with high grid scores, and with similar grid spacing and orientation (Extended Data Fig. 2a–d); cluster membership was therefore used as the basis for grid module classification. In one recording (rat ‘R’ day 1), two clusters were identified that had similar average grid spacing and orientation (labelled as ‘R1a’ and ‘R1b’ in Extended Data Fig. 2a–d), suggesting that they represented the same grid module. R1b appeared to comprise cells with higher variability in the within-field firing rates of the spatial rate maps, accompanied by more irregularities in the autocorrelograms. These two clusters were merged together in subsequent analysis (in which the resultant cluster is called ‘R1’).

A subset of the cells that were assigned to grid module clusters by the above procedure were tuned to both location and head direction (conjunctive grid × direction cells). These cells, which were defined as having a head-direction tuning curve with mean vector length above 0.3, were discarded from further analysis.

### Classification of sleep states

SWS and REM periods were identified on the basis of a combination of behavioural and neural activity, following previously described approaches6,45,46. First, sleep periods were defined as periods of sustained immobility (longer than 120 s with a locomotion speed of less than 1 cm s−1 and head angular speed of less than 6° s−1). Qualifying periods were then subclassified into SWS and REM on the basis of the amplitude of delta- and theta-band rhythmic activity in the recorded MEC–PaS cells. Spike times for each cell were binned at a resolution of 10 ms and the resultant spike counts were binarized, such that ‘0’ indicated the absence of spikes and ‘1’ indicated one or more spikes. The binarized spike counts were then summed across all cells (Extended Data Fig. 9A). The rhythmicity of this aggregated firing rate with respect to delta (1–4 Hz) and theta (5–10 Hz) frequency bands was quantified by applying a zero-phase, fourth-order Butterworth band-pass filter, then calculating the amplitude from the absolute value of the Hilbert transform of the filtered signal, which was smoothed using a Gaussian kernel with σ = 5 s and then standardized (‘z-scored’). The ratio of the amplitudes of theta and delta activity was hence calculated (theta/delta ratio, ‘TDR’). Periods in which TDR remained above 5.0 for at least 20 s were classified as REM; periods in which TDR remained below 2.0 for at least 20 s were classified as SWS (Extended Data Fig. 9B).

Spectral analysis was performed on 10-ms-binned multi-unit activity using the multi-tapered Fourier transform, implemented by the Chronux toolbox (http://chronux.org/, function ‘mtspectrumsegc’). Non-overlapping 5-second windows were used, with a frequency bandwidth of 0.5 Hz and the maximum number of tapers.

### Visualization of toroidal manifold

For each module of grid cells, spike times of co-recorded cells in the OF were binned for each cell at a resolution of 10 ms, and the binned spike counts were convolved with a Gaussian filter with σ = 50 ms. Time bins in which the rat’s speed was below 2.5 cm s−1 were then discarded. To account for variability of average firing rates across cells, the smoothed firing rate of each cell was z-scored. For computational reasons, the time bins were downsampled, taking every 25th time bin (equating to 250-ms intervals between selected samples). Collectively, the downsampled firing rates of the full population of cells formed a matrix with time bins in rows and cells in columns. PCA was applied to this matrix (treating time bins as observations and cells as variables), and the first six principal components were retained (Extended Data Figs. 3Aa–c, 4a–d). UMAP43,47 was then run on these six principal components (with time bins as observations and principal components as variables). The hyperparameters for UMAP were: ‘n_dims’=3, ‘metric’=‘cosine’, ‘n_neighbours’=5000, ‘min_dist’=0.8 and ‘init’=‘spectral’.

For visualizing the toroidal manifold during WW, smoothed firing rates were first calculated by the same procedure described above for OF. Subsequently, to allow comparison of the toroidal manifold between OF and WW, the same PCA and UMAP transformations calculated for the OF data were re-applied to the WW data, by supplying the fitted OF UMAP transformation as the argument ‘template_file’ to the ‘run_umap’ function in the MATLAB implementation47.

### Preprocessing of population activity

Each topological analysis was based on the activity of a single module of grid cells, during a single experimental condition in one recording session. Topological analysis of multi-module and conjunctive grid × direction cell activity was not considered as we expect such data to exhibit higher-dimensional topological structure requiring a higher number of cells27. The experimental conditions were: open-field foraging (OF), wagon-wheel track foraging (WW), slow-wave sleep (SWS), and rapid eye-movement sleep (REM). Sleep epochs of the same type were collected from across the recording and concatenated for analysis purposes. Similarly, in one case (rat 'S'), two WW task sessions were concatenated to increase the sample size.

In total there were 27 combinations of module (Q1, Q2, R1, R2, R3, S1) and experimental condition (OF day 1, OF day 2, WW, REM, SWS).

Preprocessing of spike trains began by computing delta functions centred on the spike times (valued 1 at time of firing; 0 otherwise), and convolving these temporally with a Gaussian kernel with σ = 50 ms (OF, WW and REM) or 25 ms (SWS). Samples of the smoothed firing rates of all cells (‘population activity vectors’) were then computed at 50-ms intervals. The awake states were further refined by excluding vectors which originated from time periods when the rat’s speed was below 2.5 cm s−1.

Computing the persistent cohomology of a point cloud is computationally expensive and may be sensitive to outliers (for example, spurious points breaking the topology of the majority of points in the point cloud). For this reason, it is common to preprocess the data by downsampling and dimension-reducing the point cloud. The same preprocessing procedure was used for all datasets in the present study.

First, the data points were downsampled by keeping the 15,000 most active population activity vectors (as measured by the mean population firing rate). During SWS, this selection criterion had the consequence of automatically discarding population activity vectors during down-states, when neural activity is near-silent. As noise is inherently more prevalent and cosine distances less reliable in high-dimensional spaces (“the curse of dimensionality”)48, dimensionality-reduction and a normalization of distances were subsequently performed. The reduced point cloud was z-scored and projected to its six first principal components, thus reducing noise while keeping much of the variance (see Extended Data Fig. 4a). This was supported by the lack of grid structure and the clear drop in explained deviance after six components (see Extended Data Fig. 4b, c). The explained deviance was computed by fitting a GLM model to each component individually, using the spatial coordinates as covariate, suggesting that the higher components are less spatially modulated and possibly better described by other (unknown) covariates. Consistent with this, the toroidal structure was most clearly detected in the barcodes when comparing the ratio of the lifetimes of the two most persistent H1 bars versus the third longest-lived H1 bar for the barcodes obtained when using different numbers of components in the analysis (see Extended Data Fig. 4d). These analyses both indicated that dimensionality reduction was required to firmly demonstrate the toroidal topology in the grid cells. The empirical findings are supported theoretically; see ‘Theoretical explanation of the six-dimensionality proposed by PCA’ in Supplementary Methods.

To further simplify the low-dimensional point cloud, a different downsampling technique was introduced, based on a point-cloud density strategy motivated by a topological denoising technique introduced previously49 and a fuzzy topological representation used in UMAP43,50. Parts of the open-source implementation of the latter were copied in this computation. This approach consisted of assigning, for each point, a neighbourhood strength to its k nearest neighbours, and subsequently sampling points that represent the most tight-knit neighbourhoods of the point cloud in an iterative manner. First, we defined $${m}_{i,{i}_{j}}^{{\prime} }=\exp \left(-\frac{{d}_{i,{i}_{j}}}{{\sigma }_{i}}\right),$$ where $${d}_{i,{i}_{j}}$$ is the cosine distance between point xi and its j-th nearest neighbour and σi is chosen to make $$\mathop{\sum }\limits_{j=1}^{k}{m}_{i,{i}_{j}}^{{\prime} }={\log }_{2}k,$$using k = 1,500. The neighbourhood strength was then obtained by symmetrizing: $${m}_{i,{i}_{j}}={m}_{i,{i}_{j}}^{{\prime} }+\,{m}_{j,{j}_{i}}^{{\prime} }-\,{m}_{i,{i}_{j}}^{{\prime} }\cdot {m}_{j,{j}_{i}}^{{\prime} }$$. Finally, the point cloud was reduced to 1,200 points by iteratively drawing the i-th point as: $$\mathop{\max }\limits_{{x}_{i}}\sum _{j\in \tilde{I}}{m}_{j,{j}_{i}},$$ where Ĩ denotes the indices of the points not already sampled. In other words, for each iteration, the sampled point is the one with the strongest average membership of the neighbourhoods of the remaining points.

To compute the persistent cohomology of the downsampled point cloud, the neighbourhood strengths were first computed for the reduced point cloud (using k = 800) and its negative logarithm was taken, obtaining a distance matrix. This matrix was then given as input to the Ripser implementation51,52 of persistent cohomology, returning a barcode. In short, the barcode gave an estimate of the topology of the fuzzy topological representation of the six principal components of the grid-cell population activity. Thus, in essence, the first step of UMAP was applied before describing the resulting representation with persistent cohomology, instead of using it to project each point of the point cloud to a representation of user-specified dimensionality for visualization (Extended Data Fig. 3Ad, e). This gives a more direct and stable quantification of the global data structure, without having to choose an initialization53 or optimize a lower-dimensional representation.

### Persistent cohomology

Persistent cohomology, a tool in topological data analysis, was used to characterize the manifold assumed to underlie the data. This has clear ties with persistent homology and the main result (the barcode) is identical, thus the two terms are often used interchangeably. Persistent cohomology was chosen because the computation is (to our knowledge) faster and is required to obtain cocycle representatives, which are necessary to perform decoding (see ‘Cohomological decoding’). Persistent (co-)homology has previously been successful in analysing neural data, describing the ring topology of head direction cell activity22,23,24, the spherical representation of population activity in primary visual cortex54, and the activity of place cells55,56,57,58.

The general outline of the algorithm is as follows. Each point in the cloud is replaced by a ball of infinitesimal radius, and the balls are gradually expanded in unison. Taking the union of balls at a given radius results in a space with holes of different dimensions. The range of radii for which each hole is detected is tracked; this is referred to as the ‘lifetime’ of the hole and is represented by the length of a bar. The totality of bars is referred to as the barcode.

The software package Ripser51,52 was used for all computations of persistent cohomology. Ripser computes the persistent cohomology of ‘Vietoris-Rips complexes’ (which approximate the union of balls for different radii), constructed based on the input distance matrix and a choice of coefficients (in our case, 47-coefficients), and outputs the barcode and cocycle representatives for all bars. The prime 47 was chosen as homology and cohomology coincide in this case and as it is unlikely that this divides the torsion subgroup of the homology of the space. Torsion may indicate, for example, orientability of a manifold and in choosing 47 as our prime, we disregard all but 47-torsion. Testing with other primes (for example, 43) gave similar results (data not shown) and the Betti numbers stayed the same regardless of choice of prime.

To verify that the lifetimes of prominent bars in the barcodes were beyond chance, shuffled distributions were generated for the persistence lifetimes in each dimension. In each shuffling, the spike train of each cell was shifted independently in time by rolling the firing rate arrays a random length between 0 and the length of the session. The same preprocessing and persistence analysis were then performed on the shifted spike trains as for the unshuffled data. This was performed 1,000 times, and each time a barcode was obtained. The barcodes were concatenated for all shuffles and the maximum lifetime was found for each dimension. This lifetime served as a significance criterion for the bar lifetimes. It is noted, however, that this is a heuristic and that statistics of barcodes are still not well established.

### Cohomological decoding

As there are other spaces with similar barcodes as for a torus, the results identified by the barcode were further investigated, using the ‘cohomological decoding’ procedure introduced previously59 to calculate a toroidal parametrization of the point clouds of population activity. This assigns to each point corresponding positions on each of the two circular features identified by the 1D bars with the longest lifetime, resulting in coordinates that further characterize the underlying shape of the data.

Cohomological decoding is motivated by the observation that the 1D cohomology (with integer coefficients) of a topological space X is equivalent to the set of homotopy-equivalent classes of continuous maps from X to the circle (S1)60; that is:

$${H}^{1}(X;{\mathbb{Z}})\cong [X,{S}^{1}].$$

This subsequently means that for each 1D bar existing at a given radius, there exists a corresponding continuous map from the Vietoris-Rips complex of that radius to the circle. Thus, we may first use persistent cohomology to detect which elements represent meaningful (long-lived) features of the data and choose a radius for which these features exist. As the vertices of the Vietoris-Rips complex are points in the point cloud, the circular values of the corresponding maps at the vertices describe circular coordinates of the data.

In the present case, persistent cohomology was first applied to the grid-cell population activity and X was identified as the Vietoris-Rips complex for which the two longest-lived one-dimensional bars in the barcode (representing each of the two circles of the torus) existed. To define the desired toroidal coordinates on a domain that was as large as possible, we chose the complex given at the scale of the birth plus 0.99 times the lifetime of the second longest-lived one-dimensional bar in the barcode22,59,61. Next, the cocycle representatives (given by the persistent cohomology implementation of Ripser51,52) of each of the chosen 1D bars defined 47-values for each of the edges in the complex. These edge values were then lifted to integer coefficients and subsequently smoothed by minimizing the sum over all edges (using the scipy implementation ‘lsmr’). The values on the vertices (points) of each edge followed from the edge values and gave the circular parametrizations of the point cloud. The product of the two parametrizations thus provided a mapping from the neural activity to the two-dimensional torus—that is, giving a toroidal coordinatization (decoding) of the data.

As persistent cohomology was computed for a reduced dataset of 1,200 points and therefore circular parametrizations were obtained only for this point cloud, each parametrization was interpolated to the population activity from the rest of the session(s). First, the 1,200 toroidal coordinates were weighted by the normalized (‘z-scored’) firing rates of the cells at those time points, obtaining a distribution of the coordinates for each grid cell. The decoded toroidal coordinates were then computed by finding the mass centre of the summed distributions, weighted by the population activity vector to be decoded. These activity vectors were calculated by first applying a Gaussian smoothing kernel of 15-ms standard deviation to delta functions centred on spike times, sampling at 10-ms intervals and then z-scoring the activity of each cell independently. Time intervals that contained no spikes from any cell were subsequently excluded. When decoding was used to assess or compare the tuning properties of single cells (for example, comparison of toroidal versus spatial description), the coordinates were computed using the weighted sum of the distributions of the other cells; that is, the contribution of the cell to be assessed or compared was removed. When comparing preservation of toroidal tuning across two sessions, coordinates were interpolated either using the toroidal parametrization in each session independently (‘Separate’) or using the same toroidal parametrization in both sessions (‘Common’).

### Toroidal rate map visualization

For visualization, toroidal firing rate maps were calculated in the same way as the physical space covariate (see ‘Spatial position and direction tuning’), first binning the toroidal surface into a square grid of 7.2° × 7.2° bins and computing the average spike rate in each position bin. However, for toroidal maps, it was necessary to address the 60° angle between the toroidal axes before smoothing. After binning the toroidal coordinates, the rate map was ‘straightened’ by shifting the bins along the x axis (‘horizontally’) the length of (y mod 2)/2 bins, where y is the vertical enumeration of the given bin. Copies of the rate map were then tiled in a three-by-three square (similar to Extended Data Fig. 5d), before applying the closing and smoothing operations as for the spatial firing rate map. The single toroidal rate map was finally recovered by cutting out the centre tile, rotating it 90° and defining 15° shear angles along both the x and the y axis to correct for the 60° offset between them.

### Comparison of spatial periodicity

Differences in grid periodicity between OF and WW environments were quantified for a given cell by comparing the grid scores in the two behavioural conditions. Two alternative methods were used to generate the spatial autocorrelograms for this comparison: (1) comparing the autocorrelograms for OF and WW directly; and (2) comparing autocorrelograms for OF and WW after first equalizing the spatial coverage between the two conditions.

For method (1), rate maps were calculated as specified in the above section ‘Spatial position and direction tuning’, using the same grid of 3 × 3-cm bins for both environments. This set of bins spanned the entirety of the OF arena and covered most of the WW track apart from some small regions at the outer extrema, which were discarded for the purpose of this analysis. For each of the two rate maps, the autocorrelogram was computed and the grid score was calculated.

Method (2) was similar to method (1), except that the cell’s OF rate map was converted into a ‘masked OF’ rate map, by removing all bins that were unvisited by the rat in the WW session. This effectively equalized the position coverage between the two conditions, and thus allowed for a more valid comparison.

### Toroidal versus spatial description

The explanatory significance of the toroidal description was evaluated by comparing statistical measures of how well the toroidal coordinates explained neural activity on the torus and in physical space. For a fair comparison, it was important to avoid overfitting, which might occur if a toroidal parametrization of a point cloud is used to describe that same set of data points. Two precautions were taken to avoid such overfitting: first, the data were decoded using the toroidal parametrization from a different condition (an OF session for a WW recording and a WW session for an OF recording), and second, the cell for which the statistical measurement was made was omitted from the decoding.

The comparison of toroidal and environmental representations also accounted for tracking error in the physical position estimate, which mainly resulted from the approximately 4 cm vertical offset of the tracking device above the rat’s head. This causes a discrepancy when the angle α between the animal’s zenith and the axis of gravitation is different from 0°, measured as 4 tan(α) cm. The mean discrepancy in the recorded position data was measured to 1.5 cm. To account for this error of the position estimate, proportional Gaussian noise was added to the toroidal coordinates, using a standard deviation of 1.5 cm/Ω, where Ω denotes the grid spacing of the particular grid-cell module, estimated from the mean period of the fitted cosine waves of the toroidal coordinates in the open field (see ‘Toroidal alignment’).

### Information content

The information content (I) was calculated as previously described42, to quantify and compare the amount of information carried by single-cell activity about the location on the torus and physical space per spike. Both covariates were binned in a M = 15 × 15 grid of square bins. For each bin  j, the average firing rate fj (given in spikes per second), and the occupancy ratio, pj, were computed. The information content for each grid cell was then given as:

$$I=\frac{1}{\bar{{\bf{f}}}}\mathop{\sum }\limits_{j=1}^{M}{{\bf{f}}}_{j}{\log }_{2}\frac{{{\bf{f}}}_{j}}{\bar{{\bf{f}}}}{{\bf{p}}}_{j},$$

where $$\bar{{\bf{f}}}$$ is the mean firing rate of the cell across the entire session.

Note that although the rate maps for physical space have multiple firing fields, whereas the toroidal rate maps have single firing fields, we expect the spatial information to be comparable, as the measure primarily depends on the ratio of bins with high firing activity. This number should be comparable as the firing field size (in bins) will be inversely related to the number of fields in the rate map, assuming that the discretization of the map captures the relevant firing rate variations. For example, given a similar binning of space, a larger OF environment will include more fields, but the number of bins per field will decrease correspondingly. The binning used should be sufficient to resolve the smallest fields, as the same discretization was used in classifying the grid cells in the recorded population.

### Deviance explained

Deviance explained was computed to measure how well a Poisson GLM model fitted to the spike count was at representing the data, using either the toroidal coordinates or the tracked position as regressors. A similar set-up was used to that of a previous study62, with a smoothness prior for the GLM to avoid overfitting.

Both the toroidal and spatial coordinates were binned into a 15 × 15 grid of bins, and GLM design matrices were built with entries Xi(t) = 1 if the covariate at time t fell in the i-th bin and Xi(t) = 0 otherwise.

The Poisson probability of recording k spikes in time bin t is:

$$P(k|\mu (t),\beta )=\exp (-\mu (t)){\frac{\mu (t)}{k!}}^{k},$$

where $$\mu (t)=\exp (\sum _{i}{\beta }_{i}{X}_{i}(t))$$ is the expected firing rate in time bin t. The parameters β of the Poisson GLM were optimized for each covariate by minimizing the cost function:

$$L(\beta |\mu (t),\gamma ,k)=-\sum _{t}\,{\rm{l}}{\rm{n}}(P(k(t)|\mu (t),\beta ))+\frac{1}{2}\gamma \sum _{i,j\in N}{({\beta }_{i}-{\beta }_{j})}^{2},$$

where N is the set of neighbour pairs. The first term is the negative log-likelihood of the spike count in the given time bin, whereas the second term puts a penalty on large differences in neighbouring parameters, enforcing smoothness in the covariate response of the predicted spike count.

The parameters, β, were initialized to zero and then modified to minimize the loss function by first running two iterations of gradient descent, before optimizing using the ‘l-bfgs-b’-algorithm (as implemented in the ‘scipy.optimize’-module) with ‘gtol’=1e-5 as the cut-off threshold, and finally running two more iterations of gradient descent. A three-fold cross validation procedure was used, repeatedly fitting the model to two-thirds of the data and testing on the held-out last third.

The smoothness hyperparameter γ was optimized a priori on each grid-cell module based on the summed likelihood, testing γ (1, √10, 10, √1,000), and found to be either 1 or √10 in all cases.

Similarly, after fitting a null model (using only the intercept term) and the saturated model (perfectly fitting each spike count), the deviance explained could be computed as:

$$1-\frac{{{\rm{ll}}}_{{\rm{s}}}-{{\rm{ll}}}_{{\rm{p}}}}{{{\rm{ll}}}_{{\rm{s}}}-{{\rm{ll}}}_{0}},$$

where llp, ll0 and lls denote the cross-validated log likelihood of the fitted model, the null model and the saturated model, respectively. This provides a normalized comparison describing the difference between the fitted model and the idealized model.

### Toroidal alignment

To infer a geometric interpretation of the tori, as characterized via the cohomological decoding, and compare the toroidal parametrizations across modules and conditions, two cosine waves of the form cos(ωt + k) were fitted to the OF mappings of the decoded circular coordinates (Extended Data Fig. 5a), where t is the centre 1002-bins of a 540° × 540°-valued 1502-bin grid rotated θ degrees. The parameters (ω, k, θ) were optimized by minimizing the square difference between the cosine waves and the cosine of the mean of the circular coordinates in 1002 bins of the physical environment (smoothed using a Gaussian kernel with 1-bin standard deviation). Estimates were first obtained by finding the minimum when testing all combinations in the following intervals, each discretized in 10 steps: ω [1,6], ϕ [0, 360) and θ [0,180). The parameters of the cosine waves were further optimized using the ‘slsqp’-minimization algorithm (as implemented in the ‘scipy.optimize’-module using default hyperparameters). The period of each cosine wave was computed as 1.5 m/ω, giving a spatial scale estimate of the grid-cell modules.

As circular coordinates have arbitrary origin and orientation (that is, clockwise or counterclockwise evolution) we needed to realign the directions of the circular coordinates to compare these across modules and sessions (see Extended Data Fig. 4b). The clockwise orientation of each circular coordinate was first determined by noting whether (ωt + k) or 360° − (ωt + k) best fit the spatial mapping of the circular means of the toroidal coordinates, and subsequently reoriented to obtain the same orientation for both coordinates. The coordinate for which cos(θ) was largest (intuitively, the ‘x axis’) was then defined as the first coordinate (denoted ϕ1, with parameters (ω1, k1, θ1)) and the other as the second coordinate (ϕ2). Although (ϕ1, ϕ2) fully describe the toroidal location, the hexagonal torus allows for three axes, and the two axes obtained are thus oriented at either 60° or 120° relative to each other (see Extended Data Fig. 5b). The difference in directions was given by θ1θ2 and if this difference was greater than 90°, ϕ2 was replaced with ϕ2 + 60° ϕ1. Finally, the origin of the coordinates was aligned to a fixed reference, by subtracting the mean angular difference between the decoded coordinates and the corresponding coordinates obtained when using the toroidal parametrization of the reference OF session.

For visualization (Extended Data Fig. 5), it was furthermore necessary, in some cases, to rotate both vectors of the rhombi 30 degrees depending on whether one of the axes was directed outside of the box.

### Preservation of toroidal tuning

Centre-to-centre distance and Pearson correlation were computed between toroidal tuning maps of different sessions to measure the degree of preservation between the toroidal descriptions.

First, the preferred toroidal firing location for each cell was computed as the centre of mass of the toroidal firing distribution:

$${T}_{{\rm{c}}}=\arctan \,2\left(\frac{{\sum }_{i}\,\sin \,{\theta }_{i}\cdot {{\bf{y}}}_{i}}{{\sum }_{i}{{\bf{y}}}_{i}},\frac{{\sum }_{i}\,\cos \,{\theta }_{i}\cdot {{\bf{y}}}_{i}}{{\sum }_{i}{{\bf{y}}}_{i}}\,\right),$$

where yi denotes the mean spike count of the given cell in the i-th bin whose binned toroidal coordinates are given by θi. The distance between mass centres found in two sessions (‘S1’ and “S2”) was then defined as:

$$d=\Vert \arctan \,2{(\sin ({T}_{{\rm{c}}}^{{S}_{2}}-{T}_{{\rm{c}}}^{{S}_{1}}),\cos ({T}_{{\rm{c}}}^{{S}_{2}}-{T}_{{\rm{c}}}^{{S}_{1}}))}_{2}\Vert$$

where || ||2 refers to the L2-norm.

Pearson correlation between two tuning maps was computed by flattening the smoothed 2D rate maps to 1D arrays and calculating the correlation coefficient, r, using the ‘pearsonr’-function given in the ‘scipy.stats’-library.

To determine how much the preservation of the toroidal representations across two sessions (measured with Pearson correlation and peak distance) differed from a random distribution, the indices of the cells in one of the sessions were randomly re-ordered before computing correlation and distance for the pair of conditions. This process was repeated 1,000 times, and the P value was calculated from the rank of the original r value or distance with respect to the shuffled distribution.

### Classification of grid cells

Temporal autocorrelograms were computed, for each cell, by calculating a histogram of the temporal lags between every spike and all surrounding spikes within a 200 ms window, using 1 ms bins. The histogram was then divided by the value of the zero-lag bin, which was subsequently set to zero. The autocorrelogram was smoothed using a gaussian kernel with smoothing window 4 ms. Considering the autocorrelograms of all modules during OF foraging (day 2 for R1–3) as a point cloud, the cosine distances between all points were calculated, and hence each point’s 80 nearest neighbours were found. This defined a graph in which each point described a vertex and the neighbour pairs gave rise to edges. A density estimate was then calculated as the exponential of the negative distances summed over each neighbour for each point. The graph and the density estimate were given as the input to the Gudhi implementation63 of ToMATo64. ToMATo uses a hill-climbing procedure to find modes of the density function and uses persistence to determine stable clusters. In the present case, the algorithm finds three long-lived clusters.

### Minimum number of cells for torus detection

To address the question of how many cells are minimally needed to expect to see toroidal structure, random samples of n = 10, 20, ..., 140 cells were taken from R2 (n = 149 cells) during OF foraging, and the same topological analysis was repeated as for the whole population. The cells were resampled 1,000 times for each number of cells in the subsample. To determine whether toroidal structure was detected, a heuristic was introduced based on the circular parameterization given by the two most persistent 1D bars in the barcode mapped onto physical space. An estimate of the resulting planar representation of the torus was obtained by fitting planar cosine waves to each mapping (see ‘Toroidal alignment’). For the analysis to be determined ‘successful’ in detecting toroidal structure, we required: (i) the mean value of the least-squares fitting (across bins of the mapping) to be less than 0.25; (ii) the angle of the rhombus to be close to 60° (between 50° and 70°); and (iii) the side lengths to be within 25% of each other.

### Toroidal peak detection

The number of peaks per toroidal rate map was detected to assert the number of grid cells whose toroidal rate map portrayed single fields. First, 1,000 points were sampled from the toroidal distribution given by the mean activity of each cell in 150 × 150 bins of the stacked toroidal surface (that is, as described in ‘Toroidal rate map visualization’, each 50 × 50-binned toroidal rate map is first ‘straightened’ and subsequently stacked in 3 × 3 to address the toroidal boundaries) and then spatially smoothed using a Gaussian kernel with smoothing widths 0, 1, 2, …, 10 bins with mode set to ‘constant’ in the ‘scipy.gaussian_filter’ function. Next, the points were clustered by computing a density estimate, using the Euclidean distance, and defining neighbours as points closer than 5 bins. Cluster labels were iteratively assigned to each point and all its neighbours in a downhill manner, instantiating a new cluster identity if the point was not already labelled. Finally, the centroids for each cluster were computed and counted as a peak depending on whether its position fell within the centre 50 × 50 bins of the stacked rate maps.

### Simulated CAN models

To confirm the expected outcomes of topological analyses of grid cell CAN models, grid cells were simulated using two different, noiseless CAN models (Extended Data Fig. 7).

First, a 56 × 44 grid cell network was simulated based on the CAN model proposed previously9, but using solely lateral inhibition (for details see ref. 11) in the connectivity matrix, W. The animal movement was given as the first 1,000 s of the recorded trajectory of rat ‘R’ during OF session, originally sampled at 10 ms, and interpolated to 2-ms time steps. The speed, v(t), and head direction θ(t) of the animal was calculated as the (unsmoothed) displacement in position for every time step. The activity, s, was updated as:

$${{\bf{s}}}_{i+1}={{\bf{s}}}_{i}+\frac{1}{\tau }(-{{\bf{s}}}_{i}+{(I+{{\bf{s}}}_{i}\cdot W+\alpha v(t)\cos (\theta (t)-\mathop{\theta }\limits^{ \sim }))}_{+}),$$

where (…)+ is the Heaviside function and $$\mathop{\theta }\limits^{ \sim }$$ is the population vector of preferred head directions. The following parameters were used: I = 1, α = 0.15, l = 2, W0 = −0.01, R = 20 and τ = 10, and let the activity pattern stabilize by first initializing to random and performing 2,000 updates, disregarding animal movement. For computational reasons, the activity was set to 0 if si < 0.0001. The simulation was subsequently downsampled keeping only every 5th time frame.

Next, a 20 × 20 grid-cell network was simulated, for a synthetically generated OF trajectory (‘random walk’), based on the twisted torus model formulated in a previous study10. The parameter values and the code for computing both the grid cell network (choosing a single grid scale by defining the parameter ‘grid_gain’ = 0.04) and the random navigation (using 5,000 time steps) were given by the implementation by Santos Pata65.

### Idealized torus models

To compare the results of both the original and simulated grid cell networks with point clouds where the topology is known, a priori, to be toroidal, points were sampled from a square and a hexagonal torus. First, a 50 × 50 (angle) mesh grid (θ1, θ2) was created in the square [0,2π)×[0,2π) and slight Gaussian noise (ϵ = 0.1N(0,1)) was added to each angle. The square torus was then constructed via the 4D Clifford torus parametrization: (cos(θ1), sin(θ1), cos(θ2), sin(θ2)). The hexagonal torus was constructed using the 6D embedding: (cos(θ1), sin(θ1), cos(a1θ1 + θ2), sin(a1θ1 + θ2), cos(a2θ1 + θ2), sin(a2θ1 + θ2)), where a1=1/√3 and a2 = −1/√3.

### Histology and recording locations

Rats were given an overdose of sodium pentobarbital and were perfused intracardially with saline followed by 4% formaldehyde. The extracted brains were stored in formaldehyde and a cryostat was used to cut 30-µm sagittal sections, which were then Nissl-stained with cresyl violet. The probe shank traces were identified in photomicrographs, and a map of the probe shank was aligned to the histology by using two reference points that had known locations in both reference frames: (1) the tip of the probe shank; and (2) the intersection of the shank with the brain surface. In all cases, the shank traces were near-parallel to the cutting plane, therefore it was deemed sufficient to perform a flat 2D alignment in a single section where most of the shank trace was visible. The aligned shank map was then used to calculate the anatomical locations of individual electrodes (Extended Data Fig. 1).

### Data analysis and statistics

Data analyses were performed with custom-written scripts in Python and MATLAB. Open-source Python packages used were: umap (version 0.3.10), ripser (0.4.1), numba (0.48.0), scipy (1.4.1), numpy (1.18.1), scikit-learn (0.22.1), matplotlib (3.1.3), h5py (2.10.0) and gudhi (3.4.1.post1). Samples included all available cells that matched the classification criteria for the relevant cell type. Power analysis was not used to determine sample sizes. The study did not involve any experimental subject groups; therefore, random allocation and experimenter blinding did not apply and were not performed. All statistical tests were one-sided.

The most intensive computations were performed on resources provided by the NTNU IDUN/EPIC computing cluster66.