Distributed encoding of curvilinear self-motion across spiral optic flow patterns

Self-motion along linear paths without eye movements creates optic flow that radiates from the direction of travel (heading). Optic flow-sensitive neurons in primate brain area MSTd have been linked to linear heading perception, but the neural basis of more general curvilinear self-motion perception is unknown. The optic flow in this case is more complex and depends on the gaze direction and curvature of the path. We investigated the extent to which signals decoded from a neural model of MSTd predict the observer’s curvilinear self-motion. Specifically, we considered the contributions of MSTd-like units that were tuned to radial, spiral, and concentric optic flow patterns in “spiral space”. Self-motion estimates decoded from units tuned to the full set of spiral space patterns were substantially more accurate and precise than those decoded from units tuned to radial expansion. Decoding only from units tuned to spiral subtypes closely approximated the performance of the full model. Only the full decoding model could account for human judgments when path curvature and gaze covaried in self-motion stimuli. The most predictive units exhibited bias in center-of-motion tuning toward the periphery, consistent with neurophysiology and prior modeling. Together, findings support a distributed encoding of curvilinear self-motion across spiral space.

Substantial progress has been made over the past several decades toward understanding how neurons in visual cortex encode information about the environment and state of the observer. The pioneering work of Hubel and Wiesel 1 exemplifies the successes that have been garnered in primary visual cortex and other areas early in the visual hierarchy 2 . Tuning properties and the function of neurons in areas further along in extrastriate cortex, however, have remained more elusive.
One such area is the dorsal medial superior temporal area (MSTd), which has long been associated with self-motion perception. Much neurophysiological work has focused on translation, the self-motion scenario wherein the observer moves along a straight path without eye or head movements. Forward translation generates a pattern of motion on the eye (optic flow) that radiates outward from a singularity (radial expansion; Fig. 1A) known as the focus of expansion (FoE), which specifies the observer's direction of travel (heading) 3 . Many MSTd neurons exhibit tuning to the position of the FoE in the radial optic flow pattern 4,5 and causal links have been established to heading perception [6][7][8] .
Translation, together with another type of self-motion known as rotation, fully characterize the possible optic flow patterns (to a first-order approximation) that may arise during movement through a rigid environment 9 . Rotation arises during eye movements and has three primary components: pitch, yaw, and roll. Vertical and horizontal eye movements are examples of pitch and yaw rotation, respectively, and yield mostly uniform motion with some perspective distortion at the periphery (Fig. 1B,C). Clockwise (CW) and counterclockwise (CCW) rotation of the head about the line of sight are examples of roll rotation and generates concentric optic flow patterns (Fig. 1D,E). Combinations of translational and rotational self-motion yield spiral patterns, as Fig. 1F depicts for forward translation and roll. The continuum of patterns has been said to represent a "spiral space" 10,11 . Neurons demonstrate systematic tuning to patterns that span the spiral space in MSTd [10][11][12] and in areas 7a 13,14 and VIP 15 , other areas at a similar depth along the dorsal stream 16 with extensive optic flow sensitivity.
Much of what is known about the role of these brain areas in self-motion perception presupposes movement along a straight path. The neural basis of more general curvilinear self-motion is presently unknown. In the present study, we performed a computational analysis to determine whether populations of spiral space-tuned neurons could encode self-motion along curvilinear paths. Curvilinear self-motion commonly arises during naturalistic self-motion and yields optic flow (Fig. 1G-J) with both a translational component, due to observer movement, and a rotational component, due to rotation of the view about a point in the world (e.g. center of circular path). In the present study, we examine the case where the observer moves along a circular path of selfmotion of constant radius. We focus on the neural encoding of path curvature, path sign, and gaze offset, three key parameters that describe the curvilinear self-motion of the observer. Path curvature is inversely related to the radius of the circular path traversed by the observer (Fig. 1H). Path sign indicates whether the observer moves along a CW or CCW path (Fig. 1I). Gaze offset refers to the horizontal offset in the observer's gaze (i.e. bias in camera orientation) relative to the straight-ahead (Fig. 1J).
The present study addresses the following questions: • Could units tuned to optic flow patterns in spiral space accurately encode the curvilinear self-motion of the observer? To what extent does the neural population plausibly account for human perceptual judgments? • Do contributions predominately come from units tuned to specific patterns or is the model consistent with a distributed representation across spiral space? Are units tuned to radial expansion sufficient to decode accurate curvilinear self-motion estimates? • What are the singularity tuning characteristics of the units that contribute to estimates of curvilinear selfmotion? Specifically, are they uniformly distributed or biased toward the center or periphery?
To address these questions, we fit linear decoders to predict gaze offset and path curvature parameters from the activations of model MSTd units and compared the accuracy of estimates with human judgments (Fig. 2). We used lasso regularization 17,18 to create sparse decoding models that select patterns that are most predictive by an observer moving straight-forward toward a frontoparallel wall. A stationary observer that performs a downward (B) or rightward (C) eye movement experiences approximately laminar optic flow due to pitch and yaw rotation, respectively. A stationary observer that rotates the head about the line of sight experiences CW (D) or CCW (E) concentric motion due to roll rotation. (F) Sample optic flow patterns in spiral space. The "spirality" metric defines the position in the spiral space and the proportion of roll rotation that is added to the radial translation field (center). Negative (positive) spirality yields CCW (CW) motion. (G-J) Optic flow fields generated by different types of linear and curvilinear self-motion over a ground plane. (G) Self-motion along straight-ahead (left) and rightward (right) linear trajectories. (H) Curvilinear self-motion along circular paths with different radii. (I) Curvilinear self-motion along CW (left) and CCW (right) 5 m radius circular paths. (J) Curvilinear self-motion along 5 m radius circular paths with the observer gaze offset from the path tangent (left: gaze directed outside the future path, center: gaze along path tangent as in (H,I); right: gaze directed inside the future path).

Results
We simulated a neural model that first processes the input optic flow (e.g. Figs. 1G-J and 2 left panel) with filters, which emulate the direction and speed tuning properties of the medial temporal (MT) area. We then matched the resulting activity distribution with template patterns that represent each unit's preferred spiral space pattern (Fig. 1F). This feedforward template matching paradigm is widely used to model optic flow selectivity in MSTd [21][22][23][24][25][26] . We examined the encoding of curvilinear self-motion in a population of model units tuned to optic flow patterns ("MSTd") sampled from spiral space (henceforth "Full model"). Because one of our datasets consists of optic flow generated by simulated curvilinear self-motion over a ground plane (Fig. 1H-J), the Full model has units tuned to both full-field and hemi-field versions of spiral space patterns, the latter of which only included sensitivity to motion at or below the position of the singularity, which we refer to as the center of motion (CoM). For each preferred optic flow pattern, we simulated units tuned to different CoM positions sampled across the visual field.
We compared the Full model with a simpler model consisting only of units tuned to radial expansion (henceforth "Radial Only model"). We fit the linear decoders associated with the Full and Radial Only MSTd models using the activations produced from 900 optic flow samples that varied with respect to path curvature (radius: 5-200 m), gaze offset (− 35 to + 35°), and path sign (CW, CCW). Path sign was predicted using a linear binary classifier. Unless noted otherwise, simulations involve optic flow from self-motion over a ground plane; we obtained similar results on a dataset consisting of simulated self-motion through a 3D dot cloud (see Supplementary Information).
The lasso yielded sparse models for decoding gaze offset and path curvature that included 758 (3.52%) and 718 (3.34%) units, respectively, of the simulated 21,504 MSTd units derived from the Full model (16 × 16 grid of CoM positions across the visual field tuned to 42 CW and 42 CCW spiral space patterns). The decoders derived from the Radial Only model to estimate gaze offset and path curvature included 83 (32.42%) and 87 (33.98%) units, respectively, of the 256 units (16 × 16 CoM) in the Radial Only model. Figure 3 summarizes the accuracy of estimates of gaze offset (Fig. 3A,B) and path curvature (Fig. 3C,D) decoded from the full (top row) and radial only (bottom row) models on the 500 novel test optic flow samples. To quantify error across the dataset, we use mean absolute error (MAE). Gaze offset estimates decoded from the Full model (MAE: 0.76°, Fig. 3A) were substantially more accurate than those obtained from the Radial Only model (4.06°, Fig. 3B). Path curvature estimates decoded from the Full model (0.0027 1/m Fig. 3C) were also much more accurate than those obtained from the Radial Only model (0.0070 1/m, Fig. 3D). Although the predictions of the Full model were based on more units than the predictions of the Radial Only model, the Full model achieves more accurate predictions than the Radial Only model even when we constrain it to have the same number of nonzero regression weights (i.e. free parameters). This analysis is described in the Model Comparison section below.
To better appreciate the practical significance of the discrepancy between the models, we expressed the error in path curvature estimates with respect to the difference in visual angle subtended between the true and decoded the paths ("path error"; Fig. 4A). Small angular differences imply close proximity between the two paths from the observer's vantage point. We measured this difference 10 m from the observer, a distance that has been used to obtain human curvilinear path judgments 27 . This analysis revealed that the Full model estimated paths that deviated by 0.72° (Fig. 3E)  Comparison to human judgments. We compared the accuracy of model estimates to that of human judgments of curvilinear self-motion. Our aim was not to perform an exhaustive analysis of the existing psychophysical literature; rather we sought to gauge the sufficiency of model parameter estimates for capturing human performance under the simulated conditions.   Comparison between path the error produced by human subjects in the "gaze-along-heading" condition from Li and Cheng 27 and that produced by the decoding models. In this condition, the gaze offset is 0°. Positive (negative) path error corresponds to an overestimation (underestimation) of path curvature. (C) Comparison between the gaze offset error produced by human subjects in the "envelop" condition from Burlingham and Heeger 28 and that produced by the decoding models. In this condition, gaze offset covaried with path curvature. X values are staggered for visual clarity. www.nature.com/scientificreports/ We simulated the "gaze-along-heading" condition from Li and Cheng 27 in which subjects judged path curvature while viewing simulated movement over a dot-defined ground plane. We used the previously fit decoders to estimate path curvature in this new condition. We expressed decoding error in terms of the previously used difference in visual angle ("path error"; Fig. 4A) measure to afford a direct comparison to the Li and Cheng data. Figure 4B suggests that both models produce human-like estimates when gaze is aligned with the instantaneous heading direction. Consistent with Li and Cheng, the positive error indicates an overestimation of the path curvature in both models. The difference between the path error predicted by the model and the path error based on the human data was significantly smaller for the Full model compared to the Radial Only model [t(59) = 2.25, p = 0.03]. However, inspection of Fig. 4B indicates that the advantage for the Full model is not consistent across path curvatures.
This prompted us to compare performance under more challenging conditions wherein gaze offset varied over a wider range. We simulated the "envelop" condition from Burlingham and Heeger 28 in which humans judged the gaze offset angle [− 20, 20°] in displays simulating curvilinear self-motion along 43 m and 107 m radius paths. We expressed these path radii with respect to the rotation rate in the optic flow to afford direct comparison to their study (see "Methods"). Figure 4C shows that the Radial Only model does not plausibly account for human gaze offset judgments. The Full model yields comparable error to humans on the high curvature path (2°/s rotation) condition and slightly greater error compared to humans on the straight and 107 m paths, but the discrepancy (1-2°) is quite small. The difference between the gaze offset error predicted by the model and the gaze offset error based on the human data was significantly smaller for the Full model compared to the Radial Only model [t(27) = 5.81, p < 0.001]. This suggests that cells tuned to non-radial patterns in spiral space may play a role in estimating gaze offset with human-like accuracy.
Model position and pattern tuning characteristics. Given the improved accuracy of the Full model and its consistency with human data, we analyzed the tuning properties of the model units that contribute to self-motion parameter estimates. Figure 5 shows the number of units with different CoM preferences that contribute to gaze offset ( Fig. 5A) and path curvature ( Fig. 5B) estimates. Far more units tuned to peripheral CoM positions contributed to estimates than those tuned to central CoM positions. That is, the lasso set many more units with central CoM preferences to zero than those with peripheral preferences in both decoding models. As we explain in the Discussion, this finding is compatible with findings from both neurophysiological and computational studies.
Next, we considered how the relative contribution of units varied across pattern within the spiral space. This is quantified in terms of the relative strength of regression weights. Units that share a large proportion of the overall weight make more substantial contribution to parameter estimates than others with a smaller share of the weight. We use the "spirality" metric to express the fraction that the preferred spiral space pattern tuning depends on rotational optic flow. For example, 0 implies sensitivity to radial expansion, ± 1 implies sensitivity to concentric motion, and values in between that approach ± 1 imply sensitivity to spirals with increasing curvature (Fig. 1F). Figure 5C plots the proportion of the total weight that is shared among units tuned to different spiral space patterns in the gaze offset and path curvature decoders. Units tuned to concentric motion accounted for the most weight compared to units tuned to any other pattern type and therefore contribute most to estimates of gaze offset (21.7%; spirality 1) and path curvature (26.9%; spirality 1) in the Full model. The remaining proportion of the weight in both decoders is broadly shared across the other pattern types. This suggests a distributed representation across spiral space, which agrees with the findings of Xu and colleagues that show fairly even contributions among neurons tuned across spiral space toward monkey heading judgments 10 . Figure 5D shows the proportion of units tuned to each pattern type included in each decoder. Taken together with Fig. 5C, units tuned to concentric motion represent the most numerous group and contribute most to estimates compared to any other single group.
Model comparison. The simulation results from Figs. 3 and 4 reveal the insufficiency of a model constructed with units tuned only to radial patterns. However, findings depicted in Fig. 5C raise the possibility that units tuned to other narrow ranges of the spiral space could primarily mediate the estimates produced by the Full model. We examined this possibility further by constructing models that constrain decoding from focused portions of spiral space. Specifically, we compared the accuracy of gaze offset and path curvature estimates garnered by the Full model with separate decoders fit to units tuned only to radial expansion ('Radial Only'); to spirals associated with the lowest, middle, and upper third of spiralities; and to concentric motion ('Concentric Only'). We also sought to address the extent to which the Full model's superior accuracy to the Radial Only model may be attributed to the larger number of units that it includes. To that end, we included in the model comparison a version of the Full model constrained to possess the same number of nonzero weights as the Radial Only model ('Full Limited').
Consistent with the analysis shown in Fig. 3 and Fig. 6 shows that the Radial Only model garnered considerably worse accuracy than the Full model when estimating gaze offset and path curvature. Decoding from any one group of spiral-tuned units or the concentric units produced gaze offset estimates that were almost as accurate as those produced by the Full model (Fig. 6A). In the case of path curvature, the accuracy garnered by the spiral-tuned decoders was approximately halfway between that achieved by the Radial Only and Full models (Fig. 6B). In contrast to gaze offset, the accuracy of the path curvature estimates decoded from the concentric units was poor and similar to that decoded from the Radial Only model. The discrepancy in performance between concentric unit models could stem from the larger number of concentric motion units that contribute to gaze offset predictions (Fig. 5D). It is possible that sensitivity to concentric motion on its own may not capture enough information that is essential for predicting path curvature across a range of gaze offsets (Fig. 1J). www.nature.com/scientificreports/ The accuracy of estimates produced by the Full Limited model was in between those achieved from Radial Only and Full models and comparable to the models constructed from thirds of the spiral space. The improved accuracy of the Full Limited model compared to the Radial Only model suggests an advantage of decoding gaze offset and path curvature from neurons tuned to a broad, distributed set of spiral patterns, even when controlling for the number of free parameters.
We examined the tradeoff between free parameters and accuracy more formally by computing the Akaike Information Criterion (AIC) for each model (Fig. 6C,D). This analysis also helps contextualize the accuracy of the other models, each of which contained different numbers of nonzero weights. For gaze offset, the numbers of nonzero weights were 385, 243, 83, and 73 for the Lowest Third, Middle Third, Upper Third, and Concentric Only models, respectively. For path curvature, these numbers were 268, 407, 342, and 171 for the Lowest Third, Middle Third, Upper Third, and Concentric Only models, respectively. Figure 6C,D reveals that the Full Limited model is favored according to the AIC metric for both gaze offset and path curvature estimates. Taken together, these results support a distributed representation of curvilinear self-motion across spiral space.

Discussion
While neurophysiology has linked the perception of linear self-motion to the activity of neurons in MSTd, the neural basis of more general curvilinear self-motion perception has been unclear. Our computational analysis suggests that tuning to combinations of translational and rotational optic flow in spiral space could support estimates of curvilinear self-motion. Decoding from units tuned to radial motion resulted in substantially less accurate and more variable estimates than the full spiral space model. While radial units in our simulations accounted for human judgments when gaze was aligned with the straight-ahead, they did not when gaze offset covaried with path curvature. The full spiral space model accounted for human judgments in both scenarios. Several of our analyses indicated a distributed representation of gaze offset and path curvature across spiral www.nature.com/scientificreports/ space. We found considerable redundancy in the representation of gaze offset-decoding from units tuned to non-radial patterns in spiral space yielded estimates that closely approximately those obtained by the Full model. This was mostly true of path curvature, albeit to a weaker extent and decoding from units tuned to concentric motion alone yielded poor estimates that were similar to those produced by the radial subpopulation. Overall, our findings support the possibility that MSTd could jointly signal the observer's linear and curvilinear self-motion. We used lasso regularization to create sparse linear decoding models that select the units that are most predictive of each self-motion parameter. In the resulting models, we found an overrepresentation of units tuned to peripheral centers of motion (Fig. 5) regardless of whether the optic flow simulated self-motion both over a ground plane and through a 3D dot cloud (see Supplementary Information). This agrees with the peripheral bias in the distribution of MSTd heading preferences reported by neurophysiological studies 4,29 . Peripheral bias in the population tuning increases sensitivity to central headings 30 and enhances the overall accuracy of heading estimates 25 . Both outcomes may arise because broad, Gaussian-like tuning curves of MSTd neurons tuned to peripheral headings change most rapidly for central headings 30,31 . Our model is compatible with this hypothesis since the MSTd units are center-weighted about the preferred singularity position (Eq. 16). Beyeler and colleagues have proposed a computational model that explains many physiological properties of MSTd neurons, including the population peripheral bias, as an emergent outcome of a dimensionality reduction process 19 . The model produces a sparse representation of optic flow fields encountered during random combinations of translation and rotation. It is interesting that both their model and ours yield peripheral bias in CoM selectivity despite the difference in approach. The lasso regularization used in the present study also favors a model fit with a sparse set of predictors. This converging finding further supports the idea that peripheral bias in CoM selectivity may result from a process that promotes a sparse distributed representation of optic flow. It is noteworthy that the peripheral bias was found despite uniformity in the parameters used to generate the stimuli and regardless of whether the range of gaze offsets was restricted to ± 35 deg (in the present study) or spanned the entire 360° range (in Beyeler et al.). The relationship between the input statistics, tuning curving properties, and peripheral bias should be explored further in future research. www.nature.com/scientificreports/ The use of a linear decoder in the present study also advances neural modeling of optic flow processing by expanding the range of self-motion properties that can be estimated. Within the literature on this topic 21,23,24,[32][33][34][35] , which spans almost 30 years, the standard approach to "reading out" the model's heading estimate is to find the focus of expansion (FoE) location of the most active MSTd unit. This approach makes sense for pure translational self-motion because there is a simple one-to-one correspondence between self-motion direction and FoE location. However, when we expand the scope to include complex self-motion trajectories and the full gamut of MSTd cells types, it no longer makes sense to assume that the model's best estimate of self-motion is reflected in the activity of a single unit. A more effective approach is to look at the global pattern of activity across units to extract self-motion estimates. The approach adopted in the present study offers a way to decode multiple selfmotion parameters (e.g., gaze angle, path curvature) from the global activity across cells tuned to radial, spiral, and concentric flow patterns.
We emphasize that although neurons in MSTd, VIP, and 7a demonstrate spiral space pattern selectivity, no existing neurophysiological study has specifically linked this tuning to curvilinear self-motion perception. Therefore, it remains an open question whether these cortical areas possess specialized mechanisms that subserve curvilinear path perception. Cheng and Gu have shed some light on this issue, demonstrating that neurons in MSTd, VIP, and VPS exhibit tuning to translation, rotation, and the combinations thereof that arise during curvilinear movement on a 6 DOF platform. Interestingly, the analysis performed by Cheng and Gu revealed a distributed representation of translation and rotation, which supports our findings. The conditions in this study involve vestibular stimulation without accompanying optic flow, so it is unclear how these cells might relate to spiral space optic flow tuning. Nevertheless, the robust vestibular tuning and ~ 80% linear decoder accuracy suggests that vestibular signals could contribute significantly toward the neural basis of curvilinear self-motion perception when the information is available. Further neurophysiological work is required to clarify the connection between spiral space and vestibular tuning.
It should be noted that the present study does not address the influence of eye rotation, which have been shown to modulate MSTd signals during self-motion [36][37][38] and human self-motion judgments [39][40][41] . Eye rotations introduce an endogenous source of rotation into the optic flow field that is independent from the rotation caused by curvilinear self-motion. To estimate curvilinear self-motion during eye movements, the prevalent efference copy theory and related models propose that the visual system relies on nonvisual signals (e.g. proprioceptive or motor signals specifying eye rotation rate) to factor out the rotational flow due to eye movements, leaving on the rotational flow due curvilinear self-motion 42,43 . Such a mechanism could be used to remove the effects of eye rotation prior to decoding from our model. However, the extent to which MSTd decomposes the optic flow field into distinct translational and rotational components visually 44 or in the presence of non-visual signals 36 has been questioned. The mechanism by which the visual system tolerates rotation in optic flow during selfmotion remains unclear.
Some have examined linear heading perception when gaze remains fixed and eye movements are simulated within the virtual environment of the display ("simulated eye movement condition") 39 . In the absence of proprioceptive or motor signals that accompany real eye movements and when the scene lacks dense motion parallax 45 , humans appear to treat the simulated rotation as curvilinear self-motion without eye movements 46,47 . This suggests that rotation due to curvilinear self-motion is not factored out, decomposed, or removed, at least for the purposes of estimating curvilinear self-motion 48 . The system presented here is consistent with this proposal, since neither the neural nor the decoding models explicitly remove rotation due to curvilinear self-motion. Indeed, pattern tuning in MSTd may emerge through the dimensionality reduction that neurons perform on their motion inputs 19 rather than a process whose purpose is to segment translation from rotation.

Methods
Optic flow datasets. We created an optic flow dataset consisting of 10 frame digital sequences of simulated curvilinear self-motion over a ground plane (2000 dots) at 64 × 64 pixel resolution. In each video, the simulated observer translated at 3 m/s along a circular path. The virtual camera had a 90° horizontal and vertical field of view and a height of 1.61 m above the ground. On each video frame, we computed the optic flow using a pinhole camera model 49 and standard analytic equations 9 . We clipped and replaced dots that exited the field of view or valid relative depth range (1-50 m) to ensure that the same number of dots always remained visible. Figure 1G-J shows some example optic flow fields from the dataset.
Training sets. The training set used to fit the decoders consisted of 900 videos. We sampled the gaze offset parameter on a regular grid to ensure balanced coverage (− 35 to 35° in steps of 8.7°). We sampled path radius differently than the other parameters due to the nonlinear relationship between circular path radius and curvature (Eq. 1). We accounted for this by sampling small radii more finely than large radii using the following recursive generating formula: We used a starting radius of 5 m and selected the coefficient so that Eq. (1) would generate 50 radii between 5 and 200 m. We doubled the number of radii by including both CW and CCW versions of each path.

Test sets.
We generated a 500-video test set to assess the quality of predictions made by the fit decoders.
Gaze offset was sampled from a uniform distribution [− 35, 35°]. We used kernel density estimation (KDE) to randomly sampled path radii in proportion to the training set values. As in the training set, this biased values toward smaller radii that produce larger variation in path curvature. We fit the path radii used in the training set to an empirical probability density function via the MATLAB function ksdensity, which we then sampled (1) r next = 1.078r prev . www.nature.com/scientificreports/ within the training set range with 0.1 m granularity. As in the training set, we also included CCW versions of these paths. For Fig. 4, we generated optic flow sequences that emulated the gaze-along-heading condition from Li and Cheng 45 : a gaze offset of 0° and 28, 38, and 58 m path radii. We simulated 20 repetitions of each path radius condition across which only the random placement of dots varied. The Li and Cheng test set consisted of 60 videos. 3D dot cloud environment. For the simulations described in Supporting Information, we used a dataset that satisfies the same train and test set parameters, except that we distributed the 2000 dots in a 3D cloud rather than on a ground plane. These simulations were the basis for our comparison with the Burlingham and Heeger 28 data. Burlingham and Heeger used simulated curvilinear self-motion with 1.5 m/s translation and 0, 0.8, or 2.0°/s rotation. The latter two curvilinear cases correspond to path radii of 107 m and 43 m, respectively. We compared corresponding human judgments of gaze offset with model predictions obtained on the 3D cloud test set stimuli that had a path radius within 2.5 m of the radii used by Burlingham and Heeger. To make a comparison to the straight path condition (0°/s rotation), we considered the predictions made for test stimuli with path radii ≥ 180 m. For such large path radii, the influence of rotation is small (~ 0.45 • /s rotation) and the optic flow appears effectively radial.

Decoding.
We fit a linear model with lasso regularization to the neural activations produced by the neural model (described below) at the end of each video (frame 10) in the optic flow training set. The models for decoding path curvature and gaze offset made use of the lasso MATLAB function. We set the regularization parameter such that the mean squared error (MSE) of the fit was within 1 standard error of the minimum MSE computed with fivefold cross validation. To fit the Full Limited model, we used the DFMax argument of the lasso function to constrain the number of nonzero weights in the fit to the those from the Radial Only model. We classified path sign according to a support vector machine binary classifier using the fitcsvm MATLAB function configured with the default linear kernel.
The values depicted in Fig. 6A,B reflect the outcome of bootstrap procedure performed 50 times. For each bootstrap, we simulated each model with optic flow sampled with replacement from the original dataset while preserving the total number of samples. Figure 6A,B shows the MAE and its standard deviation computed over each set of 50 bootstrap estimates.

Neural model of MT and MSTd. We processed the optic flow datasets with the Competitive Dynamics
(CD) neural model of MT and MSTd (Fig. 7) 22,35,50,51 . Populations of model neurons that emulate a range of physiologically supported properties transform the optic flow into neural signals from which we subsequently decode the parameters describing the observer's self-motion. The CD model accounts for human judgments of self-motion, including in the presence of independently moving objects 22,52 , and object motion during selfmotion 50,51,53 . We made several key changes to the most recent version of the model 50,51 : • Given focus in the present work on self-motion estimation, we only simulated the requisite MT + -MSTd pathway of the model, excluding the MT --MSTv object motion pathway 50,51 . • We simulated units tuned to direction and speed only, omitting disparity tuning.
• Each MSTd unit received input from a random subset of the MT + units 20 . In prior versions of the model, MSTd units were fully connected to the constellation of MT + units that compose the preferred pattern. • For greater clarity and to better manage the greater diversity of cell types simulated (e.g. radial, spiral, concentric), we divided model MSTd into several processing stages (layers). The first stage integrates the match between each optic flow template over time. This is followed by spatial smoothing of the activation of units tuned to similar CoMs and patterns. Finally, units compete with one another in an on-center/off-surround contrast-enhancing network.
These changes and the complete model details will be fully described in the following sections. Figure 7 summarizes the architecture of the CD model simulated here and Table 1 specifies the values of parameters used to simulate the neural model.

Model area MT.
We simulated the population of direction and speed tuned neurons in MT with excitatory surrounds (MT + ), which project to MSTd 16,54,55 and influence self-motion signals therein 56 .
MT + unit receptive fields (RFs) tuned to like speeds and directions were arranged in a regular 64 × 64 spatial grid across the visual field.
MT speed tuning. We simulated units tuned to 5 (N s ) different preferred speeds. Each unit's speed tuning followed a log-normal distribution: where σ v defines the speed tuning bandwidth; s 0 defines a non-negative offset parameter to prevent the singularity in the logarithm at 0; and v pref defines the preferred speed of the model neuron. Given that MT neuron speed tuning varies considerably, we sampled values from probability distributions that approximate neurophysiological www.nature.com/scientificreports/ fits to these parameters. Based on Fig. 4 of Ref. 57 , we sampled σ v from a Gaussian distribution (mean: 1.16°, SD: 0.5°) and s 0 from an exponential distribution (λ: 0.25°/s). Consistent with Fig. 8 of Nover et al., we sampled v pref from five uniform distributions with endpoints that yielded octave spaced bins (see Table 1).
We computed optic flow speed at retinotopic position (x, y) as: where u x,y and v x,y represent the horizontal and vertical components of the optic flow vector.
MT direction tuning. We simulated units tuned to 24 (N d ) motion directions (Fig. 7). We modeled each unit's direction tuning width using the following von Mises distribution where d represents the direction of the optic flow in the receptive field, µ dir represents the unit's preferred direction, the shape parameter η controls the direction tuning width. We set η = 3 • to approximate the 90° full-width at half-maximum bandwidth of MT neurons 6,19 . We computed the direction of the optic flow vector − → v x,y = (u x,y , v x,y ) at position (x, y) via the two-argument form of the arctangent function: MT unit activation. The net input to the MT + unit tuned to speed s , direction d , and has its RF center at position (x, y) is the product of the directional V x,y ; d, η and speed S s x,y ; s, s 0 , σ v tuning curve outputs: We modeled the dynamics of MT + unit m s,x,y,d as a simple leaky integrator: (3) s x,y = u 2 x,y + v 2 x,y , x,y = atan2 v x,y , u x,y .  MT output. Among models that explain MSTd responses based on their feedforward input from MT, those that include a nonlinearity that compresses MT signals perform best 26 . The compressive nonlinearity could be explained by synaptic depression, the tendency for the same inputs to lose their efficacy over time. We modeled MT + synaptic depression h s,x,y,d as follows: where O s,x,y,d denotes the output signal from MT + units to MSTd, τ MT is the synaptic time constant, and κ MT represents the rate at which the efficacy of the input signal m s,x,y,d declines over time 58 .
MSTd net input. We simulated MSTd cells tuned to radial expansion ( − → � rad,i,j,x,y ) and spiral ( − → � spl,i,j,x,y ) optic flow patterns that have been used to characterize MSTd selectivity in neurophysiological studies 5 − → � rad,i,j,x,y = v rad,dx , v rad,dy = (x − i), y − j , www.nature.com/scientificreports/ In Eqs. (10)(11)(12), (i, j) indicates the location of the center of motion (CoM), which corresponds to the FoE in the case of radial expansion; (x, y) indicates visuotopic position of the local motion vector, ζ spl is 1 for CW spirals and − 1 for CCW spirals; and spl is a scalar selected in the range 0-1 that represents the spirality of the pattern. Setting spl = 0 produces a radial expansion flow field, spl = 1 produces a circular center flow field ( − → � cir,i,j,x,y ) , and 0 < spl < 1 creates spiral patterns spanning a continuum between these two extremes. We created MSTd units tuned to N spl spiral patterns by sampling regularly spaced values of spl (see Table 1). We simulated MSTd units tuned to both full-field and lower the hemifields. The latter was achieved by removing motion vectors above the CoM.
We implemented MSTd tuning to the preferred pattern using direction templates that select MT + signals when they appear in appropriate spatial locations. For example, in the case of a cell tuned to radial expansion with a centrally positioned FoE, the rightward direction template pools the responses of MT + cells tuned to rightward motion when their receptive fields coincide with the right side of the visual field. The following equations define the RF template T ψ,d,i,j,x,y for pattern ψ that integrates MT + cells tuned to direction d , normalized by the total number of pooled cells ( x y): In Eq. (14), d indexes the N d MT preferred directions. Within each pattern template, we sampled a small subset of the possible connections from MT + units (e.g. 200 instead of 64 × 64 × 24 = 98304 ). Sparse connectivity is more biologically plausible and substantially improves the algorithmic efficiency of the model. We implemented this by randomly selecting only N T connections to MT + units within each template tuned to pattern ψ with CoM positioned at (i, j) . After drawing N T pairs of (x, y) indices per template, we selected directions at each spatial location according to the uniform distribution U[1, N d ] . We denote the set of N T three-tuple (x, y, d) indices of MT + units sampled within each template . We repeated this process for each of the 5 (N s ) preferred speeds. This transformed the original 64 × 64 × 24 × 5 signal into one with shape N d × N T .
The following equation matches the direction templates with the output signals from MT + (O s,x,y,d ) at the set of sampled spatial and directional indices (x, y, d) ∈ � to compute the net input (R ψ,i,j ) to the MSTd unit that prefers the pattern ψ with a CoM positioned at (i, j): In Eq. (16), the exponential function makes MSTd units more sensitive to motion nearby the preferred CoM position and the parameter b MSTd modulates how sensitivity decreases with distance 22,32,35 . Dividing by the maximum directional input at each position ensures that dominant inputs make comparable contributions across visuotopic space before the exponential distance weighting from the CoM is applied.

MSTd layer 1a.
We arranged MSTd units into separate layers for organizational clarity. Note that these layers do not correspond to anatomical laminae. Layer 1 corresponds to MSTd units that are driven by their pattern tuning and perform spatiotemporal averaging of their input R ψ,i,j . We divided the temporal (Layer 1a) and spatial (Layer 1b) averaging into two sequential stages. Layer 2 includes recurrent competitive connections that normalize the total activation while enhancing the contrast between strongly and weakly active MSTd units. We rely on the signals produced by Layer 2 for decoding the parameters specifying the observer's self-motion.
(13) χ ψ,d,i,j,x,y = atan2 ψ,i,j,x,y,dy , ψ,i,j,x,y,dx , www.nature.com/scientificreports/ and convolving the pattern signals at each CoM position with a one-dimensional Gaussian: Because we arranged units tuned to similar patterns along a circular spiral pattern continuum (see Table 1), we wrapped the convolution in Eq. (21) around the boundaries.
The dynamics of Layer 1b units obey: MSTd layer 2. The input to Layer 2 undergoes an adaptive threshold to suppress signals weaker than the mean activation value c ψ,i,j achieved among units tuned to the same pattern over the recent time history where N MSTd,com represents the total number of units tuned to the same pattern with differing CoM positions. The net input to Layer 2 is the Layer 1b output signal with the threshold applied: where ⌈·⌉ + represents the half-wave rectification max(·, 0). Layer 2 integrates signals from Layer 1b and contains recurrent connections within the layer that normalize activation, facilitate competition, and stabilize signals under changing conditions 22 . Units obey the recurrent competitive field with a contrast-enhancing feedback transfer function f (w) = w 2 , where w represent the unit activation 59 : In Eq. (25), β MSTd,2 represents the excitatory upper bound of each unit and the terms involving f (·) implement competition among units tuned to the same pattern. The terms D ψ,i,j E and p =ψ m =i n =j D p,m,n describe excitatory and inhibitory recurrent connections, respectively, that subserve activity normalization across the layer 60 . In the former term, D ψ,i,j denotes the difference between each unit's current activation and the feedforward input signal: The variable E defines the discrepancy between input and activation across the entire layer: This means that the network attempts to conserve the total activation across the entire layer, not just among units tuned to the same pattern.