## Abstract

Neural activity exhibits complex dynamics related to various brain functions, internal states and behaviors. Understanding how neural dynamics explain specific measured behaviors requires dissociating behaviorally relevant and irrelevant dynamics, which is not achieved with current neural dynamic models as they are learned without considering behavior. We develop preferential subspace identification (PSID), which is an algorithm that models neural activity while dissociating and prioritizing its behaviorally relevant dynamics. Modeling data in two monkeys performing three-dimensional reach and grasp tasks, PSID revealed that the behaviorally relevant dynamics are significantly lower-dimensional than otherwise implied. Moreover, PSID discovered distinct rotational dynamics that were more predictive of behavior. Furthermore, PSID more accurately learned behaviorally relevant dynamics for each joint and recording channel. Finally, modeling data in two monkeys performing saccades demonstrated the generalization of PSID across behaviors, brain regions and neural signal types. PSID provides a general new tool to reveal behaviorally relevant neural dynamics that can otherwise go unnoticed.

## Data availability

The data used to support the results are available upon reasonable request from the corresponding author.

## Code availability

The code for the PSID algorithm is available online at https://github.com/ShanechiLab/PSID.

## References

1. Schwartz, A. B., Cui, X. T., Weber, D. J. & Moran, D. W. Brain-controlled interfaces: movement restoration with neural prosthetics. *Neuron* **52**, 205–220 (2006).
2. Shenoy, K. V., Sahani, M. & Churchland, M. M. Cortical control of arm movements: a dynamical systems perspective. *Annu. Rev. Neurosci.* **36**, 337–359 (2013).
3. Shanechi, M. M. Brain–machine interface control algorithms. *IEEE Trans. Neural Syst. Rehabil. Eng.* **25**, 1725–1734 (2017).
4. Shanechi, M. M. Brain–machine interfaces from motor to mood. *Nat. Neurosci.* **22**, 1554–1564 (2019).
5. Herff, C. & Schultz, T. Automatic speech recognition from neural signals: a focused review. *Front. Neurosci.* **10**, 429 (2016).
6. Sani, O. G. et al. Mood variations decoded from multi-site intracranial human brain activity. *Nat. Biotechnol.* **36**, 954–961 (2018).
7. Mante, V., Sussillo, D., Shenoy, K. V. & Newsome, W. T. Context-dependent computation by recurrent dynamics in prefrontal cortex. *Nature* **503**, 78–84 (2013).
8. Hoang, K. B., Cassar, I. R., Grill, W. M. & Turner, D. A. Biomarkers and stimulation algorithms for adaptive brain stimulation. *Front. Neurosci.* **11**, 564 (2017).
9. Kaufman, M. T. et al. The largest response component in the motor cortex reflects movement timing but not movement type. *eNeuro* https://doi.org/10.1523/ENEURO.0085-16.2016 (2016).
10. Gallego, J. A. et al. Cortical population activity within a preserved neural manifold underlies multiple motor behaviors. *Nat. Commun.* **9**, 4233 (2018).
11. Russo, A. A. et al. Motor cortex embeds muscle-like commands in an untangled population response. *Neuron* **97**, 953–966.e8 (2018).
12. Allen, W. E. et al. Thirst regulates motivated behavior through modulation of brainwide neural population dynamics. *Science* **364**, eaav3932 (2019).
13. Stringer, C. et al. Spontaneous behaviors drive multidimensional, brainwide activity. *Science* **364**, eaav7893 (2019).
14. Susilaradeya, D. et al. Extrinsic and intrinsic dynamics in movement intermittency. *eLife* **8**, e40145 (2019).
15. Cunningham, J. P. & Yu, B. M. Dimensionality reduction for large-scale neural recordings. *Nat. Neurosci.* **17**, 1500–1509 (2014).
16. Gallego, J. A., Perich, M. G., Miller, L. E. & Solla, S. A. Neural manifolds for the control of movement. *Neuron* **94**, 978–984 (2017).
17. Remington, E. D., Egger, S. W., Narain, D., Wang, J. & Jazayeri, M. A dynamical systems perspective on flexible motor timing. *Trends Cogn. Sci.* **22**, 938–952 (2018).
18. Sadtler, P. T. et al. Neural constraints on learning. *Nature* **512**, 423–426 (2014).
19. Gao, P. & Ganguli, S. On simplicity and complexity in the brave new world of large-scale neuroscience. *Curr. Opin. Neurobiol.* **32**, 148–155 (2015).
20. Gao, P. et al. A theory of multineuronal dimensionality, dynamics and measurement. Preprint at *bioRxiv* https://doi.org/10.1101/214262 (2017).
21. Churchland, M. M. et al. Neural population dynamics during reaching. *Nature* **487**, 51–56 (2012).
22. Kao, J. C. et al. Single-trial dynamics of motor cortex and their applications to brain–machine interfaces. *Nat. Commun.* **6**, 7759 (2015).
23. Pandarinath, C. et al. Inferring single-trial neural population dynamics using sequential auto-encoders. *Nat. Methods* **15**, 805–815 (2018).
24. Kao, J. C., Stavisky, S. D., Sussillo, D., Nuyujukian, P. & Shenoy, K. V. Information systems opportunities in brain–machine interface decoders. *Proc. IEEE* **102**, 666–682 (2014).
25. Wallis, J. D. Decoding cognitive processes from neural ensembles. *Trends Cogn. Sci.* **22**, 1091–1102 (2018).
26. Kobak, D. et al. Demixed principal component analysis of neural population data. *eLife* **5**, e10989 (2016).
27. Svoboda, K. & Li, N. Neural mechanisms of movement planning: motor cortex and beyond. *Curr. Opin. Neurobiol.* **49**, 33–41 (2018).
28. Gründemann, J. et al. Amygdala ensembles encode behavioral states. *Science* **364**, eaav8736 (2019).
29. Wu, W., Kulkarni, J. E., Hatsopoulos, N. G. & Paninski, L. Neural decoding of hand motion using a linear state-space model with hidden states. *IEEE Trans. Neural Syst. Rehabil. Eng.* **17**, 370–378 (2009).
30. Aghagolzadeh, M. & Truccolo, W. Inference and decoding of motor cortex low-dimensional dynamics via latent state-space models. *IEEE Trans. Neural Syst. Rehabil. Eng.* **24**, 272–282 (2016).
31. Yang, Y., Connolly, A. T. & Shanechi, M. M. A control-theoretic system identification framework and a real-time closed-loop clinical simulation testbed for electrical brain stimulation. *J. Neural Eng.* **15**, 066007 (2018).
32. Abbaspourazad, H., Hsieh, H. & Shanechi, M. M. A multiscale dynamical modeling and identification framework for spike–field activity. *IEEE Trans. Neural Syst. Rehabil. Eng.* **27**, 1128–1138 (2019).
33. Yang, Y., Sani, O. G., Chang, E. F. & Shanechi, M. M. Dynamic network modeling and dimensionality reduction for human ECoG activity. *J. Neural Eng.* **16**, 056014 (2019).
34. Yang, Y. et al. Model-based prediction of large-scale brain network dynamic response to direct electrical stimulation. *Nat. Biomed. Eng.* (in the press).
35. Van Overschee, P. & De Moor, B. *Subspace Identification for Linear Systems* (Springer US, 1996).
36. Markowitz, D. A., Curtis, C. E. & Pesaran, B. Multiple component networks support working memory in prefrontal cortex. *Proc. Natl Acad. Sci. USA* **112**, 11084–11089 (2015).
37. Buesing, L., Macke, J. H. & Sahani, M. in *Advances in Neural Information Processing Systems 25* (eds Pereira, F. et al.) 1682–1690 (Curran Associates, 2012).
38. Yu, B. M. et al. Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity. *J. Neurophysiol.* **102**, 614–635 (2009).
39. Semedo, J. D., Zandvakili, A., Machens, C. K., Yu, B. M. & Kohn, A. Cortical areas interact through a communication subspace. *Neuron* **102**, 249–259.e4 (2019).
40. Cunningham, J. P. & Ghahramani, Z. Linear dimensionality reduction: survey, insights, and generalizations. *J. Mach. Learn. Res.* **16**, 2859–2900 (2015).
41. Hsieh, H.-L., Wong, Y. T., Pesaran, B. & Shanechi, M. M. Multiscale modeling and decoding algorithms for spike–field activity. *J. Neural Eng.* **16**, 016018 (2018).
42. Shanechi, M. M., Orsborn, A. L. & Carmena, J. M. Robust brain–machine interface design using optimal feedback control modeling and adaptive point process filtering. *PLoS Comput. Biol.* **12**, e1004730 (2016).
43. Shanechi, M. M. et al. Rapid control and feedback rates enhance neuroprosthetic control. *Nat. Commun.* **8**, 13825 (2017).
44. Stavisky, S. D., Kao, J. C., Nuyujukian, P., Ryu, S. I. & Shenoy, K. V. A high performing brain–machine interface driven by low-frequency local field potentials alone and together with spikes. *J. Neural Eng.* **12**, 036009 (2015).
45. Bighamian, R., Wong, Y. T., Pesaran, B. & Shanechi, M. M. Sparse model-based estimation of functional dependence in high-dimensional field and spike multiscale networks. *J. Neural Eng.* **16**, 056022 (2019).
46. Wang, C. & Shanechi, M. M. Estimating multiscale direct causality graphs in neural spike–field networks. *IEEE Trans. Neural Syst. Rehabil. Eng.* **27**, 857–866 (2019).
47. Yang, Y. et al. Developing a personalized closed-loop controller of medically-induced coma in a rodent model. *J. Neural Eng.* **16**, 036022 (2019).
48. Ahmadipour, P., Yang, Y., Chang, E. F. & Shanechi, M. M. Adaptive tracking of human ECoG network dynamics. *J. Neural Eng.* https://doi.org/10.1088/1741-2552/abae42 (2020).
49. Hsieh, H.-L. & Shanechi, M. M. Optimizing the learning rate for adaptive estimation of neural encoding models. *PLoS Comput. Biol.* **14**, e1006168 (2018).
50. Yun, K., Watanabe, K. & Shimojo, S. Interpersonal body and neural synchronization as a marker of implicit social interaction. *Sci. Rep.* **2**, 959 (2012).
51. Thura, D. & Cisek, P. Deliberation and commitment in the premotor and primary motor cortex during dynamic decision making. *Neuron* **81**, 1401–1416 (2014).
52. Haroush, K. & Williams, Z. M. Neuronal prediction of opponent’s behavior during cooperative social interchange in primates. *Cell* **160**, 1233–1245 (2015).
53. Herzfeld, D. J., Kojima, Y., Soetedjo, R. & Shadmehr, R. Encoding of action by the Purkinje cells of the cerebellum. *Nature* **526**, 439–442 (2015).
54. Ramkumar, P., Dekleva, B., Cooler, S., Miller, L. & Kording, K. Premotor and motor cortices encode reward. *PLoS ONE* **11**, e0160851 (2016).
55. Whitmire, C. J., Waiblinger, C., Schwarz, C. & Stanley, G. B. Information coding through adaptive gating of synchronized thalamic bursting. *Cell Rep.* **14**, 795–807 (2016).
56. Christophel, T. B., Klink, P. C., Spitzer, B., Roelfsema, P. R. & Haynes, J.-D. The distributed nature of working memory. *Trends Cogn. Sci.* **21**, 111–124 (2017).
57. Takahashi, K. et al. Encoding of both reaching and grasping kinematics in dorsal and ventral premotor cortices. *J. Neurosci.* **37**, 1733–1746 (2017).
58. Menz, V. K., Schaffelhofer, S. & Scherberger, H. Representation of continuous hand and arm movements in macaque areas M1, F5, and AIP: a comparative decoding study. *J. Neural Eng.* **12**, 056016 (2015).
59. Wu, W., Gao, Y., Bienenstock, E., Donoghue, J. P. & Black, M. J. Bayesian population decoding of motor cortical activity using a Kalman filter. *Neural Comput.* **18**, 80–118 (2006).
60. Bansal, A. K., Truccolo, W., Vargas-Irwin, C. E. & Donoghue, J. P. Decoding 3D reach and grasp from hybrid signals in motor and premotor cortices: spikes, multiunit activity, and local field potentials. *J. Neurophysiol.* **107**, 1337–1355 (2011).
61. Obinata, G. & Anderson, B. D. O. *Model Reduction for Control System Design* (Springer Science & Business Media, 2012).
62. Katayama, T. *Subspace Methods for System Identification* (Springer Science & Business Media, 2006).
63. Shenoy, K. V. & Carmena, J. M. Combining decoder design and neural adaptation in brain–machine interfaces. *Neuron* **84**, 665–680 (2014).
64. Yang, Y. & Shanechi, M. M. An adaptive and generalizable closed-loop system for control of medically induced coma and other states of anesthesia. *J. Neural Eng.* **13**, 066019 (2016).
65. Yang, Y., Chang, E. F. & Shanechi, M. M. Dynamic tracking of non-stationarity in human ECoG activity. In *2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)* 1660–1663 (2017).
66. Ahmadipour, P., Yang, Y. & Shanechi, M. M. Investigating the effect of forgetting factor on tracking non-stationary neural dynamics. In *2019 9th International IEEE/EMBS Conference on Neural Engineering (NER)* 291–294 (2019).
67. Fu, Z.-F. & He, J. *Modal Analysis* (Elsevier, 2001).
68. Wong, Y. T., Putrino, D., Weiss, A. & Pesaran, B. Utilizing movement synergies to improve decoding performance for a brain machine interface. In *2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)* 289–292 (2013).
69. Cisek, P., Crammond, D. J. & Kalaska, J. F. Neural activity in primary motor and dorsal premotor cortex in reaching tasks with the contralateral versus ipsilateral arm. *J. Neurophysiol.* **89**, 922–942 (2003).
70. Ames, K. C. & Churchland, M. M. Motor cortex signals for each arm are mixed across hemispheres and neurons yet partitioned within the population response. *eLife* **8**, e46159 (2019).
71. Putrino, D., Wong, Y. T., Weiss, A. & Pesaran, B. A training platform for many-dimensional prosthetic devices using a virtual reality environment. *J. Neurosci. Methods* **244**, 68–77 (2015).
72. Flint, R. D., Ethier, C., Oby, E. R., Miller, L. E. & Slutzky, M. W. Local field potentials allow accurate decoding of muscle activity. *J. Neurophysiol.* **108**, 18–24 (2012).
73. Bundy, D. T., Pahwa, M., Szrama, N. & Leuthardt, E. C. Decoding three-dimensional reaching movements using electrocorticographic signals in humans. *J. Neural Eng.* **13**, 026021 (2016).
74. Oppenheim, A. V. & Schafer, R. W. *Discrete-Time Signal Processing* (Pearson Higher Education, 2011).
75. Williams, A. H. et al. Unsupervised discovery of demixed, low-dimensional neural dynamics across multiple timescales through tensor component analysis. *Neuron* **98**, 1099–1115.e8 (2018).
76. Trautmann, E. M. et al. Accurate estimation of neural population dynamics without spike sorting. *Neuron* **103**, 292–308.e4 (2019).
77. Gallego, J. A., Perich, M. G., Chowdhury, R. H., Solla, S. A. & Miller, L. E. Long-term stability of cortical population dynamics underlying consistent behavior. *Nat. Neurosci.* **23**, 260–270 (2020).
78. Sadras, N., Pesaran, B. & Shanechi, M. M. A point-process matched filter for event detection and decoding from population spike trains. *J. Neural Eng.* **16**, 066016 (2019).
79. Ghahramani, Z. & Hinton, G. E. *Parameter Estimation for Linear Dynamical Systems*. Technical Report CRG-TR-92-2, 1–6 (University of Toronto, 1996); https://www.cs.toronto.edu/~hinton/absps/tr96-2.html
80. Bishop, C. M. *Pattern Recognition and Machine Learning* (Springer, 2011).
81. Archer, E. W., Koster, U., Pillow, J. W. & Macke, J. H. Low-dimensional models of neural population activity in sensory cortical circuits. In *Advances in Neural Information Processing Systems 27* (eds Ghahramani, Z. et al.) 343–351 (Curran Associates, 2014).
82. Chang, C.-C. & Lin, C.-J. LIBSVM: a library for support vector machines. *ACM Trans. Intell. Syst. Technol.* **2**, 1–27 (2011).
83. Medsker, L. & Jain, L. C. *Recurrent Neural Networks: Design and Applications* (CRC Press, 1999).
84. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. *J. R. Stat. Soc. Ser. B Methodol.* **57**, 289–300 (1995).

## Acknowledgements

This work was supported in part by the following organizations and grants: the Army Research Office (ARO) under contract W911NF-16-1-0368 as part of the collaboration between the US DOD, the UK MOD and the UK Engineering and Physical Sciences Research Council (EPSRC) under the Multidisciplinary University Research Initiative (MURI) (to M.M.S.); the Office of Naval Research (ONR) Young Investigator Program (YIP) under contract N00014-19-1-2128 (to M.M.S.); the National Science Foundation (NSF) CAREER Award CCF-1453868 (to M.M.S.); ARO contract W911NF1810434 under the Bilateral Academic Research Initiative (BARI) (to M.M.S.); US National Institutes of Health (NIH) BRAIN grant R01-NS104923 (to B.P. and M.M.S.); and a University of Southern California Annenberg Fellowship (to O.G.S.).

## Author information

### Contributions

O.G.S. and M.M.S. conceived the study and developed the new PSID algorithm. O.G.S. performed all the analyses. H.A. performed the muscle activation inference used in Supplementary Fig. 14. Y.T.W. and B.P. provided all the nonhuman primate data. O.G.S. and M.M.S. wrote the manuscript with input from B.P.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

## Additional information

**Peer review information** *Nature Neuroscience* thanks Carsen Stringer and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

**Publisher’s note** Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Extended data

### Extended Data Fig. 1 Visualization of the PSID algorithm.

(**a**) The extraction of future and past neural activity and future behavior from data is shown (see Supplementary Note 1 for the general definition). Matrices are depicted as colored rectangles. Past and future neural activity matrices *Y*_{p} and *Y*_{f} are of the same size, with columns of *Y*_{f} containing neural data for one step into the future relative to the corresponding columns of *Y*_{p}. Future behavior matrix *Z*_{f} includes the time-series of behavior at the same time steps as *Y*_{f}. (**b**) PSID learning algorithm. In stage one of PSID, performing SVD on the projection of future behavior *Z*_{f} onto past neural activity *Y*_{p} gives the behaviorally relevant latent states \(\hat X^{\left( 1 \right)}\). These states can be used on their own to learn the parameters for a model that only includes behaviorally relevant latent states. Optionally, stage two of PSID can be used to also extract behaviorally irrelevant latent states \(\hat X^{\left( 2 \right)}\). In stage two, residual future neural activity \(Y_f^\prime\) is obtained by subtracting from *Y*_{f} its projection onto \(\hat X^{\left( 1 \right)}\). Performing SVD on the projection of residual future neural activity \(Y_f^\prime\) onto past neural activity *Y*_{p} gives the behaviorally irrelevant latent states \(\hat X^{\left( 2 \right)}\). These states can then be used together with the behaviorally relevant latent states \(\hat X^{\left( 1 \right)}\) to learn the parameters for a model that includes both sets of states. Once model parameters (Equation 1) are learned using only the neural and behavior training data, the extraction of latent states and the decoding of behavior in the test data are done purely from neural activity, using a Kalman filter and linear regression as shown in Fig. 1c (the Kalman filter and linear regression are specified by the learned model parameters). (**c**) A brief sketch of the main derivation step to obtain the PSID algorithm in (b). 
In the derivation of PSID (Supplementary Note 6), we show that for the model in Equation 1, the prediction of future behavior *Z*_{f} using past neural activity *Y*_{p} (that is, \(\hat Z_f\)) has the same row space as the behaviorally relevant latent states \(\hat X^{\left( 1 \right)}\). Similarly, we show that the prediction of the residual future neural activity \(Y_f^\prime\) using past neural activity *Y*_{p} (that is, \(\hat Y_f^\prime\)) has the same row space as the behaviorally irrelevant latent states \(\hat X^{\left( 2 \right)}\) (Supplementary Note 6). Thus, in (b), we can empirically extract the latent states \(\hat X^{\left( 1 \right)}\) and \(\hat X^{\left( 2 \right)}\) from training data by first computing the predictions \(\hat Z_f\) and \(\hat Y_f^\prime\) as shown in (b) via projections, and then finding their row space using SVD.
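The two stages described in this caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation (which is available at the repository named under Code availability): the function names, the `horizon` parameter, and the scaling of states by the square root of the singular values are all hypothetical choices for illustration, and the sketch stops at state extraction without learning the model parameters.

```python
import numpy as np

def block_hankel(data, first, horizon, n_cols):
    """Stack `horizon` consecutive time steps of `data` (dims x T), starting at column `first`."""
    return np.vstack([data[:, first + r:first + r + n_cols] for r in range(horizon)])

def project(target, basis):
    """Least-squares projection of the rows of `target` onto the row space of `basis`."""
    return (target @ np.linalg.pinv(basis)) @ basis

def top_row_space(mat, n):
    """Return n time-series spanning the dominant row space of `mat`, found via SVD."""
    _, s, vt = np.linalg.svd(mat, full_matrices=False)
    # Scaling by sqrt of singular values is an assumed convention, not taken from the paper.
    return np.diag(np.sqrt(s[:n])) @ vt[:n]

def psid_states_sketch(y, z, n1, n2, horizon=5):
    """y: neural activity (n_y x T); z: behavior (n_z x T). Returns (x1, x2)."""
    n_cols = y.shape[1] - 2 * horizon + 1
    y_p = block_hankel(y, 0, horizon, n_cols)        # past neural activity
    y_f = block_hankel(y, horizon, horizon, n_cols)  # future neural activity
    z_f = block_hankel(z, horizon, horizon, n_cols)  # future behavior

    # Stage one: SVD of the projection of future behavior onto past neural activity.
    x1 = top_row_space(project(z_f, y_p), n1)

    # Stage two: remove from future neural activity its projection onto x1,
    # then take the SVD of the projection of that residual onto past neural activity.
    y_res = y_f - project(y_f, x1)
    x2 = top_row_space(project(y_res, y_p), n2)
    return x1, x2
```

Calling `psid_states_sketch(y, z, n1=2, n2=3)` returns the stage-one and stage-two state estimates as arrays with `n1` and `n2` rows. Both lie in the row space of the past neural activity matrix, which is why they can later be recovered from neural activity alone.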

### Extended Data Fig. 2 PSID correctly learns model parameters at a rate of convergence similar to that of SID while also being able to prioritize behaviorally relevant dynamics.

(**a**) Normalized error for identification of each model parameter using PSID (with 10^{6} training samples) across 100 random simulated models. Each model had randomly selected state, neural activity, and behavior dimensions as well as randomly generated parameters (Methods). The parameters *A*, *C*_{y}, *C*_{z} from Equation 1 together with the covariance of neural activity \({\Sigma} _y \triangleq {\boldsymbol{E}}\left\{ {y_ky_k^T} \right\}\) and the cross-covariance of neural activity with the latent state \(G_y \triangleq {\boldsymbol{E}}\left\{ {x_{k + 1}y_k^T} \right\}\) fully characterize the model (Methods). Here, the same model structure parameters *n*_{x} (total latent state dimension) and *n*_{1} (dimension of the latent states extracted during the first stage of PSID) as the true model were used when applying PSID to data for each model (see Supplementary Fig. 3 on how these model structure parameters are also accurately identified from data). The horizontal dark line on the box shows the median, box edges show the 25th and 75th percentiles, whiskers represent the minimum and maximum values (other than outliers) and the dots show the outlier values. Outliers are defined as in Fig. 3b. Using 10^{6} samples, all parameters are identified with a median error smaller than 1%. (**b**) Normalized error for all parameters as a function of the number of training samples for PSID. The normalized error consistently decreases as more samples are used for identification. Solid line shows the average log_{10} of the normalized error and the shaded area shows the s.e.m. (**c**)-(**d**) Same as (a)-(b), shown for the standard SID algorithm. For both PSID and SID, and for all parameters, the error decreases by a factor of about 10 for every 100-fold increase in training samples (that is, a slope of −0.5 in (b) and (d)). *n* = 100 random models in all panels.
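A slope of −0.5 on log-log axes corresponds to an error proportional to 1/√N, so 100 times more samples give 10 times smaller error. The snippet below illustrates how such a slope can be measured. It uses a toy estimator (the sample mean of unit-variance noise), which is an assumption chosen only because it is known to converge at this rate; it is not the simulation from the paper, and `loglog_slope` is a hypothetical helper name.

```python
import numpy as np

def loglog_slope(n_samples, errors):
    """Least-squares slope of log10(error) versus log10(number of samples)."""
    slope, _ = np.polyfit(np.log10(n_samples), np.log10(errors), 1)
    return slope

rng = np.random.default_rng(0)
n_samples = np.array([10**2, 10**3, 10**4, 10**5])

# Toy estimator (assumed for illustration): average absolute error of the
# sample mean of zero-mean, unit-variance noise, over 100 repetitions.
errors = np.array([
    np.mean([abs(rng.standard_normal(n).mean()) for _ in range(100)])
    for n in n_samples
])

slope = loglog_slope(n_samples, errors)
print(f"estimated log-log slope: {slope:.2f}")
```

The fitted slope should land close to −0.5, the same rate the caption reports for both PSID and SID.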

### Extended Data Fig. 3 PSID requires orders of magnitude fewer training samples to achieve the same performance as NDM that uses a larger latent state dimension, and NDM with the same latent state dimension as PSID or RM do not achieve a comparable performance to PSID even with orders of magnitude more samples.

(**a**) Normalized eigenvalue error is shown for 1000 random simulated models with 16-dimensional latent states out of which 4 are behaviorally relevant, when using RM, PSID, or NDM with similar or larger latent state dimension than PSID. Solid lines show the average and shaded areas show the s.e.m. (*n* = 1000 random models). For NDM, to learn the behaviorally relevant dynamics using a model with a high-dimensional latent state (*n*_{x} = 16), we first identify this model, then sort the dimensions of the extracted latent state in order of their decoding accuracy, and then reduce the model to keep the 4 most behavior predictive latent state dimensions (Methods). These operations provide the estimate of the 4 behaviorally relevant eigenvalues (Methods). For RM, the state dimension is the behavior dimension (here *n*_{z} = 5). (**b**) Cross-validated behavior decoding CC for the models in (a). Figure convention and number of samples are the same as in (a). Note that unlike in (a), here we provide decoding results using the NDM with a 16-dimensional latent state both with and without any model reduction, as the two versions result in different decoding while they do not differ in their most behavior predictive dimensions and thus have the same eigenvalue error in (a). Optimal decoding using the true model is shown in black. For NDM with a 4-dimensional latent state (that is, in the dimension reduction regime) and RM, eigenvalue identification in (a) and decoding accuracies in (b) almost plateaued at some final value below that of the true model, indicating that the asymptotic performance of having unlimited training samples has almost been reached. 
In both (a) and (b), even for an NDM with a latent state dimension as large as the true model (that is, not performing any dimension reduction and using *n*_{x} = 16), (i) NDM was inferior in performance compared with PSID with a latent state dimension of only 4 when using the same number of training samples, and (ii) NDM required orders of magnitude more training samples to reach the performance of PSID with the smaller latent state dimension as shown by the magenta arrow. Parameters are randomized as in Methods except for the state noise (*w*_{t}), which is about 30 times smaller (that is, −2.5 ≤ α_{1} ≤ −0.5), and the behavior signal-to-noise ratio, which is 2 times smaller (that is, −0.3 ≤ α_{3} ≤ +1.7), both adjusted to make the decoding performances more similar to the results in real neural data (Fig. 3).

### Extended Data Fig. 4 PSID can be used to model neural activity for different neural signal types including LFP power activity or population spiking activity.

Modeling neural activity using PSID is demonstrated with example signals, extracted latent states, and decoded behavior for (**a**) LFP power activity (that is, signal power in different frequency bands, which are shown with different colors, Methods) and (**b**) population spiking activity (Methods). In both cases, regardless of neural signal type, after extracting the neural feature time-series, decoding consists of two steps: 1) applying a Kalman filter to extract the latent states given the neural feature time-series, 2) computing a linear combination of the states to get the decoding of behavior. By learning the dynamic model parameters, PSID specifies the Kalman filter parameters as well as the linear combination. Joint name abbreviations are as in Supplementary Fig. 12.
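The two decoding steps can be illustrated with a small Kalman filter followed by a linear readout. This is a sketch under assumptions rather than the authors' code: it takes a linear state-space model with parameters `A`, `Cy`, `Cz` (named after *A*, *C*_{y}, *C*_{z}) as given, and the noise covariances `Q` and `R`, the zero initial state, the identity initial covariance, and the use of the filtered estimate (which includes the current neural sample) instead of a one-step-ahead prediction are all illustrative choices.

```python
import numpy as np

def kalman_decode(y, A, Cy, Cz, Q, R):
    """Decode behavior from neural features y (n_y x T).

    Step 1: a Kalman filter estimates the latent states from y.
    Step 2: a linear combination of the states (Cz) gives the decoded behavior.
    Returns (x_hat, z_hat) with shapes (n_x, T) and (n_z, T).
    """
    n_x, T = A.shape[0], y.shape[1]
    x_prior = np.zeros(n_x)   # assumed initial state mean
    P_prior = np.eye(n_x)     # assumed initial state covariance
    x_hat = np.zeros((n_x, T))
    for k in range(T):
        # Measurement update with the neural sample at time k.
        S = Cy @ P_prior @ Cy.T + R
        K = P_prior @ Cy.T @ np.linalg.inv(S)
        x_post = x_prior + K @ (y[:, k] - Cy @ x_prior)
        P_post = P_prior - K @ Cy @ P_prior
        x_hat[:, k] = x_post
        # Time update: propagate the estimate to the next step.
        x_prior = A @ x_post
        P_prior = A @ P_post @ A.T + Q
    z_hat = Cz @ x_hat        # linear readout of behavior
    return x_hat, z_hat
```

Because both steps are linear and depend only on the model parameters, the same function applies unchanged whether the columns of `y` hold LFP band powers or smoothed spike counts.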

### Extended Data Fig. 5 As the dimension of the latent state extracted by PSID increases, it first covers the subspace of neural dynamics that are behaviorally relevant and then covers the subspace of residual neural dynamics.

(**a**) For different state dimensions (or different number of principal components (PCs) in the case of PCA), the cross-validated behavior decoding CC is shown versus the cross-validated accuracy of reconstructing neural activity using the same states/PCs quantified by CC. For PSID, NDM, and RM, reconstruction of neural activity is done using a Kalman filter for one time step into the future (that is, one-step-ahead self-prediction, Methods). For PCA, reconstruction is done for the same time step by multiplying the extracted PCs by the transpose (that is, the inverse) of the PCA decomposition matrix. Solid lines show the average decoding CC and shaded areas show the s.e.m. (*n* = 91 datasets). Multiple points on the curves associated with an equal number of states/PCs are marked with the same symbol (plus/cross/asterisks). (**b**) Same as (a) for monkey C (*n* = 48 datasets). (**c**) Using canonical correlation analysis (CCA), average CC for the best linear alignment between the latent states extracted in the first and second stages of PSID with the latent states/PCs extracted using NDM/PCA is shown (see also Extended Data Fig. 1). The state/PC dimension for NDM/PCA was the same as the state dimension in the first stage of PSID. Bars, boxes and asterisks are defined as in Fig. 3b. (**d**) Same as (c) for monkey C. Statistical tests in panels c,d are one-sided signed-rank with *n* (number of datasets) as in panels a,b, respectively, with the *P* values noted above asterisks in the plot. As expected, compared with the second stage of PSID, the latent states extracted in the first stage of PSID are significantly less aligned with latent states from NDM and PCA (panels c,d). 
This is consistent with the first few state dimensions extracted by the first stage of PSID being significantly more aligned to behavior compared with the states extracted by NDM or PCA in panels a,b; it is also consistent with PSID reaching similar neural self-prediction as NDM when also using those states extracted in the second stage and thus higher overall latent state dimension (panels a,b). The first stage of PSID learns behaviorally relevant neural dynamics resulting in better PSID decoding using lower-dimensional latent states while its second stage learns the residual dynamics in neural activity (panels a,b). That is why latent states from the first stage are significantly less aligned with states from PCA and NDM, which simply aim to fit the dynamics in neural activity agnostic to their relevance to behavior.
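The PCA reconstruction step mentioned in (a) can be sketched as follows. The function names are hypothetical and the mean-centering is an assumed convention. The sketch shows why the transpose acts as an inverse: the PCA decomposition matrix has orthonormal rows, so multiplying the PCs by its transpose maps them back to the neural feature space, exactly when all PCs are kept and approximately when only the top few are kept.

```python
import numpy as np

def pca_fit(y, n_pcs):
    """y: neural features (n_features x T). Returns the feature means and a
    (n_features x n_pcs) matrix W whose columns are orthonormal PC directions."""
    mean = y.mean(axis=1, keepdims=True)
    u, _, _ = np.linalg.svd(y - mean, full_matrices=False)
    return mean, u[:, :n_pcs]

def pca_reconstruct(y, mean, W):
    """Extract PCs with the decomposition matrix W.T, then map them back with
    its transpose W to reconstruct the neural features at the same time steps."""
    pcs = W.T @ (y - mean)
    return W @ pcs + mean

rng = np.random.default_rng(0)
y = rng.standard_normal((6, 200))
for n_pcs in (2, 4, 6):
    mean, W = pca_fit(y, n_pcs)
    err = np.linalg.norm(y - pca_reconstruct(y, mean, W))
    print(n_pcs, "PCs, reconstruction error:", round(float(err), 3))
```

The reconstruction error shrinks as more PCs are kept and vanishes (up to rounding) when the number of PCs equals the number of features.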

### Extended Data Fig. 6 Dynamic model learned by PSID using a subset of joints in the training data was more predictive of the remaining joints in the test data compared with the dynamic model learned by NDM.

We selected a subset of joints and excluded them from the PSID modeling. After learning the dynamic model and extracting the latent states in the training data, we fitted a linear regression from these latent states to predict the remaining joints that were unseen by PSID (that is, the regression solution constituting the parameter *C*_{z} in the model). Similarly, NDM learned its dynamic model and extracted the latent states in the training data, and then fitted a linear regression from these latent states to predict the joints. We then evaluated the final learned models in the test data. We repeated this procedure for multiple random joint subsets while ensuring that, overall, every joint is a member of the unseen subsets an equal number of times. (**a**) The peak cross-validated decoding accuracy (CC) is shown for PSID as a function of the number of joints that were unseen when learning the dynamic model. In each dataset, the same latent state dimension as PSID is used for NDM. In NDM, joints are never used in learning the dynamic model, equivalent to having all joints in the unseen subset. Indeed, PSID reduces to NDM in the extreme case when no joint is provided to PSID in learning the dynamic model, as evident from the green and red curves converging at the end (in this case only stage 2 of PSID is performed, Methods). Solid lines show the average decoding CC and shaded areas show the s.e.m. (*n* ≥ 91 joint subset datasets). (**b**) Same as (a), for monkey C (*n* ≥ 48 joint subset datasets). For both monkeys and in all cases (other than PSID not seeing any joint, for which it reduces to NDM), PSID decoding was significantly better than NDM decoding (*P* < 10^{−6}; one-sided signed-rank; *n* ≥ 91 and *n* ≥ 48 joint subset datasets in monkeys J and C, respectively). 
To investigate why training PSID with a subset of joints helps in decoding of a different unseen subset of joints in the test data, we computed the correlation coefficient between each pair of joint angles within our datasets and found an absolute correlation coefficient value of 0.31 ± 0.0097 (mean ± s.e.m., *n* = 351 joint pairs) and 0.32 ± 0.011 (*n* = 300 joint pairs), for monkeys J and C respectively. This result may suggest that since all joints are engaged in the same task, there are correlations between them that allow PSID to improve decoding even for joints that it does not observe during learning the dynamic model in training data.
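The pairwise statistic reported above can be computed as in the sketch below. The function name is hypothetical, and the sketch does not specify how correlations were pooled across datasets in the paper. The pair counts are consistent with 27 and 25 joint angles, since 27·26/2 = 351 and 25·24/2 = 300; those joint counts are inferred here from the arithmetic rather than stated in this caption.

```python
import numpy as np
from math import comb

def mean_abs_pairwise_corr(joints):
    """joints: joint angles (n_joints x T).
    Returns (mean |r|, s.e.m. across pairs, number of joint pairs)."""
    r = np.corrcoef(joints)                    # n_joints x n_joints correlation matrix
    upper = np.triu_indices(r.shape[0], k=1)   # each unordered pair counted once
    abs_r = np.abs(r[upper])
    sem = abs_r.std(ddof=1) / np.sqrt(abs_r.size)
    return abs_r.mean(), sem, abs_r.size

# Number of unordered pairs for the inferred joint counts.
print(comb(27, 2), comb(25, 2))  # prints "351 300"
```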

### Extended Data Fig. 7 Extraction of bidirectional rotational dynamics using PSID was robust to brain region and held also when modeling neural activity within different cortical regions separately.

(**a**) Average trajectory of 2D states identified by PSID during reach and return epochs, when neural activity within different cortical areas is modeled separately. Figure convention is the same as in Fig. 5c. (**b**) Decoding accuracy of using the 2D PSID states or the 2D NDM states (from (a)) to decode behavior. Figure convention is the same as in Fig. 5e. (**c**)-(**d**) Same as (a)-(b) for monkey C. In both monkeys, similar to the results in Fig. 5, PSID again extracted latent states that, unlike the latent states extracted using NDM, rotated in opposite directions during reach and return (panels a,c) and resulted in more accurate decoding of behavior (panels b,d; *P* < 10^{−11} with the exact values noted above asterisks in the plot; one-sided signed-rank; *n* = 70 and *n* = 60 datasets for monkeys J and C, respectively).

### Extended Data Fig. 8 Similar to NDM, PCA and jPCA extract rotations that are in the same direction during reach and return epochs.

(**a**) Figure convention is the same as in Fig. 5c for projections to the 2D space extracted using PCA (that is, the top two PCs). Decoding for these and higher-dimensional PCA-extracted states is provided in Supplementary Fig. 6. (**b**) Same as (a) for monkey C. (**c**) Same as (a) for projections to 2D spaces extracted using jPCA^{21}. (**d**) Same as (c) for monkey C.

### Extended Data Fig. 9 PSID again achieved better decoding using lower-dimensional latent states when RNN-based nonlinear NDM always used a dynamic latent state with much higher dimension of 64 and/or when RNN-based nonlinear NDM used a Poisson observation model with a faster time step.

(**a**)-(**h**) Figure convention and number of datasets in all panels are the same as in Fig. 6, with additional configurations for the RNN-based nonlinear NDM method (that is, LFADS) added to the comparison (Methods). As in Fig. 6, the dimension of the initial condition for LFADS is always 64. The alterations from Fig. 6 are as follows. First, in Fig. 6, the state dimension for LFADS was set to the number of factors to provide a directly comparable result with other methods (Methods; with this choice, the number of LFADS factors equals its state dimension and is thus comparable with the state dimension in other methods). We use 'state dimension' to refer to the generator RNN’s state dimension^{23}, because it has the same role as the state dimension in a state-space model and determines how many numbers are used to represent the generator state at a given time step and generate the dynamics at the next time step (Methods). Here, instead, we also consider always setting the LFADS generator state dimension to 64 regardless of the number of factors. Thus, for this configuration of LFADS, the horizontal axis in panels a,e and the vertical axis in panels b,f refer only to the number of factors, which is always smaller than the LFADS state dimension of 64. Again, in this case where nonlinear NDM always uses 64-dimensional states to describe the dynamics, PSID reveals a markedly lower dimension than the number of factors in nonlinear NDM and achieves better decoding than nonlinear NDM. Second, in Fig. 6, to provide a directly comparable result with PSID, the same Gaussian-smoothed spike counts with 50 ms bins were used as input for both PSID and LFADS (Methods). Here, instead, we also allow LFADS to use non-smoothed spike counts with 10 ms bins and a Poisson observation model (Methods).
For nonlinear NDM, switching the observation model from Gaussian to Poisson improved the peak decoding in monkey J (*P* < 10^{−3}; one-sided signed-rank; *n* = 26 datasets), while both observation models achieved similar decoding in monkey C (*P* > 0.07; two-sided signed-rank; *n* = 16 datasets). Nevertheless, comparisons with PSID remained as before for all these nonlinear NDM configurations (regardless of whether Poisson or Gaussian observations were used): PSID revealed a markedly lower dimension than the number of factors in nonlinear NDM (panels b,f; *P* < 0.004; one-sided signed-rank; *n* ≥ 16 datasets) and achieved better decoding than even a nonlinear NDM with a larger number of factors than the PSID state dimension (panels a,c,d,e,g,h; *P* < 0.03; one-sided signed-rank; *n* ≥ 16 datasets). These results arise because nonlinear NDM, similar to linear NDM and unlike PSID, only considers neural activity when learning the dynamic model. This shows that the PSID advantage is in its novel formulation and two-stage approach for *learning* the dynamic model by considering both neural activity and behavior. \**P* < 0.05, \*\**P* < 0.005, \*\*\**P* < 0.0005. Statistical test details and exact *P*-values are as in Fig. 6 for linear NDM and RM and are provided in Supplementary Table 1 for the nonlinear NDM variations.

### Extended Data Fig. 10 PSID reveals low-dimensional behaviorally relevant dynamics in prefrontal raw LFP activity during a task with saccadic eye movements.

(**a**)-(**h**) Figure convention for all panels is the same as in Fig. 3a–d, shown here for a completely different behavioral task, brain region, and neural signal type. Here, monkeys perform saccadic eye movements while PFC activity is recorded (Methods). Raw LFP activity is modeled, and the behavior consists of the 2D position of the eye. Similar results hold, with PSID identifying the behaviorally relevant neural dynamics more accurately than both NDM and RM. PSID again reveals a markedly lower dimension for behaviorally relevant neural dynamics than NDM. Also, note that RM provides no control over the dimension of dynamics and is forced to use a state dimension equal to the behavior dimension (*n*_{z} = 2), which in this case underestimates the dimension of behaviorally relevant dynamics in neural activity, as evident from RM’s much worse decoding accuracy compared with PSID. Statistical tests are one-sided signed-rank, with *P*-values noted above the asterisks (*n* = 27 and *n* = 43 datasets in monkeys A and S, respectively).

## Supplementary information

### Supplementary Information

Supplementary Figs. 1–14, Table 1 and Notes 1–8.

### Supplementary Video 1

Visualization of how high-dimensional neural dynamics may contain 2D rotations both in the same and in opposite directions. The presented simulation depicts a hypothetical scenario whereby three dimensions of neural activity traverse a manifold in 3D space of which different projections reveal rotations in the same or opposite directions during reach versus return epochs. Among all projections, PSID can find the projection corresponding to the behaviorally relevant neural dynamics (for example, here, the (*y*_{2}, *y*_{3}) plane if behavior is best predicted using the activity in this plane), whereas the standard behavior-agnostic NDM methods may find other projections (for example, the (*y*_{1}, *y*_{2}) plane). A similar hypothetical manifold squeezed to varying degrees—resulting in three versions of the manifold—has been used in prior work to demonstrate the concept of tangling^{11}. Here, our goal is instead to demonstrate a distinct concept of how, for the same traversal on exactly the same manifold in 3D, different 2D projections can show rotations in different directions—that is, either rotations that keep the same direction or rotations that reverse their direction during the traversal. This observation is similar to how neural dynamics extracted by PSID in our motor dataset show different rotations compared with those extracted by NDM (Fig. 5). This result demonstrates the importance of PSID in performing dynamic dimensionality reduction while preserving behavior information, which, in this example, corresponds to which lower-dimensional 2D projection plane to pick for modeling the dynamics of the 3D neural activity. Here, for simplicity, projections are visualized as static, but the same concept holds for PSID projections, which are dynamic—that is, to get the projected latent variable at a given time, PSID can aggregate information not only from the same time step of neural activity but also from all the past neural activity. 
Moreover, unlike this simple hypothetical example, the dimension of the neural space is in general much higher than three. Thus, discovering and modeling the low-dimensional dynamic projection that preserves behaviorally relevant dynamics is a major challenge because these dynamics are hidden within the overall high-dimensional neural space. PSID addresses this challenge by discovering these behaviorally relevant dynamics within the high-dimensional neural space—that is, where (which subspace) they are in this high-dimensional dynamic space—finding their dimensionality and, finally, explicitly modeling their temporal evolution (that is, dynamics).
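
The idea that one 3D traversal can show same-direction rotations in one 2D projection and reversed rotations in another can be checked numerically. The sketch below uses a hypothetical saddle-shaped closed curve chosen for illustration; it is not the manifold shown in the video, and the static projections used here are simpler than the dynamic projections that PSID learns. The rotation direction in each plane is summarized by the signed area swept about the origin (positive for counterclockwise).

```python
import numpy as np

def signed_area(a, b):
    """Signed area swept about the origin in the (a, b) plane.

    Sums the cross products of consecutive points; positive means
    counterclockwise rotation, negative means clockwise.
    """
    return 0.5 * np.sum(a[:-1] * b[1:] - b[:-1] * a[1:])

def trajectory(phi):
    """Hypothetical 3D activity (y1, y2, y3) along a closed saddle-shaped loop."""
    return np.cos(phi), np.sin(phi), -0.5 * np.sin(2.0 * phi)

# First half of the loop plays the role of 'reach', second half of 'return'.
reach = trajectory(np.linspace(0.0, np.pi, 400))
ret = trajectory(np.linspace(np.pi, 2.0 * np.pi, 400))

# (y1, y2) plane: rotation keeps the same direction in both epochs.
r12_reach = signed_area(reach[0], reach[1])
r12_return = signed_area(ret[0], ret[1])

# (y2, y3) plane: rotation reverses direction between epochs.
r23_reach = signed_area(reach[1], reach[2])
r23_return = signed_area(ret[1], ret[2])

print("(y1, y2):", np.sign(r12_reach), np.sign(r12_return))
print("(y2, y3):", np.sign(r23_reach), np.sign(r23_return))
```

For this curve, the (y1, y2) projection is a circle traversed in one direction, whereas the (y2, y3) projection is a figure-eight whose two lobes are traversed in opposite directions, so the choice of projection plane alone determines which kind of rotation is observed.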

## About this article

### Cite this article

Sani, O.G., Abbaspourazad, H., Wong, Y.T. *et al.* Modeling behaviorally relevant neural dynamics enabled by preferential subspace identification. *Nat Neurosci* (2020). https://doi.org/10.1038/s41593-020-00733-0
