Modeling behaviorally relevant neural dynamics enabled by preferential subspace identification


Neural activity exhibits complex dynamics related to various brain functions, internal states and behaviors. Understanding how neural dynamics explain specific measured behaviors requires dissociating behaviorally relevant from behaviorally irrelevant dynamics, which current neural dynamic models do not achieve because they are learned without considering behavior. We develop preferential subspace identification (PSID), an algorithm that models neural activity while dissociating and prioritizing its behaviorally relevant dynamics. Modeling data from two monkeys performing three-dimensional reach and grasp tasks, PSID revealed that the behaviorally relevant dynamics are significantly lower-dimensional than otherwise implied. Moreover, PSID discovered distinct rotational dynamics that were more predictive of behavior. Furthermore, PSID more accurately learned the behaviorally relevant dynamics for each joint and recording channel. Finally, modeling data from two monkeys performing saccades demonstrated the generalization of PSID across behaviors, brain regions and neural signal types. PSID provides a general new tool to reveal behaviorally relevant neural dynamics that can otherwise go unnoticed.


Fig. 1: PSID enables learning of dynamics shared between the recorded neural activity and the measured behavior.
Fig. 2: Unlike standard methods, PSID correctly learns the behaviorally relevant neural dynamics even when using lower-dimensional latent states and performing dimensionality reduction.
Fig. 3: PSID reveals a markedly lower dimension for behaviorally relevant neural dynamics and extracts them more accurately in motor cortex LFP activity during 3D reach, grasp and return movements.
Fig. 4: PSID more accurately learns the behaviorally relevant neural dynamics in each recording channel across premotor, primary motor and prefrontal areas.
Fig. 5: PSID reveals rotational neural dynamics with opposite directions during 3D reach and return movements, which is not found by standard methods.
Fig. 6: PSID reveals a markedly lower dimension for behaviorally relevant neural dynamics and extracts them more accurately in motor cortex population spiking activity.

Data availability

The data used to support the results are available upon reasonable request from the corresponding author.

Code availability

The code for the PSID algorithm is available online at



Acknowledgements


This work was supported in part by the following organizations and grants: the Army Research Office (ARO) under contract W911NF-16-1-0368 as part of the collaboration between the US DOD, the UK MOD and the UK Engineering and Physical Sciences Research Council (EPSRC) under the Multidisciplinary University Research Initiative (MURI) (to M.M.S.); the Office of Naval Research (ONR) Young Investigator Program (YIP) under contract N00014-19-1-2128 (to M.M.S.); the National Science Foundation (NSF) CAREER Award CCF-1453868 (to M.M.S.); ARO contract W911NF1810434 under the Bilateral Academic Research Initiative (BARI) (to M.M.S.); US National Institutes of Health (NIH) BRAIN grant R01-NS104923 (to B.P. and M.M.S.); and a University of Southern California Annenberg Fellowship (to O.G.S.).

Author information




O.G.S. and M.M.S. conceived the study and developed the new PSID algorithm. O.G.S. performed all the analyses. H.A. performed the muscle activation inference used in Supplementary Fig. 14. Y.T.W. and B.P. provided all the nonhuman primate data. O.G.S. and M.M.S. wrote the manuscript with input from B.P.

Corresponding author

Correspondence to Maryam M. Shanechi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Neuroscience thanks Carsen Stringer and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Visualization of the PSID algorithm.

(a) The extraction of future and past neural activity and future behavior from data is shown (see Supplementary Note 1 for the general definition). Matrices are depicted as colored rectangles. Past and future neural activity matrices Yp and Yf are of the same size, with columns of Yf containing neural data for one step into the future relative to the corresponding columns of Yp. The future behavior matrix Zf includes the time series of behavior at the same time steps as Yf. (b) PSID learning algorithm. In stage one of PSID, performing SVD on the projection of future behavior Zf onto past neural activity Yp gives the behaviorally relevant latent states \(\hat X^{\left( 1 \right)}\). These states can be used on their own to learn the parameters for a model that only includes behaviorally relevant latent states. Optionally, stage two of PSID can be used to also extract behaviorally irrelevant latent states \(\hat X^{\left( 2 \right)}\). In stage two, the residual future neural activity \(Y_f^\prime\) is obtained by subtracting from Yf its projection onto \(\hat X^{\left( 1 \right)}\). Performing SVD on the projection of the residual future neural activity \(Y_f^\prime\) onto past neural activity Yp gives the behaviorally irrelevant latent states \(\hat X^{\left( 2 \right)}\). These states can then be used together with the behaviorally relevant latent states \(\hat X^{\left( 1 \right)}\) to learn the parameters for a model that includes both sets of states. Once the model parameters (Equation 1) are learned using only the neural and behavior training data, the extraction of latent states and the decoding of behavior in the test data are done purely from neural activity, using a Kalman filter and linear regression as shown in Fig. 1c (the Kalman filter and linear regression are specified by the learned model parameters). (c) A brief sketch of the main derivation step to obtain the PSID algorithm in (b).
In the derivation of PSID (Supplementary Note 6), we show that for the model in Equation 1, the prediction of future behavior Zf using past neural activity Yp (that is, \(\hat Z_f\)) has the same row space as the behaviorally relevant latent states \(\hat X^{\left( 1 \right)}\). Similarly, we show that the prediction of the residual future neural activity \(Y_f^\prime\) using past neural activity Yp (that is, \(\hat Y_f^\prime\)) has the same row space as the behaviorally irrelevant latent states \(\hat X^{\left( 2 \right)}\) (Supplementary Note 6). Thus, in (b), we can empirically extract the latent states \(\hat X^{\left( 1 \right)}\) and \(\hat X^{\left( 2 \right)}\) from the training data by first computing the predictions \(\hat Z_f\) and \(\hat Y_f^\prime\) via projections as shown in (b), and then finding their row space using SVD.
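As a rough illustration, the two projection-and-SVD stages described above can be sketched in a few lines of NumPy. Everything below is a toy: the data matrices are random placeholders rather than real recordings, the chosen state dimensions n1 and n2 are arbitrary, and the sketch omits the subsequent parameter-learning steps of the full algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic past/future matrices (rows = variables, columns = time windows).
n_y, n_z, N = 8, 3, 500              # neural dim, behavior dim, column count
Yp = rng.standard_normal((n_y, N))   # past neural activity
Yf = rng.standard_normal((n_y, N))   # future neural activity
Zf = rng.standard_normal((n_z, N))   # future behavior

def project_rows(A, B):
    """Orthogonal projection of the rows of A onto the row space of B."""
    return A @ B.T @ np.linalg.pinv(B @ B.T) @ B

# Stage 1: project future behavior onto past neural activity, then SVD
# to extract the behaviorally relevant latent states.
Zf_hat = project_rows(Zf, Yp)
U, s, Vt = np.linalg.svd(Zf_hat, full_matrices=False)
n1 = 2                               # chosen behaviorally relevant state dim
X1 = np.diag(s[:n1]) @ Vt[:n1]       # behaviorally relevant latent states

# Stage 2: subtract what stage 1 explains from future neural activity,
# project the residual onto the past, and SVD again.
Yf_res = Yf - project_rows(Yf, X1)
Yf_res_hat = project_rows(Yf_res, Yp)
U2, s2, Vt2 = np.linalg.svd(Yf_res_hat, full_matrices=False)
n2 = 2
X2 = np.diag(s2[:n2]) @ Vt2[:n2]     # behaviorally irrelevant latent states

print(X1.shape, X2.shape)  # (2, 500) (2, 500)
```

Here the projection onto a row space is computed with a pseudoinverse; the full algorithm would additionally use the extracted states to learn the model parameters of Equation 1.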

Extended Data Fig. 2 PSID correctly learns model parameters at a rate of convergence similar to that of SID while also being able to prioritize behaviorally relevant dynamics.

(a) Normalized error for identification of each model parameter using PSID (with \(10^6\) training samples) across 100 random simulated models. Each model had randomly selected state, neural activity and behavior dimensions as well as randomly generated parameters (Methods). The parameters A, Cy, Cz from Equation 1, together with the covariance of neural activity \(\Sigma_y \triangleq E\{y_k y_k^T\}\) and the cross-covariance of neural activity with the latent state \(G_y \triangleq E\{x_{k+1} y_k^T\}\), fully characterize the model (Methods). Here, the same model structure parameters nx (total latent state dimension) and n1 (dimension of the latent states extracted during the first stage of PSID) as in the true model were used when applying PSID to data for each model (see Supplementary Fig. 3 on how these model structure parameters are also accurately identified from data). The horizontal dark line on the box shows the median, box edges show the 25th and 75th percentiles, whiskers represent the minimum and maximum values (other than outliers) and dots show the outlier values. Outliers are defined as in Fig. 3b. Using \(10^6\) samples, all parameters are identified with a median error smaller than 1%. (b) Normalized error for all parameters as a function of the number of training samples for PSID. The normalized error consistently decreases as more samples are used for identification. The solid line shows the average log10 of the normalized error and the shaded area shows the s.e.m. (c)-(d) Same as (a)-(b), shown for the standard SID algorithm. For both PSID and SID, and for all parameters, the error decreases roughly 10-fold for every 100-fold increase in training samples (that is, a slope of −0.5 in (b) and (d)). n = 100 random models in all panels.
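For intuition, a normalized parameter error of the kind plotted above can be computed as the deviation of an identified parameter from its true value, normalized by the norm of the true value. The Frobenius-norm definition and the numbers below are illustrative assumptions; the paper's exact normalization is specified in its Methods.

```python
import numpy as np

def normalized_error(A_true, A_est):
    """Frobenius-norm error of an identified parameter, normalized by the
    norm of the true parameter (one plausible definition, used here only
    for illustration)."""
    return np.linalg.norm(A_est - A_true) / np.linalg.norm(A_true)

# A hypothetical true state transition matrix and a slightly off estimate.
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
A_est = A_true + 0.005 * np.ones_like(A_true)  # small uniform estimation error

err = normalized_error(A_true, A_est)
print(f"{100 * err:.2f}% error")  # prints 0.83% error
```

With consistent identification, this error would shrink as roughly the inverse square root of the number of training samples, matching the −0.5 log-log slope described in panels (b) and (d).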

Extended Data Fig. 3 PSID requires orders of magnitude fewer training samples to achieve the same performance as NDM that uses a larger latent state dimension, and NDM with the same latent state dimension as PSID or RM do not achieve a comparable performance to PSID even with orders of magnitude more samples.

(a) Normalized eigenvalue error is shown for 1000 random simulated models with 16-dimensional latent states, of which 4 are behaviorally relevant, when using RM, PSID, or NDM with a similar or larger latent state dimension than PSID. Solid lines show the average and shaded areas show the s.e.m. (n = 1000 random models). For NDM, to learn the behaviorally relevant dynamics using a model with a high-dimensional latent state (nx = 16), we first identify this model, then sort the dimensions of the extracted latent state in order of their decoding accuracy, and then reduce the model to keep the 4 most behavior-predictive latent state dimensions (Methods). These operations provide the estimate of the 4 behaviorally relevant eigenvalues (Methods). For RM, the state dimension is the behavior dimension (here nz = 5). (b) Cross-validated behavior decoding CC for the models in (a). Figure conventions and number of samples are the same as in (a). Note that unlike in (a), here we provide decoding results using the NDM with a 16-dimensional latent state both with and without model reduction, as the two versions result in different decoding while they do not differ in their most behavior-predictive dimensions and thus have the same eigenvalue error in (a). Optimal decoding using the true model is shown in black. For NDM with a 4-dimensional latent state (that is, in the dimensionality reduction regime) and for RM, the eigenvalue identification in (a) and the decoding accuracies in (b) almost plateaued at a final value below that of the true model, indicating that the asymptotic performance of having unlimited training samples has almost been reached.
In both (a) and (b), even for an NDM with a latent state dimension as large as that of the true model (that is, performing no dimensionality reduction and using nx = 16), (i) NDM was inferior in performance to PSID with a latent state dimension of only 4 when using the same number of training samples, and (ii) NDM required orders of magnitude more training samples to reach the performance of PSID with the smaller latent state dimension, as shown by the magenta arrow. Parameters were randomized as in Methods except for the state noise (wt), which was about 30 times smaller (that is, −2.5 ≤ α1 ≤ −0.5), and the behavior signal-to-noise ratio, which was 2 times smaller (that is, −0.3 ≤ α3 ≤ +1.7), both adjusted to make the decoding performances more similar to the results in real neural data (Fig. 3).
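The model-reduction step described for NDM (sorting latent state dimensions by decoding accuracy and keeping the most behavior-predictive ones) can be sketched as follows. This toy version ranks dimensions by their correlation with a synthetic behavior signal, a simplified stand-in for the cross-validated decoding accuracy used in the actual analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic latent states; only two of six dimensions drive behavior.
T, nx = 1000, 6
X = rng.standard_normal((nx, T))                    # latent state time series
Z = 1.5 * X[0] - 1.0 * X[3] + 0.1 * rng.standard_normal(T)  # behavior

# Rank each latent dimension by its (absolute) correlation with behavior,
# as a simple proxy for per-dimension decoding accuracy.
cc = np.array([abs(np.corrcoef(X[i], Z)[0, 1]) for i in range(nx)])
order = np.argsort(cc)[::-1]

# Keep the k most behavior-predictive latent state dimensions.
k = 2
kept = np.sort(order[:k])
print(kept)
```

On this toy data the reduction recovers exactly the two dimensions that were constructed to drive behavior, discarding the behaviorally irrelevant ones.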

Extended Data Fig. 4 PSID can be used to model neural activity for different neural signal types including LFP power activity or population spiking activity.

Modeling neural activity using PSID is demonstrated with example signals, extracted latent states and decoded behavior for (a) LFP power activity (that is, signal power in different frequency bands, shown in different colors; Methods) and (b) population spiking activity (Methods). In both cases, regardless of neural signal type, after extracting the neural feature time series, decoding consists of two steps: (1) applying a Kalman filter to extract the latent states given the neural feature time series, and (2) computing a linear combination of the states to obtain the decoded behavior. By learning the dynamic model parameters, PSID specifies the Kalman filter parameters as well as the linear combination. Joint name abbreviations are as in Supplementary Fig. 12.
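The two decoding steps above (Kalman filtering of the neural feature time series, then a linear readout of behavior from the filtered states) can be sketched as below. All model parameters and the simulated features are made-up placeholders; in the actual pipeline these parameters would come from PSID's learning stage.

```python
import numpy as np

rng = np.random.default_rng(2)

# Placeholder parameters for a model of the form in Equation 1;
# the values here are invented for illustration only.
A  = np.array([[0.95, 0.1], [0.0, 0.9]])   # state transition
Cy = rng.standard_normal((4, 2))           # latent state -> neural features
Cz = rng.standard_normal((1, 2))           # latent state -> behavior readout
Q, R = 0.01 * np.eye(2), 0.1 * np.eye(4)   # state / neural noise covariances

# Simulate a short neural feature time series from this model.
T = 200
x = np.zeros(2)
Y = np.zeros((T, 4))
for t in range(T):
    x = A @ x + rng.multivariate_normal(np.zeros(2), Q)
    Y[t] = Cy @ x + rng.multivariate_normal(np.zeros(4), R)

# Step 1: a Kalman filter extracts latent states from neural features alone.
xh = np.zeros(2)
P = np.eye(2)
Xh = np.zeros((T, 2))
for t in range(T):
    xh = A @ xh                        # predict
    P = A @ P @ A.T + Q
    S = Cy @ P @ Cy.T + R
    K = P @ Cy.T @ np.linalg.inv(S)    # Kalman gain
    xh = xh + K @ (Y[t] - Cy @ xh)     # update with the neural observation
    P = (np.eye(2) - K @ Cy) @ P
    Xh[t] = xh

# Step 2: decoded behavior is a linear combination of the filtered states.
Z_dec = Xh @ Cz.T
print(Z_dec.shape)  # (200, 1)
```

Note that behavior never enters the filtering loop: only the neural features Y are observed at decoding time, consistent with the description above.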

Extended Data Fig. 5 As the dimension of the latent state extracted by PSID increases, it first covers the subspace of neural dynamics that are behaviorally relevant and then covers the subspace of residual neural dynamics.

(a) For different state dimensions (or different numbers of principal components (PCs) in the case of PCA), the cross-validated behavior decoding CC is shown versus the cross-validated accuracy of reconstructing neural activity using the same states/PCs, quantified by CC. For PSID, NDM and RM, reconstruction of neural activity is done using a Kalman filter for one time step into the future (that is, one-step-ahead self-prediction; Methods). For PCA, reconstruction is done for the same time step by multiplying the extracted PCs by the transpose (that is, inverse) of the PCA decomposition matrix. Solid lines show the average decoding CC and shaded areas show the s.e.m. (n = 91 datasets). Multiple points on the curves associated with an equal number of states/PCs are marked with the same symbol (plus/cross/asterisk). (b) Same as (a) for monkey C (n = 48 datasets). (c) Using canonical correlation analysis (CCA), the average CC for the best linear alignment between the latent states extracted in the first and second stages of PSID and the latent states/PCs extracted using NDM/PCA is shown (see also Extended Data Fig. 1). The state/PC dimension for NDM/PCA was the same as the state dimension in the first stage of PSID. Bars, boxes and asterisks are defined as in Fig. 3b. (d) Same as (c) for monkey C. Statistical tests in panels c,d are one-sided signed-rank with n (number of datasets) as in panels a,b, respectively, with the P values noted above the asterisks in the plot. As expected, compared with the second stage of PSID, the latent states extracted in the first stage of PSID are significantly less aligned with latent states from NDM and PCA (panels c,d).
This is consistent with the first few state dimensions extracted by the first stage of PSID being significantly more aligned with behavior compared with the states extracted by NDM or PCA in panels a,b; it is also consistent with PSID reaching a similar neural self-prediction as NDM when also using the states extracted in the second stage, and thus a higher overall latent state dimension (panels a,b). The first stage of PSID learns behaviorally relevant neural dynamics, resulting in better PSID decoding using lower-dimensional latent states, while its second stage learns the residual dynamics in neural activity (panels a,b). That is why latent states from the first stage are significantly less aligned with states from PCA and NDM, which simply aim to fit the dynamics in neural activity agnostic to their relevance to behavior.
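The CCA-based alignment used in panels (c)-(d) measures how well one set of latent states can be linearly aligned to another. Below is a minimal sketch using a standard QR-plus-SVD computation of canonical correlations on synthetic state trajectories (not the paper's datasets); the linear mixing matrix M is an arbitrary placeholder.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two synthetic latent state sets: X2 is a noisy linear transform of X1,
# mimicking two methods that extract related state spaces.
T = 500
X1 = rng.standard_normal((T, 3))
M = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5],
              [0.5, 0.0, 1.0]])              # arbitrary well-conditioned mixing
X2 = X1 @ M + 0.1 * rng.standard_normal((T, 3))

def cca_corrs(A, B):
    """Canonical correlations between the columns of A and B via QR + SVD."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    return np.linalg.svd(Qa.T @ Qb, compute_uv=False)

cc = cca_corrs(X1, X2)
print(cc)   # all close to 1, since X2 is a noisy linear transform of X1
```

States that span related subspaces yield canonical correlations near 1, whereas states learned from unrelated dynamics would yield lower values, which is the sense in which the first-stage PSID states are "less aligned" with NDM/PCA states.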

Extended Data Fig. 6 Dynamic model learned by PSID using a subset of joints in the training data was more predictive of the remaining joints in the test data compared with the dynamic model learned by NDM.

We selected a subset of joints and excluded them from the PSID modeling. After learning the dynamic model and extracting the latent states in the training data, we fitted a linear regression from these latent states to predict the remaining joints that were unseen by PSID (that is, the regression solution constituting the parameter Cz in the model). Similarly, NDM learned its dynamic model and extracted the latent states in the training data, and then fitted a linear regression from these latent states to predict the joints. We then evaluated the final learned models in the test data. We repeated this procedure for multiple random joint subsets while ensuring that, overall, every joint was a member of the unseen subsets an equal number of times. (a) The peak cross-validated decoding accuracy (CC) is shown for PSID as a function of the number of joints that were unseen when learning the dynamic model. In each dataset, NDM uses the same latent state dimension as PSID. In NDM, joints are never used in learning the dynamic model, equivalent to having all joints in the unseen subset. Indeed, PSID reduces to NDM in the extreme case when no joint is provided to PSID in learning the dynamic model, as evident from the green and red curves converging at the end (in this case, only stage 2 of PSID is performed; Methods). Solid lines show the average decoding CC and shaded areas show the s.e.m. (n ≥ 91 joint subset datasets). (b) Same as (a), for monkey C (n ≥ 48 joint subset datasets). For both monkeys and in all cases (other than PSID not seeing any joint, for which it reduces to NDM), PSID decoding was significantly better than NDM decoding (P < 10−6; one-sided signed-rank; n ≥ 91 and n ≥ 48 joint subset datasets in monkeys J and C, respectively). 
To investigate why training PSID with a subset of joints helps decode a different, unseen subset of joints in the test data, we computed the correlation coefficient between each pair of joint angles within our datasets and found an absolute correlation coefficient value of 0.31 ± 0.0097 (mean ± s.e.m., n = 351 joint pairs) and 0.32 ± 0.011 (n = 300 joint pairs), for monkeys J and C respectively. This result may suggest that since all joints are engaged in the same task, there are correlations between them that allow PSID to improve decoding even for joints that it does not observe while learning the dynamic model in the training data.
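The regression step described above (fitting the readout parameter Cz from latent states to unseen joints) amounts to an ordinary least-squares fit. The following sketch, with hypothetical data and under the assumption of a linear readout with a constant offset, illustrates the computation; it is not the authors' code.

```python
import numpy as np

def fit_behavior_readout(X_train, Z_train):
    """Least-squares readout (the parameter Cz) mapping latent states to joints.

    X_train: (T, nx) latent states extracted in the training data.
    Z_train: (T, nj) joint angles that were unseen during dynamic-model learning.
    """
    # Augment with a constant column to absorb the mean offset
    Xa = np.column_stack([X_train, np.ones(len(X_train))])
    Cz, *_ = np.linalg.lstsq(Xa, Z_train, rcond=None)
    return Cz

def predict_joints(X, Cz):
    """Apply a fitted readout to latent states extracted in the test data."""
    Xa = np.column_stack([X, np.ones(len(X))])
    return Xa @ Cz

# Hypothetical example: noise-free joints generated by a true linear readout
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 4))       # latent states, nx = 4
Cz_true = rng.standard_normal((4, 2))   # true readout to 2 unseen joints
Z = X @ Cz_true + 0.5                   # joints with a constant offset
Z_hat = predict_joints(X, fit_behavior_readout(X, Z))
```

In the actual analysis, the fit uses training-data latent states and the evaluation uses cross-validated test data; the noise-free recovery here is only to make the mechanics concrete.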

Extended Data Fig. 7 Extraction of bidirectional rotational dynamics using PSID was robust to brain region and held also when modeling neural activity within different cortical regions separately.

(a) Average trajectory of 2D states identified by PSID during reach and return epochs, when neural activity within different cortical areas is modeled separately. Figure convention is the same as in Fig. 5c. (b) Decoding accuracy of using the 2D PSID states or the 2D NDM states (from (a)) to decode behavior. Figure convention is the same as in Fig. 5e. (c)-(d) Same as (a)-(b) for monkey C. In both monkeys, similar to the results in Fig. 5, PSID again extracted latent states that, unlike the latent states extracted using NDM, rotated in opposite directions during reach and return (panels a,c) and resulted in more accurate decoding of behavior (panels b,d; P < 10−11 with the exact values noted above asterisks in the plot; one-sided signed-rank; n = 70 and n = 60 datasets for monkeys J and C, respectively).

Extended Data Fig. 8 Similar to NDM, PCA and jPCA extract rotations that are in the same direction during reach and return epochs.

(a) Figure convention is the same as in Fig. 5c for projections to the 2D space extracted using PCA (that is, the top two PCs). Decoding for these and higher-dimensional PCA-extracted states is provided in Supplementary Fig. 6. (b) Same as (a) for monkey C. (c) Same as (a) for projections to 2D spaces extracted using jPCA (ref. 21). (d) Same as (c) for monkey C.

Extended Data Fig. 9 PSID again achieved better decoding using lower-dimensional latent states when RNN-based nonlinear NDM always used a dynamic latent state with much higher dimension of 64 and/or when RNN-based nonlinear NDM used a Poisson observation model with a faster time step.

(a)-(h) Figure convention and number of datasets in all panels are the same as in Fig. 6, with additional configurations for the RNN-based nonlinear NDM method (that is, LFADS) added to the comparison (Methods). As in Fig. 6, the dimension of the initial condition for LFADS is always 64. The alterations from Fig. 6 are as follows. First, in Fig. 6, the state dimension for LFADS was set to the number of factors to provide a directly comparable result with other methods (Methods). We use state dimension to refer to the generator RNN's state dimension (ref. 23), since it has the same role as the state dimension in a state-space model: it determines how many numbers are used to represent the generator state at a given time step and to generate the dynamics at the next time step (Methods). With this choice, the number of LFADS factors equals its state dimension and is thus comparable with the state dimension in other methods. Here, instead, we also consider always setting the LFADS generator state dimension to 64 regardless of the number of factors. Thus, for this configuration of LFADS, the horizontal axis in panels a,e and the vertical axis in panels b,f refer only to the number of factors, which is always smaller than the LFADS state dimension of 64. Again, in this case where nonlinear NDM always uses 64-dimensional states to describe the dynamics, PSID reveals a markedly lower dimension than the number of factors in nonlinear NDM and achieves better decoding than nonlinear NDM. Second, in Fig. 6, to provide a directly comparable result with PSID, the same Gaussian-smoothed spike counts with 50 ms bins were used as input for both PSID and LFADS (Methods). Here, instead, we also allow LFADS to use non-smoothed spike counts with 10 ms bins and a Poisson observation model (Methods). 
For nonlinear NDM, switching the observation model from Gaussian to Poisson improved the peak decoding in monkey J (P < 10−3; one-sided signed-rank; n = 26 datasets), while both observation models achieved similar decoding in monkey C (P > 0.07; two-sided signed-rank; n = 16 datasets). Nevertheless, comparisons with PSID remained as before for all these nonlinear NDM configurations (regardless of whether they used Poisson or Gaussian observations): PSID revealed a markedly lower dimension than the number of factors in nonlinear NDM (panels b,f; P < 0.004; one-sided signed-rank; n ≥ 16 datasets) and achieved better decoding than even a nonlinear NDM with a larger number of factors than the PSID state dimension (panels a,c,d,e,g,h; P < 0.03; one-sided signed-rank; n ≥ 16 datasets). *P < 0.05, **P < 0.005, ***P < 0.0005. Statistical test details and exact P values are as in Fig. 6 for linear NDM and RM, and are provided in Supplementary Table 1 for the nonlinear NDM variations. This result arises because nonlinear NDM, similar to linear NDM and unlike PSID, only considers neural activity when learning the dynamic model. This shows that the PSID advantage lies in its novel formulation and two-stage approach for learning the dynamic model by considering both neural activity and behavior.

Extended Data Fig. 10 PSID reveals low-dimensional behaviorally relevant dynamics in prefrontal raw LFP activity during a task with saccadic eye movements.

(a)-(h) Figure convention for all panels is the same as in Fig. 3a–d, shown here for a completely different behavioral task, brain region, and neural signal type. Here, monkeys performed saccadic eye movements while PFC activity was recorded (Methods). Raw LFP activity is modeled and the behavior consists of the 2D position of the eye. Similar results hold, with PSID more accurately identifying the behaviorally relevant neural dynamics than both NDM and RM. PSID again reveals a markedly lower dimension for behaviorally relevant neural dynamics than NDM. Also, note that RM provides no control over the dimension of dynamics and is forced to use a state dimension equal to the behavior dimension (nz = 2), which in this case is an underestimation of the dimension of behaviorally relevant dynamics in neural activity, as evidenced by RM's much worse decoding accuracy compared with PSID. Statistical tests are one-sided signed-rank, for which the P values are noted above the asterisks (n = 27 and n = 43 datasets in monkeys A and S, respectively).

Supplementary information

Supplementary Information

Supplementary Figs. 1–14, Table 1 and Notes 1–8.

Reporting Summary

Supplementary Video 1

Visualization of how high-dimensional neural dynamics may contain 2D rotations both in the same and in opposite directions. The presented simulation depicts a hypothetical scenario whereby three dimensions of neural activity traverse a manifold in 3D space of which different projections reveal rotations in the same or opposite directions during reach versus return epochs. Among all projections, PSID can find the projection corresponding to the behaviorally relevant neural dynamics (for example, here, the (y2, y3) plane if behavior is best predicted using the activity in this plane), whereas the standard behavior-agnostic NDM methods may find other projections (for example, the (y1, y2) plane). A similar hypothetical manifold squeezed to varying degrees, resulting in three versions of the manifold, has been used in prior work to demonstrate the concept of tangling (ref. 11). Here, our goal is instead to demonstrate a distinct concept: for the same traversal on exactly the same manifold in 3D, different 2D projections can show rotations in different directions, that is, either rotations that keep the same direction or rotations that reverse their direction during the traversal. This observation is similar to how neural dynamics extracted by PSID in our motor dataset show different rotations compared with those extracted by NDM (Fig. 5). This result demonstrates the importance of PSID in performing dynamic dimensionality reduction while preserving behavior information, which, in this example, corresponds to which lower-dimensional 2D projection plane to pick for modeling the dynamics of the 3D neural activity. Here, for simplicity, projections are visualized as static, but the same concept holds for PSID projections, which are dynamic: to get the projected latent variable at a given time, PSID can aggregate information not only from the same time step of neural activity but also from all the past neural activity. 
Moreover, unlike this simple hypothetical example, the dimension of the neural space is in general much higher than three. Thus, discovering and modeling the low-dimensional dynamic projection that preserves behaviorally relevant dynamics is a major challenge because these dynamics are hidden within the overall high-dimensional neural space. PSID addresses this challenge by discovering these behaviorally relevant dynamics within the high-dimensional neural space—that is, where (which subspace) they are in this high-dimensional dynamic space—finding their dimensionality and, finally, explicitly modeling their temporal evolution (that is, dynamics).
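The concept in the video can be checked numerically. Below is a minimal sketch with a hypothetical 3D trajectory (not the simulation shown in the video): a single traversal whose (y1, y2) projection rotates in the same direction during both epochs, while its (y2, y3) projection reverses rotation direction between the "reach" and "return" epochs. The net signed rotation of each planar projection is measured by the signed swept area along the path.

```python
import numpy as np

def signed_rotation(u, v):
    """Net signed rotation (twice the swept area) of a planar trajectory.

    Positive means net counterclockwise motion; computed as the
    discrete line integral of u*dv - v*du along the path.
    """
    return float(np.sum(u[:-1] * np.diff(v) - v[:-1] * np.diff(u)))

# Hypothetical 3D traversal: "reach" epoch t in [0, pi], "return" in [pi, 2*pi]
t = np.linspace(0, 2 * np.pi, 2001)
reach, ret = t <= np.pi, t >= np.pi
y1, y2, y3 = np.cos(t), np.sin(t), 0.5 * np.sin(2 * t)

# (y1, y2) projection: a circle, so rotation keeps the same direction
same = (signed_rotation(y1[reach], y2[reach]),
        signed_rotation(y1[ret], y2[ret]))
# (y2, y3) projection: a figure-eight-like curve whose rotation reverses
opposite = (signed_rotation(y2[reach], y3[reach]),
            signed_rotation(y2[ret], y3[ret]))
```

Here `same` has the same sign in both epochs while the two entries of `opposite` have opposite signs, even though both projections come from the identical 3D traversal; which projection a method recovers is exactly the choice that a behavior-aware method like PSID and a behavior-agnostic method like NDM can make differently.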


About this article

Cite this article

Sani, O.G., Abbaspourazad, H., Wong, Y.T. et al. Modeling behaviorally relevant neural dynamics enabled by preferential subspace identification. Nat Neurosci (2020).
