Response time data collected from cognitive tasks are a cornerstone of psychology and neuroscience research, yet existing models of these data either make strong assumptions about the data-generating process or are limited to modelling single trials. We introduce task-DyVA, a deep learning framework in which expressive dynamical systems are trained to reproduce sequences of response times observed in data from individual human subjects. Models fitted to a large task-switching dataset captured subject-specific behavioural differences with high temporal precision, including task-switching costs. Through perturbation experiments and analyses of the models’ latent dynamics, we find support for a rational account of switch costs in terms of a stability–flexibility trade-off. Thus, our framework can be used to discover interpretable cognitive theories that explain how the brain dynamically gives rise to behaviour.
All data and models analysed in this manuscript are publicly available without restriction: https://doi.org/10.5281/zenodo.6368412.
All code used to train and analyse the models in this manuscript is publicly available without restriction: https://github.com/pauljaffe/task-dyva.
We thank J. Cunningham, G. Huckins, M. Steyvers, D. Sussillo, L. Tian, the Lumos Labs research team and other members of the Poldrack Lab for helpful conversations and comments on the manuscript. We also thank M. Chávez for providing graphics files of the Ebb and Flow stimuli. The authors acknowledge the Texas Advanced Computing Center at The University of Texas at Austin for providing high performance computing and storage resources that contributed to the research results reported within this paper: http://www.tacc.utexas.edu. The authors received no specific funding for this work.
P.I.J. and R.J.S. are employed by Lumos Labs, Inc. and own stock in the company. R.A.P. and P.G.B. have no competing interests.
Peer review information
Nature Human Behaviour thanks Sebastian Musslick, Angela Radulescu, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended Data Fig. 1 Probabilistic graphical models of task-DyVA.
Shaded nodes indicate observed variables; unshaded nodes indicate unobserved (latent) variables. Circles indicate variables that depend stochastically on their parent nodes; diamonds indicate variables that depend deterministically on their parent nodes. ut: model representation of the task stimuli at timestep t; xt: model representation of the task responses at timestep t; zt: the model’s latent state at timestep t; wt: noise added to the latent state at timestep t; ht: output of the backward RNN used in the encoder model at timestep t. See Methods for additional information. a, The generative model of task-DyVA. b, The encoder (inference) model of task-DyVA.
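In discrete time, the structure of the generative model in Extended Data Fig. 1a can be sketched as a noisy latent dynamical system with a deterministic readout. The toy implementation below illustrates that structure only, not the paper's actual architecture: the dimensions, the tanh update, and the linear readout are all placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not the paper's): latent, stimulus, response, timesteps
dim_z, dim_u, dim_x, T = 16, 4, 4, 50

# Placeholder parameters for the latent dynamics and the readout
W = rng.normal(scale=0.1, size=(dim_z, dim_z))  # recurrent weights
B = rng.normal(scale=0.1, size=(dim_z, dim_u))  # stimulus input weights
C = rng.normal(scale=0.1, size=(dim_x, dim_z))  # readout weights

def generate(u_seq, noise_sd=0.1):
    """Roll the latent system forward: z_t depends on z_{t-1} and u_t plus
    stochastic noise w_t; x_t is a deterministic readout of z_t (cf. Extended
    Data Fig. 1a: circles = stochastic, diamonds = deterministic)."""
    z = np.zeros(dim_z)
    xs = []
    for u_t in u_seq:
        w_t = rng.normal(scale=noise_sd, size=dim_z)  # latent noise w_t
        z = np.tanh(W @ z + B @ u_t) + w_t            # latent state update
        xs.append(C @ z)                              # response readout x_t
    return np.array(xs)

u_seq = rng.normal(size=(T, dim_u))  # stand-in stimulus sequence u_1..u_T
x_seq = generate(u_seq)
print(x_seq.shape)  # (50, 4)
```

In the full model, an encoder RNN running backward over the observed sequence (h_t in Extended Data Fig. 1b) would infer the posterior over the latent states rather than sampling them from the prior as above.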
Extended Data Fig. 2 Example RT distributions from participants and fitted models.
Within each age bin (columns), participants were sorted by their mean RT; every other participant is shown (10 out of 20 participants within each age bin; top row = shortest RTs).
Extended Data Fig. 3 Comparison of model and participant accuracy, RT variability, and congruency/switch interaction.
a, Mean ± s.e.m. accuracy vs. stimulus noise SD (N = 140 participants/models). Note that participant accuracy (blue) was not assessed at different noise levels. At the noise level used to train the models (0.1SD), the mean ± s.e.m. accuracy was 0.99 ± 0.0033 for the models and 0.96 ± 0.0022 for the participants. b, As in panel a, but for the accuracy congruency effect (accuracy on congruent trials minus accuracy on incongruent trials). c, As in panel a, but for the accuracy switch cost (accuracy on stay trials minus accuracy on switch trials). d, RT standard deviation (SD) for participants and corresponding models (each point is one participant/model; black line: unity; red dashed line: best linear fit, slope = 0.40; Pearson’s r = 0.88, bootstrap 95% CI = (0.82, 0.92); N = 140 participants/models). The mean ± s.e.m. RT SD was 69.8 ± 1.9ms for the models and 122.5 ± 4.2ms for the participants. e, RT SD within each age bin (N = 20 participants/models per age bin). f, Mean RT split by trial type (N = 140 participants/models per condition; 2 × 2 repeated measures ANOVA with congruency and stay/switch as within-subject factors: model stay/switch: F(1, 139) = 360.39, p < 0.001; participant stay/switch: F(1, 139) = 433.15, p < 0.001; model congruency: F(1, 139) = 679.65, p < 0.001; participant congruency: F(1, 139) = 763.53, p < 0.001; model switch/congruency interaction: F(1, 139) = 235.79, p < 0.001; participant switch/congruency interaction: F(1, 139) = 490.35, p < 0.001).
g, Accuracy split by trial type (N = 140 participants/models per condition; 2 × 2 repeated measures ANOVA with congruency and stay/switch as within-subject factors: model stay/switch: F(1, 139) = 13.66, p < 0.001; participant stay/switch: F(1, 139) = 116.90, p < 0.001; model congruency: F(1, 139) = 3.54, p = 0.062; participant congruency: F(1, 139) = 416.69, p < 0.001; model switch/congruency interaction: F(1, 139) = 1.74, p = 0.19; participant switch/congruency interaction: F(1, 139) = 184.40, p < 0.001). For the box plots in panels e-g: center line: median; box limits: upper and lower quartiles; whiskers: 1.5× interquartile range; points: outliers.
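The switch/congruency interaction tested above is the difference between the switch cost on incongruent trials and the switch cost on congruent trials. A minimal sketch of computing the cell means and that interaction contrast from trial-level RTs (the data here are synthetic, with the effects built in by construction):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical trial-level RTs (ms) for one participant, keyed by
# (congruency, stay/switch); switch cost and congruency effect are
# built in by construction, with a larger switch cost when incongruent.
rts = {
    ("congruent", "stay"): 800 + rng.normal(0, 50, 100),
    ("congruent", "switch"): 880 + rng.normal(0, 50, 100),
    ("incongruent", "stay"): 870 + rng.normal(0, 50, 100),
    ("incongruent", "switch"): 1020 + rng.normal(0, 50, 100),
}

cell_means = {k: v.mean() for k, v in rts.items()}

# Switch cost within each congruency condition
sc_con = cell_means[("congruent", "switch")] - cell_means[("congruent", "stay")]
sc_inc = cell_means[("incongruent", "switch")] - cell_means[("incongruent", "stay")]

# Interaction contrast: the extra switch cost incurred on incongruent trials
interaction = sc_inc - sc_con
print(round(sc_con), round(sc_inc), round(interaction))
```

A repeated measures ANOVA then tests this contrast against its within-subject variability across all participants rather than within a single one.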
Extended Data Fig. 4 Strong correlations between participant and model behavioral metrics are preserved with elevated stimulus noise.
a, Mean RTs at 0.4SD stimulus noise (Pearson’s r for participants vs. models = 0.96, bootstrap 95% CI = (0.94, 0.97), best-fit slope = 0.71). b, Switch costs at 0.4SD stimulus noise (Pearson’s r for participants vs. models = 0.75, bootstrap 95% CI = (0.68, 0.82), best-fit slope = 1.1). c, Congruency effects at 0.4SD stimulus noise (Pearson’s r for participants vs. models = 0.79, bootstrap 95% CI = (0.66, 0.88), best-fit slope = 0.38). For panels a-c, each point is one participant/model; black line: unity; red dashed line: best linear fit; N = 140 participants/models. Note that participant behavior was not assessed at different noise levels.
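The bootstrap 95% CIs on Pearson’s r quoted throughout these captions can be obtained by resampling participant/model pairs with replacement and taking percentiles of the resampled correlations. A generic percentile-bootstrap sketch (the resample count and seed are arbitrary choices, and the data are synthetic):

```python
import numpy as np

def pearson_r_bootstrap_ci(x, y, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for Pearson's r, resampling (x, y) pairs."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    boots = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)  # resample pairs with replacement
        boots[i] = np.corrcoef(x[idx], y[idx])[0, 1]
    lo, hi = np.quantile(boots, [alpha / 2, 1 - alpha / 2])
    return r, (lo, hi)

# Synthetic correlated "participant" and "model" metrics (140 pairs)
rng = np.random.default_rng(2)
participant = rng.normal(size=140)
model = 0.9 * participant + rng.normal(scale=0.3, size=140)
r, (lo, hi) = pearson_r_bootstrap_ci(participant, model)
print(f"r = {r:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```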
Extended Data Fig. 5 Analysis of models fitted with early stage practice data and models trained with exGaussian smoothing kernels.
a, Model vs. participant mean RTs for models fitted with early practice data (Pearson’s r for participants vs. models = 0.99, bootstrap 95% CI = (0.98, 1.0), best-fit slope = 0.83). b, Model vs. participant switch costs for models fitted with early practice data (Pearson’s r for participants vs. models = 0.92, bootstrap 95% CI = (0.77, 0.97), best-fit slope = 0.80). c, Model vs. participant congruency effects for models fitted with early practice data (Pearson’s r for participants vs. models = 0.91, bootstrap 95% CI = (0.85, 0.96), best-fit slope = 0.66). d, Model vs. participant mean RTs for models fitted with exGaussian smoothing kernels (Pearson’s r for participants vs. models = 0.99, bootstrap 95% CI = (0.97, 1.0), best-fit slope = 0.87). e, Model vs. participant switch costs for models fitted with exGaussian smoothing kernels (Pearson’s r for participants vs. models = 0.92, bootstrap 95% CI = (0.83, 0.98), best-fit slope = 0.80). f, Model vs. participant congruency effects for models fitted with exGaussian smoothing kernels (Pearson’s r for participants vs. models = 0.98, bootstrap 95% CI = (0.96, 0.99), best-fit slope = 0.93). g, Empirical Kullback–Leibler divergence (DKL) from user RT distributions to model RT distributions for models trained with Gaussian and exGaussian smoothing kernels (mean ± s.e.m. DKL for exGaussian models: 0.20 ± 0.015, N = 35 models/participants; mean ± s.e.m. DKL for Gaussian models: 0.22 ± 0.0076, N = 140 models/participants; two-sided Wilcoxon rank-sum test for exGaussian vs. Gaussian DKL: U = −1.13, p = 0.26). For panels a-f, each point is one participant/model; black line: unity; red dashed line: best linear fit; N = 35 randomly selected participants/models; participant samples from panels a-c and d-g are different.
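The empirical Kullback–Leibler divergence between two RT samples can be estimated by binning both on shared bins and comparing the normalized histograms. The sketch below is one standard plug-in estimator, not necessarily the exact estimator used here; the bin count and the small regularizer (which avoids log of zero) are assumptions:

```python
import numpy as np

def empirical_kl(p_samples, q_samples, n_bins=50, eps=1e-10):
    """Histogram estimate of D_KL(P || Q) from two samples on shared bins."""
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    bins = np.linspace(lo, hi, n_bins + 1)
    p, _ = np.histogram(p_samples, bins=bins)
    q, _ = np.histogram(q_samples, bins=bins)
    p = p / p.sum() + eps  # eps regularizes empty bins
    q = q / q.sum() + eps
    return np.sum(p * np.log(p / q))

# Fake RT samples (ms): a well-matched model and a biased one
rng = np.random.default_rng(3)
user_rts = rng.lognormal(mean=6.7, sigma=0.3, size=5000)
good_model = rng.lognormal(mean=6.7, sigma=0.3, size=5000)
bad_model = rng.lognormal(mean=7.0, sigma=0.3, size=5000)

print(empirical_kl(user_rts, good_model) < empirical_kl(user_rts, bad_model))  # True
```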
Extended Data Fig. 6 Analysis of models trained to perform the task optimally.
a, Example stimulus sequence and model outputs for one of the task-optimized models. b, Model accuracy (mean ± s.e.m.: 0.967 ± 0.017). c, Model mean RT (mean ± s.e.m.: 995.2 ± 2.2ms). d, Model switch cost (mean ± s.e.m.: −0.012 ± 0.41ms; two-sided signed-rank test vs. zero: W = 27, p = 1.0). e, Model congruency effect (mean ± s.e.m.: 0.46 ± 0.22ms; two-sided signed-rank test vs. zero: W = 10, p = 0.084). f, Model RT SD (mean ± s.e.m.: 16.9 ± 1.9ms). For panels b-f, N = 10 models; plots show mean ± s.e.m. (black) and individual models (blue).
Extended Data Fig. 7 Latent representations of additional trained models.
Each plot shows the trial-averaged latent state trajectories and stable fixed points (‘x’ marks) from one model. Within each age bin (columns), participants were sorted by their mean RT; every fourth participant is shown (5 out of 20 participants within each age bin; top row = shortest RTs). All plots are shown from the same perspective.
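Stable fixed points like the ‘x’ marks in these plots are commonly located by running the noise-free dynamics from many initial states until the update becomes negligible, then deduplicating the endpoints. A self-contained illustration on toy two-dimensional dynamics (not the fitted task-DyVA dynamics; the weight matrix is chosen so that two stable fixed points exist):

```python
import numpy as np

# Toy 2D dynamics: the first dimension is bistable, the second decays to zero
W = np.array([[1.5, 0.0], [0.0, -0.5]])
step = lambda z: np.tanh(W @ z)  # noise-free latent update

def find_fixed_points(n_starts=50, tol=1e-8, max_iter=10_000, seed=5):
    """Iterate the dynamics from random starts; keep distinct endpoints."""
    rng = np.random.default_rng(seed)
    found = []
    for _ in range(n_starts):
        z = rng.uniform(-2, 2, size=2)
        for _ in range(max_iter):
            z_next = step(z)
            if np.linalg.norm(z_next - z) < tol:  # converged
                break
            z = z_next
        # Deduplicate up to a coarse tolerance
        if not any(np.linalg.norm(z - f) < 1e-4 for f in found):
            found.append(z)
    return found

fps = find_fixed_points()
print(len(fps))  # two stable fixed points for these dynamics
```

This direct-iteration approach only finds stable fixed points; unstable ones (such as the origin here) repel the iterates and require root-finding on the update map instead.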
Extended Data Fig. 8 Additional analyses related to the separation of task representations.
a, Illustration of how greater separation between task regions could contribute to switch costs. The transition from task region A to task region B is costly (that is, slow). This model predicts a positive correlation between the distance to task region A at stimulus onset and RTs for switch trials. b, Illustration of how slower dynamics within task region B could contribute to switch costs. The transition from task region A to task region B is fast. This model predicts that the correlation between the distance to task region A at stimulus onset and RTs for switch trials will be near zero. c, Example latent state trajectories from four models. For each model, the stay and switch trajectories were averaged over trials with a fixed stimulus configuration (same stimuli and task cues on the current trial; task cues on the previous trial differed to select stay vs. switch trials). The stimulus configurations varied across the four models. d, Pearson’s r for model switch cost vs. the normalized distance between task centroids at different noise levels. Center line shows the Pearson’s r at each noise level, error bars show 95% bootstrap confidence intervals (two-sided test for non-zero correlation using the exact distribution of r, 0.1SD: r = 0.65, 95% CI = (0.56, 0.73), p < 0.001, 0.2SD: r = 0.66, 95% CI = (0.55, 0.75), p < 0.001, 0.3SD: r = 0.38, 95% CI = (0.19, 0.54), p < 0.001, 0.4SD: r = 0.24, 95% CI = (0.06, 0.41), p = 0.0049, 0.5SD: r = 0.06, 95% CI = (-0.10, 0.24), p = 0.48, 0.6SD: r = -0.02, 95% CI = (-0.21, 0.20), p = 0.80, 0.7SD: r = -0.07, 95% CI = (-0.24, 0.17), p = 0.43, 0.8SD: r = -0.10, 95% CI = (-0.29, 0.14), p = 0.26, 0.9SD: r = -0.13, 95% CI = (-0.33, 0.11), p = 0.13, 1.0SD: r = -0.09, 95% CI = (-0.28, 0.13), p = 0.30; no adjustments were made for multiple comparisons, N = 140 models at each noise level). e, Normalized distance between task centroids vs. age (N = 20 models per age bin; center lines: median; box limits: upper and lower quartiles; whiskers: 1.5× interquartile range; points: outliers). Age bin means were significantly different (one-way ANOVA: F(6, 133) = 5.7, p < 0.001). Post-hoc comparisons between all age groups revealed that the 80-89 age group exhibited significantly greater task centroid separation vs. ages 20-29 (mean difference = 0.24, p = 0.001), ages 30-39 (mean difference = 0.25, p = 0.001), ages 40-49 (mean difference = 0.26, p = 0.001), ages 50-59 (mean difference = 0.23, p = 0.0021), and ages 60-69 (mean difference = 0.25, p = 0.001; Tukey’s range test with family-wise error rate = 0.05, adjusted p-values are reported). **: p < 0.01, ***: p < 0.001.
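The one-way ANOVA above compares centroid-separation means across seven age bins of 20 models each, hence the F(6, 133) degrees of freedom. A bare-bones numpy computation of the F statistic on synthetic data (in practice scipy.stats.f_oneway does this, and Tukey's range test handles the post-hoc pairwise comparisons):

```python
import numpy as np

def one_way_anova_F(groups):
    """F = between-group mean square / within-group mean square."""
    all_vals = np.concatenate(groups)
    grand_mean = all_vals.mean()
    k, n_total = len(groups), len(all_vals)
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n_total - k
    F = (ss_between / df_between) / (ss_within / df_within)
    return F, (df_between, df_within)

# Seven synthetic age bins of 20 "models" each; the last bin is shifted
# upward, mimicking the greater centroid separation of the oldest group
rng = np.random.default_rng(4)
groups = [rng.normal(0.5, 0.15, 20) for _ in range(6)]
groups.append(rng.normal(0.75, 0.15, 20))
F, (df1, df2) = one_way_anova_F(groups)
print(f"F({df1}, {df2}) = {F:.1f}")
```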
Extended Data Fig. 9 The sc- models have reduced switch costs.
a, Switch costs for the sc+ and sc- models (black: mean ± s.e.m.; green lines connect models trained on data from the same participant; mean ± s.e.m. for sc+ models: 111.5 ± 4.7ms; sc- models: 0.91 ± 4.5ms; N = 25 model pairs). b, Mean RTs for the sc+ and sc- models (mean ± s.e.m. for sc+ models: 888.4 ± 21.3ms; sc- models: 920.9 ± 21.5ms; N = 25 model pairs). c, Congruency effects for the sc+ and sc- models (mean ± s.e.m. for sc+ models: 79.8 ± 7.7ms; sc- models: 71.6 ± 8.5ms; N = 25 model pairs).
Extended Data Fig. 10 The reduced accuracy of the sc- models is consistent across noise levels and trial types.
a, Mean error rate for congruent trials. b, Mean error rate for incongruent trials. c, Mean error rate for stay trials. d, Mean error rate for switch trials. e, Correlation between model robustness and participant congruency effect (center line shows the Pearson’s r at each noise level, error bars show bootstrap 95% CIs). Model robustness was defined as the difference in model accuracy between a given noise level (x-axis) and baseline (0.1SD). The 95% CIs did not contain zero for noise values ≥0.4SD. For panels a-e, error bars show bootstrap 95% CIs. Panels a-d: N = 25 sc+ models and 25 sc- models; panel e: N = 140 participants/models.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jaffe, P.I., Poldrack, R.A., Schafer, R.J. et al. Modelling human behaviour in cognitive tasks with latent dynamical systems. Nat Hum Behav (2023). https://doi.org/10.1038/s41562-022-01510-8