Contextual inference underlies the learning of sensorimotor repertoires

Heald, James B.; Lengyel, Máté; Wolpert, Daniel M.

doi:10.1038/s41586-021-04129-3

Article
Published: 24 November 2021


Nature volume 600, pages 489–493 (2021)

Abstract

Humans spend a lifetime learning, storing and refining a repertoire of motor memories. For example, through experience, we become proficient at manipulating a large range of objects with distinct dynamical properties. However, it is unknown what principle underlies how our continuous stream of sensorimotor experience is segmented into separate memories and how we adapt and use this growing repertoire. Here we develop a theory of motor learning based on the key principle that memory creation, updating and expression are all controlled by a single computation: contextual inference. Our theory reveals that adaptation can arise both by creating and updating memories (proper learning) and by changing how existing memories are differentially expressed (apparent learning). This insight enables us to account for key features of motor learning that had no unified explanation: spontaneous recovery¹, savings², anterograde interference³, how environmental consistency affects learning rate⁴,⁵ and the distinction between explicit and implicit learning⁶. Critically, our theory also predicts new phenomena (evoked recovery and context-dependent single-trial learning), which we confirm experimentally. These results suggest that contextual inference, rather than classical single-context mechanisms¹,⁴,⁷,⁸,⁹, is the key principle underlying how a diverse set of experiences is reflected in our motor behaviour.

**Fig. 1: Contributions of contextual inference to motor learning in the COIN model.**

**Fig. 2: Memory creation and expression accounts for spontaneous and evoked recovery.**

**Fig. 3: Memory updating depends on contextual inference.**

**Fig. 4: Contextual inference underlies apparent changes in learning rate.**

Data availability

All experimental data are publicly available at the Dryad repository (https://doi.org/10.5061/dryad.m63xsj42r). The data include the raw kinematics and force profiles of individual participants on all trials as well as the adaptation measures used to generate the experimental data shown in Figs. 2c, e and 3d.

Code availability

The code for the COIN model is available at GitHub (https://github.com/jamesheald/COIN).

References

Smith, M. A., Ghazizadeh, A. & Shadmehr, R. Interacting adaptive processes with different timescales underlie short-term motor learning. PLoS Biol. 4, e179 (2006).
Kitago, T., Ryan, S., Mazzoni, P., Krakauer, J. W. & Haith, A. M. Unlearning versus savings in visuomotor adaptation: comparing effects of washout, passage of time, and removal of errors on motor memory. Front. Hum. Neurosci. 7, 307 (2013).
Sing, G. C. & Smith, M. A. Reduction in learning rates associated with anterograde interference results from interactions between different timescales in motor adaptation. PLoS Comput. Biol. 6, e1000893 (2010).
Herzfeld, D. J., Vaswani, P. A., Marko, M. K. & Shadmehr, R. A memory of errors in sensorimotor learning. Science 345, 1349–1353 (2014).
Gonzalez Castro, L. N., Hadjiosif, A. M., Hemphill, M. A. & Smith, M. A. Environmental consistency determines the rate of motor adaptation. Curr. Biol. 24, 1050–1061 (2014).
McDougle, S. D. et al. Credit assignment in movement-dependent reinforcement learning. Proc. Natl Acad. Sci. USA 113, 6797–6802 (2016).
Donchin, O., Francis, J. T. & Shadmehr, R. Quantifying generalization from trial-by-trial behavior of adaptive systems that learn with basis functions: theory and experiments in human motor control. J. Neurosci. 23, 9032–9045 (2003).
Thoroughman, K. A. & Shadmehr, R. Learning of action through adaptive combination of motor primitives. Nature 407, 742–747 (2000).
Shadmehr, R., Smith, M. A. & Krakauer, J. W. Error correction, sensory prediction, and adaptation in motor control. Annu. Rev. Neurosci. 33, 89–108 (2010).
Wolpert, D. M. & Kawato, M. Multiple paired forward and inverse models for motor control. Neural Netw. 11, 1317–1329 (1998).
Oh, Y. & Schweighofer, N. Minimizing precision-weighted sensory prediction errors via memory formation and switching in motor adaptation. J. Neurosci. 39, 9237–9250 (2019).
Gershman, S. J., Radulescu, A., Norman, K. A. & Niv, Y. Statistical computations underlying the dynamics of memory updating. PLoS Comput. Biol. 10, e1003939 (2014).
Hulst, T. et al. Cerebellar degeneration reduces memory resilience after extended training. Preprint at bioRxiv https://doi.org/10.1101/2020.07.03.185959 (2021).
Pekny, S. E., Criscimagna-Hemminger, S. E. & Shadmehr, R. Protection and expression of human motor memories. J. Neurosci. 31, 13829–13839 (2011).
Rescorla, R. A. & Heth, C. D. Reinstatement of fear to an extinguished conditioned stimulus. J. Exp. Psychol. Anim. Behav. Process. 1, 88–96 (1975).
Berniker, M. & Kording, K. Estimating the sources of motor errors for adaptation and generalization. Nat. Neurosci. 11, 1454–1461 (2008).
Taylor, J. A., Wojaczynski, G. J. & Ivry, R. B. Trial-by-trial analysis of intermanual transfer during visuomotor adaptation. J. Neurophysiol. 106, 3157–3172 (2011).
Kording, K. P., Tenenbaum, J. B. & Shadmehr, R. The dynamics of memory as a consequence of optimal adaptation to a changing body. Nat. Neurosci. 10, 779–786 (2007).
Heald, J. B., Ingram, J. N., Flanagan, J. R. & Wolpert, D. M. Multiple motor memories are learned to control different points on a tool. Nat. Hum. Behav. 2, 300–311 (2018).
Coltman, S. K., Cashaback, J. G. A. & Gribble, P. L. Both fast and slow learning processes contribute to savings following sensorimotor adaptation. J. Neurophysiol. 121, 1575–1583 (2019).
Huang, V. S., Haith, A., Mazzoni, P. & Krakauer, J. W. Rethinking motor learning and savings in adaptation paradigms: model-free memory for successful actions combines with internal models. Neuron 70, 787–801 (2011).
Keisler, A. & Shadmehr, R. A shared resource between declarative memory and motor memory. J. Neurosci. 30, 14817–14823 (2010).
McDougle, S. D., Ivry, R. B. & Taylor, J. A. Taking aim at the cognitive side of learning in sensorimotor adaptation tasks. Trends Cogn. Sci. 20, 535–544 (2016).
McDougle, S. D., Bond, K. M. & Taylor, J. A. Explicit and implicit processes constitute the fast and slow processes of sensorimotor learning. J. Neurosci. 35, 9568–9579 (2015).
Miyamoto, Y. R., Wang, S. & Smith, M. A. Implicit adaptation compensates for erratic explicit strategy in human motor learning. Nat. Neurosci. 23, 443–455 (2020).
Haruno, M., Wolpert, D. M. & Kawato, M. MOSAIC model for sensorimotor learning and control. Neural Comput. 13, 2201–2220 (2001).
Gershman, S. J., Blei, D. M. & Niv, Y. Context, learning, and extinction. Psychol. Rev. 117, 197–209 (2010).
Sanders, H., Wilson, M. A. & Gershman, S. J. Hippocampal remapping as hidden state inference. eLife 9, e51140 (2020).
Collins, A. & Koechlin, E. Reasoning, learning, and creativity: frontal lobe function and human decision-making. PLoS Biol. 10, e1001293 (2012).
Collins, A. G. E. & Frank, M. J. Cognitive control over learning: creating, clustering, and generalizing task-set structure. Psychol. Rev. 120, 190–229 (2013).
Howard, I. S., Ingram, J. N. & Wolpert, D. M. A modular planar robotic manipulandum with end-point torque control. J. Neurosci. Methods 181, 199–211 (2009).
Milner, T. E. & Franklin, D. W. Impedance control and internal model use during the initial stage of adaptation to novel dynamics in humans. J. Physiol. 567, 651–664 (2005).
Scheidt, R. A., Reinkensmeyer, D. J., Conditt, M. A., Rymer, W. Z. & Mussa-Ivaldi, F. A. Persistence of motor adaptation during constrained, multi-joint, arm movements. J. Neurophysiol. 84, 853–862 (2000).
Teh, Y. W., Jordan, M. I., Beal, M. J. & Blei, D. M. Hierarchical Dirichlet processes. J. Am. Stat. Assoc. 101, 1566–1581 (2006).
Fox, E. B., Sudderth, E. B., Jordan, M. I. & Willsky, A. S. An HDP-HMM for systems with state persistence. In Proc. 25th International Conference on Machine Learning (eds. McCallum, A. & Roweis, S.) 312–319 (Omnipress, 2008).
Teh, Y. W. in Encyclopedia of Machine Learning (eds. Sammut, C. & Webb, G. I.) 280–287 (Springer, 2011).
Carvalho, C. M., Johannes, M. S., Lopes, H. F. & Polson, N. G. Particle learning and smoothing. Stat. Sci. 25, 88–106 (2010).
Lopes, H. F., Carvalho, C. M., Johannes, M. S. & Polson, N. G. in Bayesian Statistics 9 (eds. Bernardo, J. M. et al.) (Oxford Univ. Press, 2011).
Houlsby, N. et al. Cognitive tomography reveals complex, task-independent mental representations. Curr. Biol. 23, 2169–2175 (2013).
Acerbi, L. & Ma, W. J. Practical Bayesian optimization for model fitting with Bayesian adaptive direct search. In Proc. Advances in Neural Information Processing Systems (eds. Guyon, I. et al.) 1836–1846 (Curran, 2017).
Jeffreys, H. The Theory of Probability (Oxford Univ. Press, 1998).
Li, J., Wang, Z. J., Palmer, S. J. & McKeown, M. J. Dynamic Bayesian network modeling of fMRI: a comparison of group-analysis methods. Neuroimage 41, 398–407 (2008).

Acknowledgements

We thank J. N. Ingram for technical support and G. Hennequin for discussions. This work was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 726090 to M.L.), the Wellcome Trust (Investigator Awards 212262/Z/18/Z to M.L. and 097803/Z/11/Z to D.M.W.), the Royal Society (Noreen Murray Professorship in Neurobiology to D.M.W.), the National Institutes of Health (R01NS117699 and U19NS104649 to D.M.W.) and the Engineering and Physical Sciences Research Council (studentship to J.B.H.).

Author information

These authors contributed equally: Máté Lengyel, Daniel M. Wolpert

Authors and Affiliations

Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
James B. Heald & Daniel M. Wolpert
Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, Cambridge, UK
James B. Heald, Máté Lengyel & Daniel M. Wolpert
Department of Neuroscience, Columbia University, New York, NY, USA
James B. Heald & Daniel M. Wolpert
Center for Cognitive Computation, Department of Cognitive Science, Central European University, Budapest, Hungary
Máté Lengyel

Contributions

J.B.H. developed the model, implemented the model, performed the experiments, analysed the data and performed simulations. J.B.H. and D.M.W. designed the behavioural experiments. All of the authors were involved in the conceptualization of the study, developed techniques for analysing the model, interpreted results and wrote the paper.

Corresponding author

Correspondence to James B. Heald.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Additional details of the COIN model (related to Fig. 1). a-b, Hierarchy and generalization in contextual inference.

a, Local transition probabilities are generated in two steps via a hierarchical Dirichlet process. In the first step (top), an infinite set of global transition probabilities β are generated via a stochastic stick-breaking process (Supplementary Information). Probabilities are represented by the width of bar segments with different colours indicating different contexts. In the second step (bottom), for each context (‘from context’), local transition probabilities to each context (‘to context’) are generated (a row of \({\Pi }\)) via a stochastic Dirichlet process and are equal to the global probabilities in expectation (bar a self-transition bias, which we set to zero here for clarity). (An analogous hierarchical Dirichlet process, not shown, is used to generate the global and local cue probabilities.) b, Contextual inference updates both the global and local transition probabilities. Context transition counts are maintained for all from-to pairs of known contexts and get updated based on the contexts inferred on two consecutive time points (responsibilities at time points t and \(t+1\)). These updated context transition counts are used to update the inferred global transition probabilities \(\hat{{\boldsymbol{\beta }}}\). The updated global transition probabilities and context transition counts produce new inferences about the inferred local transition probabilities \(\hat{{\Pi }}\). Note that although the model infers full (Dirichlet) posterior distributions over both the global and local transition probabilities, for clarity here we only show the means of these posterior distributions (indicated by the hat notation). 
In the example shown, only row 3 of the context transition counts is updated (as context 3 has an overwhelming responsibility at time \(t\)), but all rows of the local transition probabilities are updated due to the updating of the global transition probabilities (if the model were non-hierarchical, there would be no global transition probabilities, and so the local transition probabilities would only be updated for context 3 via the updated context transition counts). Thus, inferences about transition probabilities generalise from one context (here context 3) to all other contexts (here contexts 1 and 2) due to the hierarchical nature of the generative model. Note that when a novel context is encountered for the first time, its local transition probabilities are initialised based on \(\hat{{\boldsymbol{\beta }}}\), thus allowing well-informed inferences about transitions to be drawn immediately. c–e, Parameter inference in the COIN model for the simulation shown in Fig. 1c–h. In addition to inferring states and contexts, the COIN model also infers context transition (c) and cue (d) probabilities, as well as the parameters of context-specific state dynamics (e). c, Transition probabilities. Top: Inferred global transition probabilities (solid lines) for transitioning into each known context (line colours) and the novel context (grey). Pale lines show inferred stationary probabilities for the same contexts, representing the expected proportion of time spent in each context given the current estimate of the local transition probabilities (below). Bottom three panels: inferred local transition probabilities from each context (colours as in top panel). Note that the local transition probability from context 1 to context 2 increases when cup 3 is handled (that is, when transitions from context 2 to itself are inferred to happen) due to the generalization of inferred transition probabilities across contexts. 
Also note that the local transition probabilities from context 3 are initialised based on the global transition probabilities (plus a self-transition bias). d, Inferred global (top panel) and local cue probabilities for the three known contexts (bottom three panels) and cues (line colours). Although the model infers full (Dirichlet) posterior distributions over both transition (c) and cue probabilities (d), for clarity here we only show the means of these posterior distributions. e, Posterior distributions of the state drift (left) and retention parameters (right) for the three known contexts (colours as in c, novel context not shown for clarity). Although the model infers the joint distribution of the drift and retention parameters for each context, for clarity here we show the marginal distribution of each parameter separately. Note that drift and retention are inferred to be larger for the red context that is associated with the largest perturbation.
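
The two-step generative process of panel a, and the stationary probabilities plotted in panel c, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the COIN code: it uses a truncated stick-breaking construction (a fixed number of atoms plus one entry that lumps together the remaining mass, standing in for the novel context), a finite Dirichlet for each row of \(\Pi\), and a sticky parameterisation in which the self-transition bias is \(\kappa = \rho\alpha/(1-\rho)\), so that each row has expectation \((1-\rho)\boldsymbol{\beta} + \rho\) on its own diagonal. The function names and parameter values are hypothetical.

```python
import numpy as np


def stick_breaking(gamma, n_atoms, rng):
    """Truncated stick-breaking (GEM(gamma)) weights for the global probabilities beta.

    Returns n_atoms weights followed by one extra entry holding all remaining
    mass (a stand-in for the novel context in this sketch).
    """
    v = rng.beta(1.0, gamma, size=n_atoms)                  # fraction broken off each stick
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    beta = v * remaining                                    # mass assigned to each atom
    leftover = max(1.0 - beta.sum(), 0.0)                   # mass beyond the truncation
    return np.append(beta, leftover)


def sample_local_transitions(beta, alpha, rho, rng, floor=1e-12):
    """Sample each row of Pi from a Dirichlet centred on beta plus a self-transition bias.

    Assumes rho = kappa / (alpha + kappa), with rho in [0, 1).
    """
    k = len(beta)
    kappa = rho * alpha / (1.0 - rho)
    Pi = np.empty((k, k))
    for i in range(k):
        conc = np.maximum(alpha * beta, floor)   # keep Dirichlet parameters strictly positive
        conc[i] += kappa                         # extra mass on the self-transition
        Pi[i] = rng.dirichlet(conc)
    return Pi


def stationary_distribution(Pi):
    """Solve p @ Pi = p with sum(p) = 1: expected long-run fraction of time per context."""
    k = Pi.shape[0]
    A = np.vstack([Pi.T - np.eye(k), np.ones((1, k))])
    b = np.zeros(k + 1)
    b[-1] = 1.0
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p


rng = np.random.default_rng(0)
beta = stick_breaking(gamma=1.0, n_atoms=4, rng=rng)
Pi = sample_local_transitions(beta, alpha=5.0, rho=0.5, rng=rng)
p = stationary_distribution(Pi)
print("beta:", beta.round(3))
print("stationary:", p.round(3))
```

Setting rho to 0 removes the self-transition bias, so every row is centred exactly on beta, which is the case drawn in panel a.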

Extended Data Fig. 2 Validation of the COIN model. a, Validation of the inference algorithm of the COIN model with a single context.

We computed inferences in the COIN model with a single context based on synthetic observations (state feedback) generated by its generative model (Fig. 1a). Plots show the cumulative distributions of the posterior predictive p-values for the state variable (left) and the parameters governing its dynamics (retention, middle; drift, right). The posterior predictive p-value is computed by evaluating the cumulative distribution function of the model’s posterior over the corresponding quantity at the true value of that quantity (as defined by the generative model). Empirical distributions of the posterior predictive p-values were collected across 4000 simulations (with different true retention and drift parameters), with 500 time steps in each simulation (during which the true state changes, but the true retention and drift parameters are constant). Note that although the true retention and drift parameters do not change during a simulation, inferences in the model about them still evolve in general, and so a new posterior predictive p-value is generated at each time step even for these quantities. If the model implements well-calibrated probabilistic inference under the correct generative model, the empirical distributions of the posterior predictive p-values should all be uniform. This is confirmed by all cumulative distributions of the posterior predictive p-values (orange and purple curves) approximating the identity line (thin black diagonal line). Orange curves show posterior predictive p-values under the corresponding marginals of the model’s posterior. To give additional information about the model’s joint posterior over the retention and drift parameters, we also show the cumulative distribution of the posterior predictive p-value for each parameter conditioned on the true value of the other parameter (retention | drift, and drift | retention, purple curves). b, Validation of the inference algorithm of the COIN model with multiple contexts. 
Simulations as in a but with additional synthetic observations (sensory cues) and multiple contexts allowed both during data generation and inference. Empirical distributions of the posterior predictive p-values were collected across 2000 simulations (with different true retention and drift parameters), with 500 time steps in each simulation (during which not only the true states change but also contexts transition, and sometimes novel contexts become active). Left column shows the true distributions of sensory cues, contexts and parameters. Inset shows the growth of the number of contexts over time both during generation (blue) and inference (orange). Middle and right columns show the cumulative distributions of the posterior predictive p-values (pooled across data sets and time steps) for the observations (top row), contexts and state (middle row) and parameters (bottom row). To calculate the posterior predictive p-values for the context, inferred contexts were relabelled by minimising the Hamming distance between the relabelled context sequence and the true context sequence (Supplementary Information). For the parameters, the posterior predictive p-values were calculated with respect to both the marginal distributions (retention and drift) and the conditional distributions (retention | drift, and drift | retention) as in a. The cumulative probability curves approximate the identity line (thin black diagonal line) showing that the inferred posterior probability distributions are well calibrated. c, Parameter recovery in the COIN model related to Fig. 2. Plots show the COIN model parameters that were recovered (y-axes) from fits to 10 synthetic data sets generated with the COIN model parameters (true, x-axes) obtained from the fits to each participant in the spontaneous (n = 8) and evoked (n = 8) recovery experiments (Extended Data Fig. 3). Vertical bars show the interquartile range of the recovered parameters for each participant. 
While several parameters are recovered with good accuracy \(({\sigma }_{{\rm{q}}},{\mu }_{{\rm{a}}},{\sigma }_{{\rm{d}}},{\sigma }_{{\rm{m}}})\), others are not (α, and in particular \({\sigma }_{{\rm{a}}}\) and \(\rho \)). We expect that with richer paradigms and larger data sets, all parameters would be recovered accurately. Most importantly, despite partial success with recovering individual parameters, model recovery shows that recovered parameter sets taken as a whole can still be used to accurately identify whether data was generated by the dual-rate or COIN model (d). Note that we make no claims about individual parameters in this study as our focus is on model class recovery. d-e, Model recovery for spontaneous (d) and evoked recovery experiments (e) related to Fig. 2. Synthetic data sets were generated using one of two models (COIN model, cyan; dual-rate model, green). Parameters used for each model were those obtained from the fits to each participant in the spontaneous (n = 8) and evoked (n = 8) recovery experiments (Extended Data Fig. 3), that is, for the COIN model, these were the same synthetic data sets as those used in c. Then the same model comparison method that we used on real data (Fig. 2c, e, insets) was used to recover the model that generated each synthetic data set (Methods). Arrows connect true models (used to generate synthetic data, disks on top) to models that were recovered from their synthetic data (pie-chart disks at bottom). Arrow colour indicates identity of recovered model, arrow thickness and percentages indicate probability of recovered model given true model. Bottom disk sizes and pie-chart proportions show total probability of recovered model and posterior probability of true model given recovered model (assuming a uniform prior over true models), respectively, with percentages specifically indicating posterior probability of the correct model. 
These results show that the model recovery process is generally very accurate and actually biased against the COIN model in favour of the dual-rate model.
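
The calibration test of panel a can be illustrated on a much simpler model. A minimal sketch, assuming a single context with known retention a and drift d, so that the exact posterior over the state is a scalar Kalman filter (a swapped-in simplification; the COIN model also infers a and d and uses its own inference algorithm). At each step the p-value is the posterior CDF evaluated at the true state; a calibrated posterior yields uniformly distributed p-values, that is, an empirical CDF on the identity line. The summary statistic (a Kolmogorov–Smirnov distance), the function names and all parameter values are illustrative choices, not taken from the paper.

```python
import math

import numpy as np


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def posterior_pit_values(n_sims, n_steps, a, d, sigma_q, sigma_r, var_scale=1.0, seed=0):
    """Posterior p-values of the true state under a scalar Kalman filter.

    Generative model (one context, known parameters):
        x_t = a * x_{t-1} + d + N(0, sigma_q^2),   y_t = x_t + N(0, sigma_r^2).
    var_scale != 1 deliberately mis-scales the posterior variance to show
    what a miscalibrated posterior looks like.
    """
    rng = np.random.default_rng(seed)
    pits = []
    for _ in range(n_sims):
        m = d / (1.0 - a)                        # stationary mean of the state
        P = sigma_q**2 / (1.0 - a**2)            # stationary variance of the state
        x = rng.normal(m, math.sqrt(P))          # true initial state drawn from the same prior
        for t in range(n_steps):
            if t > 0:                            # propagate truth and belief one step
                x = a * x + d + rng.normal(0.0, sigma_q)
                m, P = a * m + d, a * a * P + sigma_q**2
            y = x + rng.normal(0.0, sigma_r)     # noisy state feedback
            K = P / (P + sigma_r**2)             # Kalman gain
            m, P = m + K * (y - m), (1.0 - K) * P
            # Posterior CDF evaluated at the true state.
            pits.append(normal_cdf((x - m) / math.sqrt(var_scale * P)))
    return np.array(pits)


def ks_distance_from_uniform(pits):
    """Largest gap between the empirical CDF of the p-values and the identity line."""
    u = np.sort(pits)
    n = len(u)
    i = np.arange(1, n + 1)
    return max(np.max(i / n - u), np.max(u - (i - 1) / n))


good = posterior_pit_values(200, 100, a=0.9, d=0.1, sigma_q=0.1, sigma_r=0.3)
bad = posterior_pit_values(200, 100, a=0.9, d=0.1, sigma_q=0.1, sigma_r=0.3, var_scale=0.25)
print(f"calibrated: {ks_distance_from_uniform(good):.3f}, "
      f"overconfident: {ks_distance_from_uniform(bad):.3f}")
```

Both runs share a seed, so they differ only in the reported posterior variance: the overconfident posterior pushes p-values towards 0 and 1 and bends the empirical CDF away from the identity line.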

Extended Data Fig. 3 COIN model parameters.

Left column: Parameters for illustrating the COIN model (I: purple), model validation (V: brown) and fits to individuals in the spontaneous (S: blue) and evoked (E: green) recovery experiments, to the average of both groups (A: cyan), and to individuals in the memory-updating experiment (M: red). Right: scatter plots for all pairs of parameters for the six groups. The overlap of data points suggests that parameters are similar across experiments. \({\sigma }_{{\rm{q}}}\): process noise s.d. (equation 3); \({\mu }_{{\rm{a}}}\), \({\sigma }_{{\rm{a}}}\): prior mean and s.d. for context-specific state retention factors (equation 10); \({\sigma }_{{\rm{d}}}\): prior s.d. for context-specific state drifts (equation 10); α: concentration of local transition probabilities (equation 8); \(\rho \): self-transition bias parameter (equation 18); \({\sigma }_{{\rm{m}}}\): motor noise s.d. (equation 17); \({\alpha }^{{\rm{e}}}\): concentration of local cue probabilities (equation 9). Parameters used in the figures are as follows. I: Fig. 1 and Extended Data Fig. 1c–e. V: Extended Data Fig. 2a, b. S: Fig. 2c, Extended Data Fig. 6f (column 1) and Extended Data Fig. 2d. E: Fig. 2e, Extended Data Fig. 6f (column 3) and Extended Data Fig. 2e. S & E: Extended Data Fig. 2c. A: Fig. 2b and d, Extended Data Fig. 5 and Extended Data Fig. 9 (bias added for visuomotor rotation experiments: Extended Data Fig. 5a–j, p–s and Extended Data Fig. 9e–l). M: Fig. 3 and Extended Data Fig. 7a–d. S, E & M (all parameters except \({\alpha }^{{\rm{e}}}\)): Fig. 4 and Extended Data Fig. 8. The robustness analyses (Extended Data Fig. 4) used perturbed versions of the same parameters as the corresponding unperturbed simulations. 
To reduce the number of free parameters in the model, we set the parameters of the hierarchical Dirichlet process that determine the expected effective number of contexts or cues, γ (equation 7) and \({\gamma }^{{\rm{e}}}\) (equation 9), respectively, both to 0.1, the prior mean for context-specific state drifts \({\mu }_{{\rm{d}}}\) to zero (equation 10) and the standard deviation of the sensory noise \({\sigma }_{{\rm{s}}}\) to 0.03 when fitting or simulating the model, with the variance of the observation noise (equations 5 and 19) set to \({\sigma }_{{\rm{r}}}^{2}={\sigma }_{{\rm{s}}}^{2}+{\sigma }_{{\rm{m}}}^{2}\). For visuomotor rotation experiments (Extended Data Fig. 5a–j, p–s and Extended Data Fig. 9e–l), we set the mean of the prior of the bias \({\mu }_{{\rm{b}}}\) to zero (equation 20) and its s.d. \({\sigma }_{{\rm{b}}}\) to 70⁻¹.

Extended Data Fig. 4 Robustness analysis of the main COIN model results.

To test how robust the behaviour of the COIN model is, we added noise to the parameters fit to the individual participants in the spontaneous recovery, evoked recovery and memory updating experiments and re-simulated the paradigms in Figs. 2–4: spontaneous recovery (a), evoked recovery (b), memory updating (c), savings (d), anterograde interference (e) and environmental consistency (f). For each experiment, we simulated the COIN model for the same participants as in Figs. 2–4 but perturbed each participant’s parameter values. That is, for each parameter (suitably transformed to be unbounded) we calculated the standard deviation across participants (relevant for the given paradigm or set of paradigms) and then perturbed each participant’s (transformed) parameter by zero-mean Gaussian noise whose standard deviation was a fraction (λ = 0, 0.05, 0.5 or 1.0) of this empirical standard deviation, after which we used the inverse transform to obtain the actual parameter used in these perturbed simulations. For parameters that are constrained to be non-negative (\({\sigma }_{{\rm{q}}}\), \({\sigma }_{{\rm{a}}}\), \({\sigma }_{{\rm{d}}}\), \(\alpha \), \({\alpha }^{{\rm{e}}}\), \({\sigma }_{{\rm{m}}}\)), we used a logarithmic transformation, whereas for parameters constrained to be on the unit interval (\({\mu }_{{\rm{a}}}\), \(\rho \)), we used a logit transformation. Column 1: experimental data (plotted as in Figs. 2–4). Columns 2-5: output of the COIN model for different amounts of noise added to the parameters. Note that the simulations were not conditioned on the actual adaptation data of individual participants (in contrast to the original simulations of Figs. 2 and 3) because these data are not available for the experiments shown in Fig. 4 (for which the original simulations were already performed using this ‘open-loop’ simulation approach). 
The robustness analysis shows that most predictions of the COIN model are robust to changes in the parameters and only start to deviate for large parameter changes (λ = 1) in some of their quantitative details (such as the magnitude of spontaneous recovery). Note that λ = 1 leads to changes in parameters that are of the same magnitude as randomly shuffling the parameters across participants.
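
The perturbation scheme can be sketched as follows, assuming per-participant fits are stored as arrays keyed by parameter name. The names mirror the symbols in Extended Data Fig. 3 (\({\sigma }_{{\rm{q}}}\) becomes sigma_q, and so on) and the constraint groups follow the caption; the dictionary layout, the function names and the example values are hypothetical, and the population s.d. is used because the caption does not specify an estimator.

```python
import numpy as np

# Constraint groups as stated in the caption.
NONNEGATIVE = {"sigma_q", "sigma_a", "sigma_d", "alpha", "alpha_e", "sigma_m"}
UNIT_INTERVAL = {"mu_a", "rho"}


def to_unbounded(name, x):
    if name in NONNEGATIVE:
        return np.log(x)                         # log transform
    if name in UNIT_INTERVAL:
        return np.log(x) - np.log1p(-x)          # logit transform
    raise KeyError(f"unknown parameter {name!r}")


def from_unbounded(name, z):
    if name in NONNEGATIVE:
        return np.exp(z)
    if name in UNIT_INTERVAL:
        return 1.0 / (1.0 + np.exp(-z))          # inverse logit
    raise KeyError(f"unknown parameter {name!r}")


def perturb_parameters(params, lam, rng):
    """Add zero-mean Gaussian noise with s.d. lam * (across-participant s.d.) in unbounded space."""
    perturbed = {}
    for name, values in params.items():
        z = to_unbounded(name, np.asarray(values, dtype=float))
        noise_sd = lam * z.std()                 # fraction of the empirical spread
        z_new = z + rng.normal(0.0, noise_sd, size=z.shape)
        perturbed[name] = from_unbounded(name, z_new)
    return perturbed


# Hypothetical per-participant values, for illustration only (not fitted values).
fits = {"sigma_q": [0.0021, 0.0034, 0.0027, 0.0019], "rho": [0.25, 0.41, 0.33, 0.52]}
rng = np.random.default_rng(1)
for lam in (0.0, 0.05, 0.5, 1.0):
    print(lam, perturb_parameters(fits, lam, rng)["rho"].round(3))
```

Working in the transformed space guarantees that every perturbed value stays inside its allowed range, however large λ is.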

Extended Data Fig. 5 History dependence of contextual inference. a-j, Contextual inference underlies the elevated level of spontaneous recovery after ‘overlearning’.

a, Spontaneous recovery paradigm for visuomotor learning in which the length of the exposure (P⁺) phase is tripled from 200 trials (‘standard’ paradigm, pink) to 600 trials (‘overlearning’ paradigm, green). For comparison, paradigms are aligned to the end of the exposure phase. b, Adaptation in the COIN model for the standard and overlearning paradigms (same parameters as in Fig. 2b and d but with the addition of a bias parameter; see Supplementary Information and also Extended Data Fig. 3, parameter set A). Adaptation corresponds to reach angle normalized by the size of the experimentally imposed visuomotor rotation. Note elevated level of spontaneous recovery after overlearning compared to the standard paradigm, qualitatively matching visuomotor learning data in Fig. 4a of Ref. ¹³. c-f, Internal representations of the COIN model for the standard paradigm. Inferred bias (c) and predicted state (d) distributions for each context (colours). e, Predicted probabilities of each context (with zoomed view starting from near the end of \({{\rm{P}}}^{+}\) exposure), colours as in c-d, grey is novel context as in Fig. 1f. f, Predicted state feedback (predicted state plus bias) distribution (purple), which is a mixture of the individual contexts’ predicted state feedback distributions (not shown) weighted by their predicted probabilities (e). Total adaptation (cyan line) is the mean of the predicted state feedback distribution. g-j, same as c-f for the overlearning paradigm. For comparison, the dashed horizontal lines in both paradigms show the final level of each variable for the red context in the standard paradigm. Note that overlearning leaves inferences about biases and states largely unchanged (compare 1 in c & g and 2 in d & h) but leads to higher predicted probabilities of the \({{\rm{P}}}^{+}\) context (red) in the channel-trial phase (compare 3 in e & i), reflecting the true statistics of the experiment in which \({{\rm{P}}}^{+}\) occurred more frequently. 
In turn, this makes the \({{\rm{P}}}^{+}\) bias and state contribute more to total adaptation in the channel-trial phase, thus explaining higher levels of spontaneous recovery. Therefore, differences between conditions are explained by contextual inference rather than by differences in bias or state inferences. The results are qualitatively similar when simulated as a force-field paradigm (that is, without bias, not shown). k-o, Contextual inference underlies reduced spontaneous recovery following pre-training with \({{\rm{P}}}^{-}\). k, Adaptation in the channel-trial phase of a typical spontaneous recovery paradigm (standard, pink, as in Fig. 2b) and two modified versions of the paradigm in which the \({{\rm{P}}}^{+}\) phase is preceded by a \({{\rm{P}}}^{-}\) (pre-training) phase in which \({{\rm{P}}}^{-}\) is either introduced and removed abruptly (\({{\rm{P}}}_{{\rm{abrupt}}}^{-}\), dark green) or gradually (\({{\rm{P}}}_{{\rm{gradual}}}^{-}\), light green). Data reproduced from Ref. ¹⁴. l-o, Simulation of the COIN model for the same paradigms (same parameters as in Fig. 2b and d; Extended Data Fig. 3, parameter set A), plotted as in Fig. 2b-c. In each paradigm, contexts are coloured according to their order of instantiation during inference (blue\(\to \)red\(\to \)orange). Note that pre-training with \({{\rm{P}}}^{-}\) (either abrupt or gradual) leaves inferences about states within each context largely unchanged at the beginning of the channel-trial phase (compare corresponding numbers 1-2 in column 2 across m-o). However, the pre-training leads to higher predicted probabilities of the \({{\rm{P}}}^{-}\) context initially (compare number 3 in m to number 3 in n & o) and throughout the channel-trial phase (compare number 4 across m-o), reflecting the true statistics of the experiment in which \({{\rm{P}}}^{-}\) occurred more frequently (compare column 1 across m-o). 
In turn, this makes the \({{\rm{P}}}^{-}\) state contribute more to total adaptation, thus explaining the reduction in both the initial and final levels of adaptation during the channel-trial phase in the \({{\rm{P}}}_{{\rm{abrupt}}}^{-}\) and \({{\rm{P}}}_{{\rm{gradual}}}^{-}\) groups. Therefore, as in Fig. 4, differences between conditions are explained by contextual inference rather than state inference. p-s, Contextual inference underlies slower de-adaptation following a gradually introduced perturbation. p, Adaptation (normalized reach angle, as in b) in a paradigm in which a visuomotor rotation is introduced abruptly (pink) or gradually (green) and then removed abruptly. Data reproduced from Ref. ¹⁷. q-s, Simulation of the COIN model on the abrupt (q, pink, and r) and gradual (q, green, and s) paradigms (same parameters as in Fig. 2b and d but with the addition of a bias parameter; Extended Data Fig. 3, parameter set A), plotted as in b-j. Note that contexts are coloured according to their order of appearance during inference (blue\(\to \)red). In response to the abrupt introduction of the \({{\rm{P}}}^{+}\) perturbation, a new memory is created (1). In contrast, the gradual introduction of the \({{\rm{P}}}^{+}\) perturbation prevents the creation of a new memory, thus requiring changes in the inferred bias and state of the original memory associated with \({{\rm{P}}}^{0}\) (2, blue context) to account for the slowly increasing perturbation. Therefore, the ‘blue’ context is inferred to be active throughout the exposure phase (3) and becomes associated with a \({{\rm{P}}}^{+}\)-like state. However, at the beginning of the abruptly introduced post-exposure (P⁰) phase, a new memory is created (4), which has a low initial predicted probability that can only be increased by repeated experience with P⁰ (5).
This leads to slower de-adaptation in the post-exposure phase compared to the abrupt paradigm (6), in which the original context associated with \({{\rm{P}}}^{0}\) (blue) is protected (7) and can be reinstated quickly (8), because a high local self-transition probability for \({{\rm{P}}}^{0}\) was learned during the pre-exposure phase. Note that the smaller errors caused by the gradual perturbation relative to the abrupt condition are better accounted for by an error in the state than by an error in the bias, and therefore the state is updated more than the bias. The results are qualitatively similar when simulated as a force-field paradigm (that is, without bias; not shown).
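The read-out described in panel f can be written compactly: if context k has predicted probability p_k, mean bias b_k and mean predicted state x_k, the mean of the predicted state feedback mixture is the probability-weighted average of the component means, Σ_k p_k (b_k + x_k). The Python sketch below computes that quantity; the function name `total_adaptation` and all numbers are hypothetical illustrations, not values from the paper or its code, and the novel context is simply treated as one more entry in the lists. It shows how raising only the P⁺ probability, with biases and states held fixed, raises total adaptation, which is the overlearning argument in miniature.

```python
def total_adaptation(pred_probs, mean_biases, mean_states):
    """Mean of the predicted state-feedback mixture.

    Each context k contributes a predicted state-feedback distribution whose
    mean is bias_k + state_k; the mixture mean is the probability-weighted sum.
    """
    if abs(sum(pred_probs) - 1.0) > 1e-9:
        raise ValueError("predicted probabilities must sum to 1")
    return sum(p * (b + x) for p, b, x in zip(pred_probs, mean_biases, mean_states))


# Hypothetical values for two contexts (P+ first, P- second). Biases and states
# are identical in both conditions; only the predicted probabilities differ.
biases = [0.05, 0.0]
states = [0.8, -0.6]
standard = total_adaptation([0.5, 0.5], biases, states)      # approximately 0.125
overlearning = total_adaptation([0.7, 0.3], biases, states)  # approximately 0.415
```

Because the component means are unchanged, the whole difference between the two conditions here comes from the weights, mirroring the legend's point that overlearning acts through contextual inference rather than through bias or state inference.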

Extended Data Fig. 6 Additional analyses of spontaneous and evoked recovery related to Fig. 2. a-c, Mathematical analysis of spontaneous and evoked recovery.

The channel-trial phase of spontaneous recovery and evoked recovery (after the two \({{\rm{P}}}^{+}\) trials) simulated in a simplified setting (Supplementary Information) with two contexts that are initialized to have equal but opposite state estimates (a) and equal (spontaneous recovery, solid) or highly unequal (evoked recovery, dashed) predicted probabilities (b). For the two contexts, the retention parameters are assumed to be constant and equal, and the drift parameters are assumed to be constant, of the same magnitude but of opposite sign. Mean adaptation (c), which in the COIN model is the average of the state estimates (a) weighted by the corresponding predicted probabilities (b), shows the classic pattern of spontaneous recovery (solid, cf. Fig. 2b, c) and the characteristic abrupt rise of evoked recovery (dashed, cf. Fig. 2d, e). Note that although in the full model the state estimates differ between evoked and spontaneous recovery following the two \({{\rm{P}}}^{+}\) trials, here we assume for simplicity that they are the same (no separate solid and dashed lines in a), to demonstrate that the difference in mean adaptation between the two paradigms (c) can be accounted for by differences in contextual inference alone (b, cf. Fig. 2b and d, top right). Circles on the right show steady-state values of inferences and adaptation. Note that in both paradigms, adaptation is predicted to decay to a non-zero asymptote (see also e). d, State-space model fits to adaptation data from the spontaneous and evoked recovery groups. Solid lines show the mean fits across participants of the two-state model (5 parameters, top row) and the three-state model (7 parameters, bottom row) to the spontaneous recovery (left column) and evoked recovery (right column) data sets. Mean \(\pm \) s.e.m. adaptation on channel trials is shown in black (same as in Fig. 2c and e).
Insets show differences in BIC (nats) between the two-state model and the three-state model for individual participants (positive values in green indicate evidence in favour of the two-state model, and negative values in purple indicate evidence in favour of the three-state model). At the group level, the two-state model was strongly favoured over the three-state model (\(\Delta \) group-level BIC of 64.2 and 78.4 nats in favour of the two-state model for the spontaneous and evoked recovery groups, respectively). Individual states are shown for the two-state model (top, blue and red). Both the fast and slow processes adapt to \({{\rm{P}}}^{+}\) during the extended initial learning period. The \({{\rm{P}}}^{-}\) phase reverses the state of the fast process, but not of the slow process, so that when summed they cancel, resulting in baseline performance. Spontaneous recovery during the \({{\rm{P}}}^{{\mathsf{c}}}\) phase is then explained by the fast process rapidly decaying, revealing the state of the slow process that has remained partially adapted to \({{\rm{P}}}^{+}\). Note that this explanation arises because in multi-rate models all processes contribute equally to the motor output at all times. This is fundamentally different from the expression and updating of multiple context-specific memories in the COIN model, which are dynamically modulated over time according to ongoing contextual inference. e, Evoked recovery does not decay exponentially to zero. According to the COIN model, adaptation in the channel-trial phase of evoked recovery can be approximated by exponential decay to a non-zero (positive) asymptote (a-c, Fig. 2e, Supplementary Information). To test this prediction, we fit an exponential function that either decays to zero (light and dark green) or decays to a non-zero (constrained to be positive) asymptote (cyan) to the adaptation data of individual participants in the evoked recovery group after the two \({{\rm{P}}}^{+}\) trials (black arrow).
The two zero-asymptote models differ in terms of whether they are constrained to pass through the datum on the first trial (light green) or not (dark green). The mean fits across participants for the models that decay to zero (green) fail to track the mean adaptation (black, \(\pm \) s.e.m. across participants), which shows an initial period of decay followed by a period of little or no decay. The mean fit for the model that decays to a non-zero asymptote (cyan) tracks the mean adaptation well and was strongly favoured in model comparison (\(\Delta \) group-level BIC of 944.3 and 437.7 nats compared to the zero-asymptote fits with constrained and unconstrained initial values, respectively). Note that fitting to individual participants excludes the confound of finding a more complex time course (e.g. one with a non-zero asymptote) only due to averaging across participants that each show a different simple time course (e.g. all with zero asymptote but different time constants). f, COIN and dual-rate model fits for individual participants in the spontaneous and evoked recovery groups. Data and model predictions are shown for individual participants as in Fig. 2c and e for across-participant averages. Participants in the S and E groups are ordered by decreasing BIC difference between the dual-rate and COIN model (that is, S1’s and E1’s data most favour the COIN model), as in insets of Fig. 2c and e. Note that the COIN model can account for much of the heterogeneity of spontaneous recovery (e.g. from large in S1 to minimal in S6) and evoked recovery (e.g. from large in E1 to minimal in E7).
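The comparison in panel e can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the authors' fitting code: it fits the three exponential variants (zero asymptote forced through the first datum, zero asymptote with free amplitude, non-negative asymptote) by a grid search over the time constant, solving the linear parameters in closed form, and scores each fit with a Gaussian-residual BIC in nats, n ln(RSS/n) + k ln n, dropping constants that cancel in ΔBIC. The names `fit_exponential` and `N_PARAMS`, the τ grid and the synthetic data are all hypothetical.

```python
import math


def _fit_at_tau(g, y, mode):
    """Closed-form (amplitude, asymptote) for a fixed time constant."""
    if mode == "through_first":      # zero asymptote, curve forced through y[0]
        return y[0], 0.0
    s_gg = sum(gi * gi for gi in g)
    s_gy = sum(gi * yi for gi, yi in zip(g, y))
    if mode == "zero":               # zero asymptote, free amplitude
        return s_gy / s_gg, 0.0
    # mode == "asymptote": least squares for y ~ amp * g + c subject to c >= 0
    n, s_g, s_y = len(y), sum(g), sum(y)
    det = s_gg * n - s_g ** 2
    amp = (s_gy * n - s_g * s_y) / det
    c = (s_gg * s_y - s_g * s_gy) / det
    if c < 0.0:                      # constraint active: optimum lies at c = 0
        return s_gy / s_gg, 0.0
    return amp, c


# Free parameters per variant (noise variance omitted: it cancels in delta BIC).
N_PARAMS = {"through_first": 1, "zero": 2, "asymptote": 3}


def fit_exponential(t, y, mode, taus=None):
    """Grid search over tau; returns (bic_nats, amp, tau, c)."""
    if taus is None:
        taus = [0.5 * (i + 1) for i in range(200)]   # 0.5 to 100 trials
    best = None
    for tau in taus:
        g = [math.exp(-(ti - t[0]) / tau) for ti in t]
        amp, c = _fit_at_tau(g, y, mode)
        rss = sum((yi - (amp * gi + c)) ** 2 for gi, yi in zip(g, y))
        if best is None or rss < best[0]:
            best = (rss, amp, tau, c)
    rss, amp, tau, c = best
    n = len(y)
    bic = n * math.log(rss / n) + N_PARAMS[mode] * math.log(n)
    return bic, amp, tau, c


# Hypothetical synthetic data: decay to a 0.2 plateau plus small alternating noise.
t = list(range(40))
y = [0.5 * math.exp(-ti / 5) + 0.2 + 0.01 * (-1) ** ti for ti in t]
fits = {m: fit_exponential(t, y, m) for m in N_PARAMS}
delta = fits["zero"][0] - fits["asymptote"][0]   # positive favours the asymptote model
```

Fitting the linear parameters exactly and searching only over τ keeps the sketch dependency-free; a gradient-based optimizer would be the usual choice for real data.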

Extended Data Fig. 7 Additional analyses of the memory updating experiment (related to Fig. 3). a-b, Memory updating experiment: time-course of learning.

a, Adaptation on channel trials at the end of each block of force-field trials in the training phase (purple), which occur before \({{\rm{P}}}^{0}\) washout trials, and on the first channel trial of triplets within each block (orange), which occurs after \({{\rm{P}}}^{0}\) washout trials. Data are mean \(\pm \) s.e.m. across participants and lines show the mean of COIN model fits (8 parameters, Extended Data Fig. 3). b, Single-trial learning on triplets that were consistent with the training contingencies. Data (mean \(\pm \) s.e.m. across participants) are shown with the mean of COIN model fits across participants. Positive learning reflects changes in the direction expected based on the force field of the exposure trial (an increase following \({{\rm{P}}}^{+}\) and a decrease following \({{\rm{P}}}^{-}\)). c-d, Mathematical analysis of single-trial learning. Single-trial learning in the COIN model (column 1) for the four cue–perturbation triplets in the pre-training phase (c) and the post-training phase (d) in the memory updating experiment. The COIN model was fit to each participant and model fits are shown as mean \(\pm \) s.e.m. (single-trial learning, full model prediction) or mean (dot product, posterior, prior and likelihood) across n = 24 participants. Single-trial learning (column 1) is approximately proportional to a dot product (column 2) between the vector of posterior context probabilities (responsibilities) on the exposure trial of the triplet and the vector of predicted context probabilities on the subsequent channel trial (see the Supplementary Information for derivation).
This dot product can be further approximated by collapsing the vector of predicted probabilities to a one-hot vector, that is, by the responsibility \(p({c}_{t}={c}^{\ast }|{q}_{t},{y}_{t},\ldots )\) (column 3) of the context that is predominantly expressed on the second channel trial of the triplet (\({c}^{\ast }\), the context with the highest predicted probability on the second channel trial of the triplet), where … denotes all observations before time t (as in Fig. 1). This responsibility is proportional to a product of two terms. The first term is the prior context probability \(p({c}_{t}={c}^{\ast }|{q}_{t},\ldots )\) (column 4), that is, the predicted context probability before experiencing the perturbation (as in Fig. 1f), which is already conditioned on the sensory cue visible from the outset of the trial. The second term is the likelihood of the state feedback in that context, \(p({y}_{t}|{c}_{t}={c}^{\ast },\ldots )\) (column 5). Because, before training, neither cues nor feedback are consistently associated with a particular context, the COIN model predicts that the prior, the likelihood and thus total single-trial learning should all be largely uniform across contexts before training. e-f, The effects of cue and perturbation on single-trial learning in individual participants. e, Single-trial learning (post-training) shown as a function of perturbation separated by cue (left) or as a function of cue separated by perturbation (right) for each participant (lines). Note the significant effects of both the perturbation and the cue. f, Scatter plot of cue effect (\({{\rm{P}}}_{1}^{+}+{{\rm{P}}}_{1}^{-}-{{\rm{P}}}_{2}^{+}-{{\rm{P}}}_{2}^{-}\)) against perturbation effect (\({{\rm{P}}}_{1}^{+}+{{\rm{P}}}_{2}^{+}-{{\rm{P}}}_{1}^{-}-{{\rm{P}}}_{2}^{-}\)) for each participant (dots). Solid lines show medians of the corresponding effects. Note the lack of anti-correlation between the two effects.
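The approximation chain in panels c-d (single-trial learning is roughly proportional to the dot product of exposure-trial responsibilities with next-trial predicted probabilities, which is in turn approximated by the responsibility of \({c}^{\ast }\), itself proportional to prior times likelihood) can be illustrated numerically. This Python sketch assumes Gaussian predictive distributions of state feedback in each context and uses hypothetical numbers and function names; it computes only the quantities that single-trial learning is said to be proportional to, not the full COIN update.

```python
import math


def gaussian_pdf(y, mean, sd):
    """Likelihood of state feedback y under an assumed Gaussian predictive distribution."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def responsibilities(prior, likelihood):
    """Posterior context probabilities: cue-conditioned prior times likelihood, normalised."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)
    return [j / total for j in joint]


def dot_product_proxy(resp, next_pred):
    """Dot product of responsibilities with next-trial predicted probabilities."""
    return sum(r * p for r, p in zip(resp, next_pred))


def one_hot_proxy(resp, next_pred):
    """Collapse next_pred to a one-hot vector on c*, its most probable context."""
    c_star = max(range(len(next_pred)), key=lambda k: next_pred[k])
    return resp[c_star]


# Hypothetical two-context example: context 0 predicts feedback +1, context 1 predicts -1.
prior = [0.8, 0.2]                                       # the cue favours context 0
lik = [gaussian_pdf(1.0, m, 0.5) for m in (1.0, -1.0)]   # observed feedback y = +1
resp = responsibilities(prior, lik)
next_pred = [0.98, 0.02]            # predicted probabilities on the next channel trial
full = dot_product_proxy(resp, next_pred)
approx = one_hot_proxy(resp, next_pred)
```

The one-hot approximation is close to the dot product only when the next-trial predicted probabilities are concentrated on a single context, as they are in this example.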

Extended Data Fig. 8 Additional analysis of the effect of environmental consistency on single-trial learning related to Fig. 4c.

Columns 1 & 2: experimental paradigm and data replotted from Ref. ⁵. Participants experienced repeating cycles of \({{\rm{P}}}^{+}\) trials of varying lengths (column 1: 20 \({{\rm{P}}}^{+}\) trials in P20, 7 in P7, 1 in P1, and 1 \({{\rm{P}}}^{+}\) trial followed by 1 \({{\rm{P}}}^{-}\) trial in P1N1) in between \({{\rm{P}}}^{0}\) trials. To assess single-trial learning (column 2) during exposure to the environments, channel trials were randomly interspersed before and after the first \({{\rm{P}}}^{+}\) trial in a subset of the force-field cycles. Columns 3 to 5 show the output and internal inferences of the COIN model in the same format as Fig. 4c (same parameters as in Fig. 4; Extended Data Fig. 3, parameter sets S, E & M). The COIN model qualitatively reproduces the pattern of changes in single-trial learning seen over repeated cycles in this paradigm. As in Fig. 4, differences in the apparent learning rate were not driven by differences in either the proper learning rate (Kalman gain) or the underlying state (column 4) but were instead driven by changes in contextual inference (column 5).
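Why a change in apparent learning rate need not reflect a change in the Kalman gain can be seen in the simplest case. For a single linear-Gaussian context with fixed, known parameters (retention `a`, drift `d`, process noise `q` and observation noise `r`, all hypothetical values here), the gain sequence depends only on those parameters and never on the perturbations actually observed. The Python sketch below runs the same scalar filter on a P20-like and a P1N1-like schedule (both hypothetical) and obtains identical gains. In the full COIN model, context parameters are themselves inferred and each context's update is weighted by its responsibility; this sketch ignores both and only isolates the proper-learning component.

```python
def kalman_filter(observations, a=0.99, d=0.0, q=1e-3, r=1e-2, x0=0.0, p0=1e-2):
    """Scalar Kalman filter for x_t = a*x_{t-1} + d + process noise (variance q)
    and y_t = x_t + observation noise (variance r).

    Returns the per-trial Kalman gains and the final state estimate.
    """
    x, p = x0, p0
    gains = []
    for y in observations:
        x = a * x + d              # predict the state
        p = a * a * p + q          # predict its variance
        k = p / (p + r)            # Kalman gain: depends on p and r, not on y
        x = x + k * (y - x)        # correct with the prediction error
        p = (1.0 - k) * p          # posterior variance
        gains.append(k)
    return gains, x


# Hypothetical schedules of equal length (1 = P+, -1 = P-, 0 = P0).
p20 = ([1.0] * 20 + [0.0] * 20) * 3
p1n1 = ([1.0, -1.0] + [0.0] * 38) * 3
gains_p20, x_p20 = kalman_filter(p20)
gains_p1n1, x_p1n1 = kalman_filter(p1n1)
```

The state estimates differ between the two schedules, but the gains are the same trial by trial, so any difference in how much is learned from a single \({{\rm{P}}}^{+}\) trial must come from elsewhere, which in the COIN model is contextual inference.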

Extended Data Fig. 9 Cognitive processes and the COIN model. a-d, Maintenance of context probabilities may require working memory.

a, Adaptation in a spontaneous recovery paradigm in which a non-memory (pink) or working memory task (green) is performed at the end of the \({{\rm{P}}}^{-}\) phase before starting the channel-trial phase (data reproduced from Ref. ²²). Initial adaptation in the channel-trial phase (inset) shows the working memory task abolishes spontaneous recovery and leads to adaptation akin to evoked recovery (cf. Extended Data Fig. 6a–c). b-d, COIN model simulation in which the working memory task abolishes the (working) memory of the context responsibilities on the last trial of the \({{\rm{P}}}^{-}\) phase but not the context transition (and thus stationary) probabilities (same parameters as in Fig. 2b and d; Extended Data Fig. 3, parameter set A), plotted as in Fig. 2b, c. The circles on the predicted probability (zoomed view) show the values on the first trial in the channel-trial phase. d, as (c) but for the working memory task. The predicted probabilities on the first trial in the channel-trial phase are set to the values under the stationary distribution (shown on every trial in the simulation of Extended Data Fig. 1c). We calculate the stationary context distribution by solving \({\boldsymbol{\psi }}={\boldsymbol{\psi }}\hat{{\Pi }}\) for \({\boldsymbol{\psi }}\) (a row vector), subject to the constraint that \({\boldsymbol{\psi }}\) is a valid probability distribution (i.e. all elements of \({\boldsymbol{\psi }}\) are non-negative and sum to 1), where \(\hat{{\Pi }}\) is the expected local transition probability matrix. e-l, Explicit versus implicit learning in the COIN model. e, Results of a spontaneous recovery paradigm (as in Fig. 2b) for visuomotor learning. Adaptation is computed as participants’ reach angle normalized by the size of the experimentally imposed visuomotor rotation. Explicit learning (dark green) is measured by participants indicating their intended reach direction. 
Implicit learning (light green) is obtained as the difference between total adaptation (solid pink) and explicit learning. In the visual error-clamp phase (\({{\rm{P}}}^{{\mathsf{c}}}\)), participants were told to stop using any aiming strategy so that the direction they moved was taken as the implicit component of learning. A control experiment (dashed pink) was also performed in which there was no reporting of intended reach direction. Data reproduced from Ref. ²⁴. f-l, Simulation of the COIN model on the same paradigm (same parameters as in Fig. 2b and d but with the addition of a bias parameter; Extended Data Fig. 3, parameter set A). f, Predictions for experimentally observable quantities. Light green line: implicit learning is the average bias across contexts weighted by the predicted probabilities (cyan line in j). Dark green line: explicit learning is the state of the most responsible context on the previous trial (black line in h). Solid pink line: total adaptation for the reporting condition is the sum of explicit and implicit learning (as in e). Dashed pink line: total adaptation for the non-reporting condition is the average predicted state feedback across contexts weighted by the predicted probabilities (cyan line in l, as in all experiments that had no reporting element). g-h, Inferred bias (g) and predicted state (h) distributions for each context (colours), with the black line showing the mean state of the most responsible context (coloured line below axis) for trials on which an explicit report was solicited. i, Predicted probability of each context. Colours as in g-h, grey is the novel context as in Fig. 1f. j-k, Inferred bias (j) and predicted state (k) distributions (purple), obtained as mixtures of the respective distributions of individual contexts (g-h) weighted by their predicted probabilities (i), and their means (cyan lines).
l, Predicted state feedback distribution (purple, computed as the sum of bias in j and predicted state in k) and its mean (cyan).
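The stationary context distribution used in panel d is the solution of \({\boldsymbol{\psi }}={\boldsymbol{\psi }}\hat{{\Pi }}\) with \({\boldsymbol{\psi }}\) a probability vector. One standard way to compute it, sketched below in Python under the assumption that \(\hat{{\Pi }}\) is a finite, row-stochastic and irreducible matrix (so that the solution is unique), is to rewrite the condition as \(({\hat{\Pi }}^{\top }-I){{\boldsymbol{\psi }}}^{\top }=0\), replace one redundant equation with the normalisation constraint that the elements sum to 1, and solve the resulting linear system. The function name and example matrices are hypothetical; in the COIN model, \(\hat{{\Pi }}\) would come from the inferred local transition probabilities.

```python
def stationary_distribution(trans):
    """Solve psi = psi @ trans for a probability row vector psi.

    trans[i][j] is the probability of moving from context i to context j.
    """
    n = len(trans)
    for row in trans:
        if len(row) != n or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("trans must be square and row-stochastic")
    # Equation i: sum_j psi_j * trans[j][i] - psi_i = 0, i.e. (trans^T - I) psi = 0.
    a = [[trans[j][i] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    b = [0.0] * n
    # The n equations are linearly dependent; swap the last one for sum(psi) = 1.
    a[-1] = [1.0] * n
    b[-1] = 1.0
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for j in range(col, n):
                a[r][j] -= f * a[col][j]
            b[r] -= f * b[col]
    # Back substitution.
    psi = [0.0] * n
    for i in range(n - 1, -1, -1):
        psi[i] = (b[i] - sum(a[i][j] * psi[j] for j in range(i + 1, n))) / a[i][i]
    return psi


# Hypothetical expected transition matrix for two contexts.
psi = stationary_distribution([[0.9, 0.1], [0.2, 0.8]])   # approximately [2/3, 1/3]
```

A direct linear solve is used rather than repeated multiplication by \(\hat{{\Pi }}\) (power iteration) because it does not depend on how quickly the chain mixes; with strongly self-persistent contexts, power iteration can converge slowly.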

Extended Data Table 1 Comparison of the COIN model to other models


Supplementary information

Supplementary Information

Supplementary Information 1–8. See the Supplementary Information contents page for details.

Reporting Summary

Peer Review File


About this article

Cite this article

Heald, J.B., Lengyel, M. & Wolpert, D.M. Contextual inference underlies the learning of sensorimotor repertoires. Nature 600, 489–493 (2021). https://doi.org/10.1038/s41586-021-04129-3


Received: 08 December 2020
Accepted: 13 October 2021
Published: 24 November 2021
Issue Date: 16 December 2021
DOI: https://doi.org/10.1038/s41586-021-04129-3

