A model for learning based on the joint estimation of stochasticity and volatility

Piray, Payam; Daw, Nathaniel D.

doi:10.1038/s41467-021-26731-9

Download PDF

Article
Open access
Published: 15 November 2021

A model for learning based on the joint estimation of stochasticity and volatility

Nature Communications volume 12, Article number: 6587 (2021) Cite this article

9760 Accesses
32 Citations
44 Altmetric
Metrics details

Subjects

Abstract

Previous research has stressed the importance of uncertainty for controlling the speed of learning, and how such control depends on the learner inferring the noise properties of the environment, especially volatility: the speed of change. However, learning rates are jointly determined by the comparison between volatility and a second factor, moment-to-moment stochasticity. Yet much previous research has focused on simplified cases corresponding to estimation of either factor alone. Here, we introduce a learning model, in which both factors are learned simultaneously from experience, and use the model to simulate human and animal data across many seemingly disparate neuroscientific and behavioral phenomena. By considering the full problem of joint estimation, we highlight a set of previously unappreciated issues, arising from the mutual interdependence of inference about volatility and stochasticity. This interdependence complicates and enriches the interpretation of previous results, such as pathological learning in individuals with anxiety and following amygdala damage.

Imprecise neural computations as a source of adaptive behaviour in volatile environments

Article 09 November 2020

Moving beyond generalization to accurate interpretation of flexible models

Article 26 October 2020

Adaptive learning under expected and unexpected uncertainty

Article 30 May 2019

Introduction

Among the successes of computational neuroscience is a level-spanning account of learning and conditioning, which has grounded biological plasticity mechanisms (specifically, error-driven updating) in terms of a normative analysis of the problem faced by the organism^1,2,3,4,5. These models recast learning as statistical inference: using experience to estimate the amount of some outcome (e.g., food) expected on average following some cue or action. This is an important subproblem of reinforcement learning, which uses such value estimates to guide choice. The statistical framing has motivated an influential program of investigating the brain’s mechanisms for tracking uncertainty about its beliefs, and how these impact learning^6,7,8,9,10.

Uncertainty, in turn, depends upon the noise properties of the values being learned, including both the degree of stochasticity in their measurement (observation noise, the variance of which we call stochasticity) and how quickly or how often they change (process noise, the variance of which is known as volatility). In general, then, a statistically efficient learner must estimate not only just the primary quantity of interest (e.g., the action value) but also parameters describing its noise properties. This perspective has inspired a series of hierarchical Bayesian inference models, which extend inference to either volatility or stochasticity, though typically while treating the other noise parameter as fixed and known to the model.

For volatility, a particularly influential series of theories extends a baseline model known as the Kalman filter to incorporate volatility estimation (but conditional on known stochasticity)^6,11,12. All else equal, when volatility is higher, the organism is more uncertain about the cue’s value (because the true value will on average have fluctuated more following each observation), and so the learning rate (the reliance on each new outcome) should be higher. A series of experiments have reported behavioral and neural signatures of these effects of volatility enhancing learning rate, and also their disruption in relation to psychiatric symptoms^{6,8,10,13,14,15,16,17,18,19,20,21,22,23}. Conversely, stochasticity also affects the learning rate, but in the opposite direction: all else equal, when individual outcomes are more stochastic (larger stochasticity), they are less informative about the cue’s true value and the learning rate, in turn, should be smaller. Here again, experiments confirm that people adjust their learning rates in the predicted direction^24,25, and this behavior has been captured by a model that estimates stochasticity (but treating the process noise parameter, in this case a hazard rate, as known).

Altogether, this work has led to a strong argument that the brain’s mechanisms for tracking uncertainty, and the inference of the noise parameters that govern it, are crucial to healthy and disordered learning²⁶. Although these components seem individually well understood, in this article we argue that important insights are revealed by considering in greater detail the full problem facing the learner: simultaneously estimating both volatility and stochasticity during learning. By introducing a model that performs such joint estimation, and studying its behavior in reinforcement learning tasks, we show that because of the interrelationship between these variables, a full account of any of them interacts with the other in consequential ways.

The key issue is that although the learner’s estimates of them play opposite roles on learning rates, from the learner’s perspective, they both similarly manifest in noisier, less reliably predictable outcomes. The observation that experienced noise can, to a first approximation, be explained by either volatility or stochasticity—and that these effects might be confused, either by experimenters or by learners—has implications. First, previous work apparently showing variation in volatility processing in different groups, such as various psychiatric patients^{14,16,17,19,21,22,27,28} (using a model and tasks that do not vary stochasticity), might instead reflect misidentified abnormalities in processing stochasticity. We suggest that future research should test both the dimensions of learning explicitly. Furthermore, from the perspective of a learner inferring volatility and stochasticity, these factors should compete or trade off against one another to best explain experienced noise. This means that any dysfunction or damage that impairs detection of stochasticity, should lead to a compensatory increase in inferred volatility, and vice versa: a classic pattern known in Bayesian inference as a failure of “explaining away.”

We argue that such compensatory tradeoffs may be apparent both in anxiety disorders and following damage to amygdala. Intolerance of uncertainty is thought to be a critical component of anxiety and a crucial risk factor for developing anxiety disorders^29,30. Although there has been recent interest in operationalizing this idea by connecting it to statistical learning models and tasks^{26,31,32,33,34,35,36,37,38}, we and others have focused on apparent abnormalities in processing volatility^13,20,26. The current model suggests a different interpretation, in which anxiety primarily disrupts inference about stochasticity, but with the additional result that the learner misinterprets noise due to stochasticity as a signal of change, i.e., volatility. We argue that the complementary pattern of explaining away, in which a failure to detect volatility leads to change misattributed to stochasticity, can be appreciated in studies of the amygdala’s role in modulating learning rates. In particular, our model suggests that a specific involvement of amygdala in volatility (and the explaining away pattern) explains effects of amygdala damage better than an involvement in learning rates more generally. These sorts of reciprocal interactions also give rise to a richer and subtler set of possible patterns of dysfunction that may help to understand a wide range of other neurological and psychiatric disorders, such as schizophrenia, in which there has been a tendency to study altered processing of uncertainty narrowly in the context of volatility.

The model also sheds light on experimental phenomena of learning rates in conditioning, and on two classic descriptive theories of conditioning in psychology that have been interpreted as predecessors of the statistical accounts^39,40. We suggest that seemingly contradictory effects of noise in different experiments, associated with these two theories, can be identified with effects on learning rates of inferred stochasticity vs. volatility, respectively. On this view, different effects will dominate in different experiments depending on which parameter the pattern of the noise suggests.

In the remainder of this article, we present i) a probabilistic model for the joint estimation of volatility and stochasticity from experience; and ii) volatility- and stochasticity-lesioned models in which the corresponding module is damaged. These models highlight the mutual interdependence of inference about volatility and stochasticity and show how the interdependence leads the model to predict paradoxical compensatory behaviors if inference about either factor is damaged. We use these lesioned models in a series of simulation experiments to explain aspects of pathological behavior observed in anxiety disorders and following amygdala damage.

Results

Model

We begin with the Kalman filter, which describes statistically optimal learning from data produced according to a specific form of noisy generation process. The model assumes that the agent must draw inferences (e.g., about true reward rates) from observations (individual reward amounts) that are corrupted by two distinct sources of noise: process noise or volatility and outcome noise or stochasticity (Fig. 1a, b). Volatility captures the speed by which the true value being estimated changes from trial to trial (modeled as Gaussian diffusion); stochasticity describes additional measurement noise in the observation of each outcome around its true value (modeled as Gaussian noise on each trial).

**Fig. 1: Statistical difference between volatility and stochasticity.**

For this data generating process, if the true values of volatility and stochasticity, ${v}_{t}$ and ${s}_{t}$ are known, then optimal inference about the underlying reward rate is tractable using a specific application of Bayes rule, here called the Kalman filter⁴¹. The Kalman filter represents its beliefs about the reward rate at each step as a Gaussian distribution with a mean, ${m}_{t}$, and variance (i.e., uncertainty about the true value), ${w}_{t}$. The update, on every trial, is driven by a prediction error signal, ${\delta }_{t}$, and learning rate, ${\alpha }_{t}$. This leads to simple update rules following observation of outcome ${o}_{t}$:

$${\delta }_{t}={o}_{t}-{m}_{t}$$

(1)

$${\alpha }_{t}=\frac{{w}_{t}+{v}_{t}}{{w}_{t}+{v}_{t}+{s}_{t}}$$

(2)

$${m}_{t+1}={m}_{t}+{\alpha }_{t}{\delta }_{t}$$

(3)

$${w}_{t+1}=(1-{\alpha }_{t})({w}_{t}+{v}_{t})$$

(4)

This derivation thus provides a rationale for the error-driven update (Eq. 3) prominent in neuroscience and psychology⁴², and adds to these a principled account of the learning rate, ${\alpha }_{t}$, which on this view should depend (Eq. 2) on the agent’s uncertainty and the noise characteristics of the environment. In particular, Eq. 2 shows that the learning rate is increasing and decreasing, respectively, with volatility and stochasticity. This is because higher volatility increases the chance that the true value will have changed since last observed (increasing the need to rely on the new observation), but higher stochasticity decreases the informativeness of the new observation relative to previous beliefs.

This observation launched a line of research focused on elucidating and testing the prediction that organisms adopt higher learning rates when volatility is higher⁶. But the very premise of these experiments violates the simplifying assumption of the Kalman filter—that volatility is fixed and known to the agent. To handle this situation, new models were developed^11,12 that generalize the Kalman filter to incorporate learning the volatility ${v}_{t}$ as well, arising from Bayesian inference in a hierarchical generative model in which the true ${v}_{t}$ is also changing. In this case, exact inference is no longer tractable, but approximate inference is possible and typically incorporates Eqs. 1–4 as a subprocess.

This line of work on volatility estimation inherited from the Kalman filter the view of stochasticity as fixed and known. Of course, in general, all the same considerations apply to stochasticity as well: it must be learned from experience, may be changing, and its value impacts learning rate. Indeed, estimating both parameters is critical for efficient learning, because they have opposite effects on learning rate: whereas volatility increases the learning rate, stochasticity reduces it (Fig. 1c). Although some algorithms have been explored for learning when both noise parameters are unknown^43,44,45, the main other application of this type of model in neuroscience has relied on a different simplification²⁵, which estimates the stochasticity while treating the hazard rate (the analogue of volatility for changepoint problems) as fixed and known to the model.

Learning these two parameters simultaneously is more difficult because, from the perspective of the agent, larger values of either volatility or stochasticity result in more surprising observations: i.e., larger outcome variance (Fig. 1e). However, there is a subtle and critical difference between the effects of these parameters on generated outcomes, whereas larger volatility increases the autocorrelation between outcomes (i.e., covariation between the outcomes on consecutive trials), stochasticity reduces the autocorrelation (Fig. 1f). This is the key point that makes it possible to dissociate and infer these two terms while only observing outcomes.

We developed a probabilistic model for learning under these circumstances. The data generation process arises from a further hierarchical generalization of these models (specifically the generative model used in our recent work¹²), in which the true value of stochasticity is unknown and changing, as are the true reward rate and volatility (Fig. 1d). The goal of the learner is to estimate the true reward rate from observations, which necessitates inferring volatility and stochasticity as well.

As with models of volatility, exact inference for this generative process is intractable. Furthermore, in our experience this problem is also relatively challenging to handle with variational inference, the family of approximate inference techniques used previously (see Discussion). Thus, we have instead used a different standard approximation approach that has also been popular in psychology and neuroscience, Monte Carlo sampling^3,46,47,48. In particular, we use particle filtering to track ${v}_{t}$ and ${s}_{t}$ based on data^49,50. Our method exploits the fact that given a sample of volatility and stochasticity, inference for the reward rate is tractable and is given by Eqs. 1–4, in which ${s}_{t}$ and ${v}_{t}$ are replaced by their corresponding samples (see Methods; this combination of sequential sampling with exact inference for a subproblem is known as Rao-Blackwellized particle filtering).

Learning under volatility and stochasticity

We now consider the implications of this model for learning under volatility and stochasticity.

A series of studies has used two-level manipulations (high vs. low volatility blockwise) to investigate the prediction that learning rates should increase under high volatility^6,13,38,51. Here volatility has been operationalized by frequent or infrequent reversals (Fig. 2a), rather than the smoother Gaussian diffusion that the volatility-augmented Kalman filter models formally assume. Nevertheless, applied to this type of task, these models detect higher volatility in the frequent-reversal blocks, and increase their learning rates accordingly^6,11,12. The current model (which effectively incorporates the others as a special case) achieves the same blockwise result when stochasticity is held fixed across both blocks (Supplementary Fig. 1).

**Fig. 2: Performance of the model in task with constant but unknown volatility and stochasticity parameters.**

In the preceding line of studies, stochasticity was not manipulated. (Indeed, it was not even independently manipulable because rewards were binary, and the variance of binomial outcomes is determined only by the mean.) However, analogous effects of stochasticity have been seen in another line of studies^7,9,25,52. In these studies, Nassar and colleagues studied learning rates in a task in which subjects had to predict a value, from observations in which the true value was corrupted, blockwise, by different levels of additive Gaussian noise (i.e., stochasticity) and occasionally “jumping” with a constant hazard rate, analogous to volatility. The main feature of these results relevant to the current model is that these studies have shown that participants’ learning rate decreases with increases in the noise level (see also²⁴). This effect cannot be explained by models that only consider volatility, and in fact, those models make opposite predictions because they take increased noise as evidence of volatility increase. The current model, however, produces the same blockwise effect as humans: because it correctly infers the change in stochasticity, its learning rate is lower, on average, for higher levels of noise (Supplementary Fig. 2). Although we do not intend the current model as a detailed account of how people solve this class of tasks (which is based on a somewhat different generative dynamics), the model can also reproduce other more fine-grained aspects of human behavior in this task, particularly increases in learning rate following switches and scaling of learning rate with the magnitude of error (Supplementary Fig. 2).

Note that while considered together, these two lines of studies separately demonstrate the two types of effects on learning rates we stress, neither of these lines of work has manipulated stochasticity alongside volatility (though see also²⁴). Furthermore, learning of the noise hyperparameters in these studies has largely been explicitly modeled only for either parameter conditional on the other being known. We next consider a variant of this type of task, elaborated to include a 2 × 2 factorial manipulation of both the stochasticity alongside volatility (Fig. 2; we also substitute smooth diffusion for reversals). Here, both parameters are constant within the task, but they are unknown to the model. A series of outcomes was generated based on a Markov random walk in which the hidden reward rate is changing according to a random walk and the learner observes outcomes that are noisily generated according to the reward rate.

Figure 2 shows the model’s learning rates and how these follow from its inferences of volatility and stochasticity. As above, the model increases its learning rate in the higher volatility conditions but as expected it also decreases it in the higher stochasticity conditions (Fig. 2a). These effects on learning rate arise, in turn (via Eq. 2) because the model is able to correctly estimate the various combinations of volatility and stochasticity from the data (Fig. 2b, c).

Our model thus suggests a general program of augmenting the standard 2-level volatility manipulation by crossing it with a second manipulation, of stochasticity, and predicts that higher stochasticity should decrease learning rate, separate from volatility effects.

Interactions between volatility and stochasticity

The previous results highlight an important implication of the current model: that inferences about volatility and stochasticity are mutually interdependent. These interrelationships immediately imply a general interpretational issue for experiments that manipulate only one of these noise parameters, and analyze data using a model that attributes all dynamic learning rate effects to one of them. But the details of the interdependence are themselves informative. From the learner’s perspective, a challenging problem (simplified away in many of the previous models) is to distinguish volatility from stochasticity when both are unknown, because both of them increase the noisiness of observations. Disentangling their respective contributions requires trading off two opposing explanations for the pattern of observations, a process known in Bayesian probability theory as explaining away. Thus, models that neglect stochasticity tend to misidentify stochasticity as volatility and inappropriately modulate learning.

Intriguingly, this situation might in principle arise in neurological damage and psychiatric disorders, if they selectively impact inference about volatility or stochasticity. In that case, the model predicts a characteristic pattern of compensation, whereby learning rate modulation is not merely impaired but reversed, reflecting the substitution of volatility for stochasticity or vice versa: a failure of explaining away. Fig. 3 shows this phenomenon in the 2 × 2 design of Fig. 2, with two characteristic lesion models. The key point here is that a lesioned model that does not consider one factor (e.g., stochasticity), inevitably makes systematically incorrect inferences about the other factor too. Importantly, previous models that only consider volatility are analogous to the stochasticity lesion model (Fig. 3b) and, therefore, make systematically erroneous inference about volatility (Fig. 3g) and misadjust learning rate if stochasticity is changing (Fig. 3e). This set of lesioned models provide a rich potential framework for understanding pathological learning in psychiatric and neurologic disorders. Later we show that stochasticity lesion and volatility lesion models explain deficits in learning observed in anxiety and following amygdala damage, respectively. But first, we apply the healthy model to reinterpret some long-standing issues about learning rates in animal conditioning.

**Fig. 3: Behavior of the lesioned model.**

Stochasticity vs. volatility in Pavlovian learning

Learning rates and their dependence upon previous experience have also been extensively studied in Pavlovian conditioning. In this respect, a distinction emerges between two seemingly contradictory lines of theory and experiment, those of Mackintosh³⁹ vs. Pearce and Hall⁴⁰. Both of these theories concern how the history of experiences with some cue drives animals to devote more or less “attention” to it. Attention is envisioned to affect several phenomena including not only just rates of learning about the cue but also other aspects of their processing, such as competition between the multiple stimuli presented in compound. Here, to most clearly examine the relationship with the research and models discussed above, we focus specifically on learning rates for a single cue.

The two lines of models start with opposing core intuitions. Mackintosh³⁹ argues that animals should pay more attention to (e.g., learn faster about) cues that have in the past been more reliable predictors of outcomes. Pearce and Hall⁴⁰ argue for the opposite: faster learning about cues that have previously been accompanied by surprising outcomes, i.e., those that have been less reliably predictive.

Indeed, different experiments—as discussed below—support either view. For our purposes, we can view these experiments as involving two phases: a pretraining phase that manipulates stochasticity or surprise, followed by a retraining phase to test how this affects the speed of subsequent (re)learning. In terms of our model, we can interpret the pretraining phase as establishing inferences about stochasticity and volatility, which then (depending on their balance) govern learning rate during retraining. On this view, noisier pretraining might, depending on the pattern of noise, lead to either higher volatility and higher learning rates (consistent with Pearce-Hall) or higher stochasticity and lower learning rates (consistent with Mackintosh).

First consider volatility. It has been argued that the Pearce-Hall⁴⁰ logic is formalized by volatility-learning models^2,3,12. In these models, surprising outcomes during pretraining increase inferred volatility and thus speed subsequent relearning. Hall and Pearce^40,53 pretrained rats with a tone stimulus predicting a moderate shock. In the retraining phase, the intensity of the shock was increased. Critically, one group of rats experienced a few surprising “omission” trials at the end of the pretraining phase, in which the tone stimulus was presented with no shock. The speed of learning was substantially increased following the omission trials compared with a control group that experienced no omission in pretraining. Figure 4a–d shows a simulation of this experiment from the current model, showing that the omission trials lead to increased volatility and faster learning. Note that the history-dependence of learning rates in this type of experiment also rejects simpler models like the Kalman filter, in which volatility (and stochasticity) are taken as fixed; for the Kalman filter, learning rate depends only on the number of pretraining trials but not the particular pattern of observed outcomes. The response probability of the model thus shows the same pattern as response rate for rats (Supplementary Fig. 4).

**Fig. 4: The model explains puzzling issues in Pavlovian learning.**

Next, consider stochasticity. Perhaps the best example of Mackintosh’s³⁹ principle in terms of learning rates for a single cue is the “partial reinforcement extinction effect”^54,55,56. Here, for pretraining, a cue is reinforced either on every trial or instead (“partial reinforcement”) on only a fraction of trials (Fig. 4e). The number of times that the learner encounters the stimulus is the same for both conditions, but the outcome is noisier for the partially reinforced stimulus. The retraining phase consists of extinction (i.e., fully unreinforced presentations of the cue), which occurs faster for fully reinforced cues even though they had been paired with more reinforcers initially. Our model explains this finding (Fig. 4f), because it infers larger stochasticity in the partially reinforced condition, leading to slower learning (Fig. 4g, h). Notably, this type of finding cannot be explained by models, which learn only about volatility^6,11,12. In general, this class of models mistake partial reinforcement for increased volatility (rather than increased stochasticity), and incorrectly predict faster learning.

Note the subtle difference between the two experiments of Fig. 4. The surprising omission experiment involves stable pretraining prior to omission, then an abrupt shift, whereas pretraining in the partial reinforcement experiment is stochastic, but uniformly so. Accordingly, though both pretraining phases involve increased noise (relative to their controls) the model interprets the pattern of this noise as more likely reflecting either volatility or stochasticity, respectively, with opposite effects on learning rate. Overall, then, these experiments support the current model’s suggestion that organisms learn about stochasticity in addition to volatility. Conversely, the models help to clarify and reconcile the seemingly opposing theory and experiments of Mackintosh and Pearce-Hall, at least with respect to learning rates for individual cues. Indeed, although previous work has noted the relationship between Pearce-Hall surprise, uncertainty, and learning rates^{2,3,6,12,20,57}, the current modeling significantly clarifies this mapping by identifying it more specifically with volatility, as contrasted against simultaneous inference about stochasticity. Meanwhile, while our basic statistical interpretation of the partial reversal extinction effect has been noted before (e.g., by Gallistel and Gibbon⁵⁸), to our knowledge these previous explanations have not reconciled it with the volatility/Pearce-Hall phenomena. Instead, previous work attempting to map Mackintosh’s³⁹ principle onto statistical models (and distinguish it from Pearce-Hall-like effects) has focused on attention and uncertainty affecting cue combination rather than learning rates², which is a complementary but separate idea.

Anxiety and inference about stochasticity vs. volatility

Lesioned models like the ones in Fig. 3 are potentially useful for understanding learning deficits in psychiatric disorders, for example anxiety disorders, which have recently been studied in the context of volatility and its effects on learning rate^13,20. These studies have shown that people with anxiety are less sensitive to volatility manipulations in probabilistic learning tasks similar to Fig. 2. Besides learning rates, an analogous insensitivity to volatility has been observed in pupil dilation¹³ and in neural activity in the dorsal anterior cingulate cortex, a region that covaried with learning rate in controls²⁰.

These results have been interpreted in relation to the more general idea that intolerance of uncertainty is a key foundation of anxiety; accordingly, fully understanding them requires taking account of multiple sources of uncertainty²⁶, including both the volatility and stochasticity. Nevertheless, the primary interpretation of these types of results has been that observed abnormalities are rooted in volatility estimation per se^13,20,26. Our current model suggests an alternative explanation: that the core underlying deficit is actually with stochasticity, and apparent disturbances in volatility processing are secondary to this, due to their interrelationship.

In particular, these effects and a number of others are well explained by the stochasticity lesion model of Fig. 3b, i.e., by assuming that people with anxiety have a core deficit in estimating stochasticity, and instead treat it as small and constant. As shown in Fig. 5b, this model shows insensitivity to volatility manipulation, but in the model that is actually because volatility is misestimated nearer ceiling due to underestimation of stochasticity. This, in turn, substantially dampens further adaptation of learning rate in blocks when volatility actually increases. The elevated learning rate across all blocks leads to hypersensitivity to noise, which prevents individuals with anxiety from benefitting from stability, as has been observed empirically²⁰. In particular, Piray et al.²⁰ have studied learning in individuals with low- or high- in trait social anxiety using a switching probabilistic task (Supplementary Fig. 6) in which each trial started with a social threatening cue (angry face image). It was found that individuals with high trait anxiety perform particularly worse than controls in stable trials, whereas their performance is generally matched with controls in volatile trials²⁰ (Fig. 5e). The model shows similar behavior (Fig. 5f).

**Fig. 5: The stochasticity lesion model shows a pattern of learning deficits associated with anxiety.**

One key prediction of our model, which differs from a volatility-specific account, is that the learning rate is generally higher in people with anxiety regardless of volatility manipulation or even in tasks that do not manipulate volatility. In fact, Browning et al.¹³ do not find evidence to support this prediction, they do not find a significant overall effect of anxiety on learning rate. Of course, it is important not to interpret null results as evidence in favor of the null hypothesis, since a failure to reject the null hypothesis may reflect insufficient power to detect a true effect. Indeed, in Browning’s¹³ data, while the effect of anxiety on learning rate was not significant overall or in either condition, the point estimate was largest (r(28) = 0.26, p = 0.16) in the stable condition, which is also the block that the model predicts the effect should be statistically strongest (because baseline learning rates, absent any effect of anxiety, are lower).

Importantly, other, larger studies provide positive statistical support for the prediction of elevated learning rate with anxiety^31,33,34. Note that in delta-rule models, behavior under higher learning rates is closer to win-stay/lose-shift (since higher learning rates weight the most recent outcome more heavily, with full win-stay/lose-shift—dependence only on the most recent outcome—equivalent to a learning rate of 1). Such a strategy has itself been linked to anxiety^33,34. A notable observation was made in a large (n = 122) study by Huang et al.³⁴, who found people with anxiety show higher win-stay/lose-shift and this effect is driven by higher lose-shift. Figure 5g, h shows results of simulating the proposed model in a task similar to Huang et al.³⁴ (Supplementary Fig. 6). The model shows the same pattern of behavior, with the additional modulation by win vs. loss captured because any loss is seen as an evidence for volatility and that results in higher learning rate and a contingency switch. The effect is much less salient for win trials because prediction errors are relatively small in those trials, which substantially dampen any effect of learning rate. Across all trials, the stochasticity lesion model shows higher learning rate, similar to what Huang et al.³⁴ found by fitting reinforcement learning models to choice data.

Finally, the lesion model is an extreme case in which a hypothetical stochasticity module is completely eliminated. But this general approach can be extended to less extreme cases in which one module of the model (e.g., stochasticity) has some relative disadvantage in explaining noise. In terms of our model, this can be achieved by having higher update rate parameters for volatility relative to that of stochasticity. These are two main parameters of the model that one can use to explain individual differences across people. For example, the ratio of volatility to stochasticity update rate can be used to capture continuous individual variation in trait anxiety. In this case, the stochasticity lesion model of Fig. 3b is an extreme case of this approach in which the stochasticity update rate is zero (thus the ratio of volatility to stochasticity is infinitely large). We have exploited this approach to simulate a result from Browning et al.¹³ concerning graded individual differences in anxiety’s effect on learning rate adjustment. In particular, they report (and the model captures; Fig. 6) negative correlation between relative learning rate (volatile minus stable) and trait anxiety in the probabilistic switching task with stable and volatile blocks.

**Fig. 6: The model explains effects of trait anxiety as a continuous index on learning.**

Amygdala damage and inference about volatility vs. stochasticity

The opposite pattern of compensatory effects on inference is evidently visible in the effects of amygdala damage on learning. The amygdala plays an important role in associative learning^59,60. Although some researchers have emphasized a role of the amygdala as a site of association between conditioned- and unconditioned-stimulus in conditioning per se, other authors (drawing on evidence from human neuroimaging work, single-cell recordings, and lesion studies) have proposed that the amygdala is involved in a circuit for controlling or adjusting learning rates^{57,60,61,62,63,64}. Most informative from the perspective of our model are lesion studies in rats^61,65,66,which we interpret as supportive an involvement specifically in processing of volatility, rather than learning rates or uncertainty more generally. These experiments examine a surprise-induced upshift in learning rate similar to the Pearce-Hall experiment from Fig. 4. Lesions to the central nucleus of the amygdala attenuate this effect, suggesting a role in volatility processing. But an important detail of these results with respect to our model’s predictions is that the effect is not merely attenuated but reversed. This reciprocal effect supports perhaps the most central feature and prediction of our model that volatility trades off against a (presumably anatomically separate) system for stochasticity estimation.

Figure 7 shows their serial prediction task and results in more detail. Rats performed a prediction task in two phases. A group of rats in the “consistent” condition performed the same prediction task in both phases. The “shift” group, in contrast, experienced a sudden change in the contingency in the second phase. Whereas the control rats showed elevation of learning rate in the shift condition manifested by elevation of food seeking behavior in the very first trial of the test, the amygdala lesioned rats showed the opposite pattern. Lesioned rats showed significantly smaller learning rate in the shift condition compared with the consistent one, a reversal of the surprise-induced upshift.

**Fig. 7: The model displays the behavior of amygdala lesioned rats in associative learning.**

We simulated the model in this experiment. To model a hypothetical effect of amygdala lesion on volatility inference, we assumed that lesioned rats treat volatility as small and constant. As shown in Fig. 7, the model shows an elevated learning rate in the shift condition for the control rats, which is again due to increases in inferred volatility after the contingency shift. For the lesioned model, however, surprise is misattributed to the stochasticity term as an increase in inferred volatility cannot explain away surprising observations (because it was held fixed). Therefore, the contingency shift inevitably increases stochasticity and thereby decreases the learning rate. Notably, the compensatory reversal in this experiment cannot be explained using models that do not consider both the volatility and stochasticity terms.

A similar pattern of effects of amygdala lesions, consistent with our theory, is seen in an experiment on nonhuman primates. In a recent report by Costa et al.⁶³, it has been found that amygdala lesions in monkeys disrupt reversal learning with deterministic contingencies, moreso than a reversal task with stochastic contingencies. This is striking since deterministic reversal learning tasks are much easier. Similar to the previous experiment, our model explains this finding because large surprises caused by the contingency reversal are misattributed to the stochasticity in lesioned animals (because volatility was held fixed), while control animals correctly attribute them to the volatility term (Fig. 8; see Supplementary Fig. 7 for performance of the model and Supplementary Fig. 8 for simulation of the model in all probabilistic schedules tested by Costa et al.⁶³). This effect is particularly large in the deterministic case because the environment is very predictable before the reversal and therefore the reversal causes larger surprises than those for the stochastic one. Similar findings have been found in a study in human subjects with focal bilateral amygdala lesions⁶⁷, in which patients tend to show more deficits in deterministic reversal learning than stochastic one. Again, these experimental findings are not explained by a Kalman filter or models that only consider the volatility term.

**Fig. 8: The model displays the behavior of amygdala lesioned monkeys in probabilistic reversal learning.**

Overall, then, these experiments support the current model’s picture of dueling influences of stochasticity and volatility. Furthermore, the current model helps to clarify the precise role of amygdala in this type of learning, relating it specifically to volatility-mediated adjustments.

Discussion

A central question in decision neuroscience is how the brain learns from the consequences of choices given that these can be highly noisy. To do so effectively requires simultaneously learning about the characteristics of the noise, as has been emphasized most strongly in a prominent line of work on how the brain tracks the volatility of the environment. Here we revisit this problem for the more realistic case when both volatility and a second noise parameter, stochasticity, must be simultaneously estimated.

While various experiments have, mostly separately, shown that humans can adjust learning rates in response to manipulations of either type of noise, models of how they do so have focused primarily on estimating either parameter while taking the other as known. This skirts the more difficult problem of distinguishing types of noise. To solve this problem and investigating its consequences for learning, we built a probabilistic model for learning in uncertain environments that tracks volatility and stochasticity simultaneously. Using this model to simulate a number of experiments across conditioning, psychiatry and lesion studies, we show a consistent theme whereby the interdependence of inference about these two noise parameters gives rise to patterns of effects that could not be appreciated in previous models that considered estimating either type of noise separately.

The importance of dissociating these forms of noise, and some aspects of their interaction, have been noted previously. For instance, Pulcu and Browning²⁶ emphasize the inadequacy of existing experiments for dissociating volatility vs. stochasticity learning, and raise the possibility that in principle, people might confuse them. In Nassar et al.’s study²⁵, the volatility-like hazard rate parameter (though viewed from the model’s perspective as fixed and known) is fit as a per-subject free parameter construed as an individual difference. The empirical and model-fitting results showcase a dependence of the (inferred) stochasticity parameter upon the (fit/known) hazard rate, consistent with the bidirectional pattern of interdependence we posit. Building on all these ideas, we build and simulate a model to showcase the potential interdependence between these two types of inference across a range of situations. One important caveat, given the range of applications we consider, is that we abstract away details of the many individual studies to emphasize their parallelism with respect to our main point of interest. Thus, for instance, we neglect valence-dependent modulation of learning which is likely an additional dimension important both in anxiety^20,26,31,38 and in studies of amygdala⁶³. Relatedly, as our goal is to showcase the range of situations in which parallel issues may arise, we acknowledge that different explanations may exist for many individual results.

Our work builds most directly on a rich line of theoretical and experimental work on the relationship between the volatility and learning rates^{6,8,13,15,27,68,69}. There have been numerous reports of volatility effects on healthy and disordered behavioral and neural responses, often using a two-level manipulation of volatility like that from Fig. 5a^{6,8,10,13,14,15,16,17,18,19,20,21,22,23,38}. Our modeling suggests that it will be informative to drill deeper into these effects by augmenting this task to cross this manipulation with stochasticity so as more clearly to differentiate these two potential contributors²⁴. For example, in tasks that manipulate (and models that consider) only volatility, it can be seen from Eqs. (1–4) that the timeseries of several quantities all covary together, including the estimated volatility ${v}_{t}$, the posterior uncertainty ${w}_{t}$, and the learning rate ${\alpha }_{t}$. It can therefore be difficult in general to distinguish which of these variables is really driving tantalizing neural correlates related to these processes, for instance in amygdala and dorsal anterior cingulate cortex^6,57. The inclusion of stochasticity (which increases uncertainty but decreases learning rate) would help to drive these apart.

Indeed, another related set of learning tasks considered prediction of continuous outcomes corrupted by stochasticity, i.e., additive Gaussian noise^7,9,25,45,52, which could provide another foundation for factorial manipulations of the sort we propose. Indeed, a number of these studies (complementary to the volatility studies) included multiple levels of stochasticity and showed learning rate effects^7,9,25,52,70. The models used in these studies have largely used a complementary simplification to the volatility one: they estimate stochasticity, but conditional on a known value for the hazard rate (equivalent to volatility). Interestingly, rather than overall adjustment to noise statistics, these studies more explicitly emphasized the detection of discrete changes in the environment and the resulting local adjustments of the learning rate. From a modeling perspective, inference under change at discrete changepoints (occurring at some hazard rate) raises issues quite analogous to change due to more gradual diffusion (with some volatility). Thus, in practice it has been common and effective to apply models for one sort of change to tasks actually involving the other^6,11,12, a substitution also in part licensed by approximate models of changepoint detection that (as with the volatility models and the Kalman filter for continuous change) also reduce learning to error-driven updates with a time-varying learning rate^25,71. Thus, although we build the current work on a generative model with continuous rather than abrupt changepoints, we do not mean this as a substantive claim, as we expect our main substantive points (concerning the inference about noise vs. change hyperparameters) would play out analogously in other variants. In any case, research into the neural substrates of changepoint detection is highly relevant to the change problem conceived in terms of volatility as well (see¹⁰ for a recent review).

The current framework’s tendency to elide the distinction between discrete and continuous change (but distinguish both from stochasticity) is also the basis of an important, but subtle, distinction from another prominent dichotomy previously proposed that between “expected” and “unexpected” types of uncertainty^72,73. While it might appear that these categories correspond, respectively, to stochasticity and volatility as we define them, that is not actually the case. Formally, this is because the Dayan and Yu model (in its most detailed form, Dayan and Yu⁷²) arises from a Kalman filter augmented with additional discrete changepoints: i.e., both the diffusion and jumps. The focus of that work was distinguishing the special effects of surprising jumps (“unexpected uncertainty”), which were hypothesized to recruit a specialized neural interrupt system. Meanwhile, all other uncertainty arising in the baseline Kalman filter (i.e., that from both stochasticity and volatility; the posterior variance ${w}_{t}$ in Eq. 4) is lumped together under “expected uncertainty.” That said (although we see this as a misreading of the earlier work) our impression is that later authors’ use of these terms actually tends to comport more with our distinction than the original definition²⁶, i.e., to take unexpected and expected uncertainty as synonymous with volatility and stochasticity as we define them. In any case, Yu and Dayan did not consider the problem considered here, of estimating the noise hyperparameters for learning under uncertainty.

The most important feature of our model is the competition it induces between the volatility and stochasticity to “explain away” surprising observations. This leads to a predicted signature of the model in cases of lesion or damage affecting inference about either type of noise: disruption causing neglect of one of the terms leads to overestimation of the other term. For example, if a module responsible for volatility learning were disrupted, the model would overestimate stochasticity, because surprising observations that are due to volatility would be misattributed to the stochasticity. This allowed us to revisit the role of amygdala in associative learning and explain some puzzling findings about its contributions to reversal learning.

Similar explanations grounded in the current model may also be relevant to a number of psychiatric disorders. Abnormalities in uncertainty and inference, broadly, have been hypothesized to play a role in numerous disorders, including especially anxiety and schizophrenia. More specifically, abnormalities in volatility-related learning adjustments have been reported in patients or people reporting symptoms of several mental illnesses^{13,14,16,17,18,19,20,21,22,23,38}. The current model provides a more detailed potential framework for better dissecting these effects, though this will ideally require a new generation of experiments manipulating both factors.

In the present work, we have developed these ideas mostly in terms of pathological decision making in anxiety, which is one of the areas where earlier work on volatility estimation has been strongest and where further refinement using our theory seems most promising^32,35,37,74. We considered an account by which individuals with anxiety systematically misidentify outcomes occurring due to chance (stochasticity) as instead a signal of change (volatility)³⁴. This account offers a contrary interpretation of a pattern of effects that had been taken to indicate that volatility sensitivity is instead deficient in anxiety²⁶. Although some null effects in the study of Browning et al.¹³ do not support this account, we view it as an overall better account of the pattern of data across several studies^20,31,33,34. Our account is also broadly consistent with studies suggesting that individuals with anxiety might feel overwhelmed when faced with uncertainty³⁶ and fail to make use of long-term statistical regularities³⁴. These are also hypothesized to be related to symptoms of excessive worry^29,30. Misestimating stochasticity—moreso than volatility—also seems consonant with the idea that individuals with anxiety tend to fail to discount negative outcomes occurring by chance (i.e., stochasticity) and instead favor alternative explanations like self-blame⁷⁵. This hypothesis is also consistent with the observation that acquisition of fear conditioning tends to be enhanced in individuals with anxiety^76,77. Finally, although a simple increase in learning rate seems harder to reconcile with generally slower extinction of Pavlovian fear learning in anxiety⁷⁶, this probably reflects the well-known fact that extinction is not simply unlearning of the original associations, but instead is dominated by additional processes^78,79. This includes in particular statistical inference about latent contexts⁵, which is likely to be affected by both stochasticity and volatility in ways that should be explored in future work.

More generally, this modeling approach, which quantifies misattribution of stochasticity to volatility and vice versa, might be useful for understanding various other brain disorders that are thought to influence processing of uncertainty and have largely been studied in the context of volatility in the past decade^{14,16,17,19,21,22,27,28}. As another example, positive symptoms in schizophrenia have been argued to result from some alterations in prior vs likelihood processing, perhaps driven by abnormal attribution of uncertainty (or precision) to top-down expectations⁸⁰. But different such symptoms (e.g., hallucinations vs. delusions) manifest in different patients. One reason may be that these relate to disruption at different levels of a perceptual-inferential hierarchy, i.e., with hallucination vs. delusion reflecting involvement of more or less abstract inferential levels, respectively^81,82,83. In this respect, the current model may provide a simple and direct comparative test, since stochasticity enters at the perceptual, or outcome, level (potentially associated with hallucination) but volatility acts at the more abstract level of the latent reward (and may be associated with delusion; see Fig. 1).

Our work also touches upon a historical debate in the associative learning literature about the role of outcome stochasticity (i.e., in our terms, noise) in learning. One class of theories, most prominently represented by Mackintosh³⁹, proposes that attention is preferentially allocated to cues that are most reliably predictive of outcomes, whereas Pearce and Hall⁶² suggest the opposite that attention is attracted to surprising misprediction. We address only a subset of the experimental phenomena involved in this debate (those involving learning rates for cues presented alone), but for this subset we offer a very clear resolution of the apparent conflict. Our approach and goals also differ from classic work in this area. A number of important models of attention in psychology also attempt to reconcile these theories by providing more phenomenological models that hybridize the two theories to account for various and often paradoxical experimental work^84,85,86,87. Our goal is different and is more descended from a tradition of normative theories that provide a computational understanding of psychological phenomena from first principles by first addressing what is the computational problem that the corresponding neural system is evolved to solve^2,88.

Any probabilistic model relies on a set of explicit assumptions about how observations have been generated, i.e., a generative model, and also an inference procedure to estimate the hidden parameters that are not directly observable. Such inference algorithms typically reflect some approximation strategy because exact inference is not possible for most important problems, including our generative model (Fig. 1). In previous work in this area, we and others have relied on variational approaches to approximate inference, which factors difficult inference problems into smaller tractable ones, and approximates the answer as though they were independent^11,12. Interestingly, although one of the most promising successes of this approach in neuroscience has been in hierarchical Kalman filters with volatility inference, we found it difficult to develop an effective variational filter for the current problem, when stochasticity is unknown. The core problem, in our hands, was that effective explaining away between the two noise types was difficult to achieve using simplified variational posteriors that omitted aspects of their mutual dependency.

Interestingly, there are other algorithms that, in principle, address similar learning problems. These include using an explicitly variational approach extending the HGF (code is publicly available as hgf_jget in the TAPAS toolbox⁸⁹, but has not been documented or tested in published articles), augmenting the variational HGF with mixture models⁴³, an analogous simplified learning rule based more on neural considerations⁴⁴, and an exact model for tracking hazard rates under a particular case of changepoint detection⁴⁵. While these have not yet been applied to the full range of problems we investigate here, we suspect that future work investigating the approximate approaches will find challenges in explaining away. In any case, in the current modeling, we have adopted a different estimation method based on Monte Carlo sampling, in particular a variant of particle filtering that preserves many of the advantages of variational methods by incorporating exact conditional inference for a subset of variables⁴⁹. The inference model employed here combines Kalman filtering for estimation of reward rate⁴¹ conditional on the volatility and stochasticity, with particle filtering for inference about these⁵⁰. One drawback of the particle filter, however, is that it requires tracking a number of samples on every trial. In practice, we found that a handful (e.g., 100) of particles results in a sufficiently good approximation.

Finally, in this study, we only modeled the effects of volatility and stochasticity on learning rate. However, uncertainty affects many different problems beyond learning rate, and a full account of how subjects infer volatility and stochasticity (and how these, in turn, affect uncertainty) may have ramifications for many other behaviors. Thus, there have been important statistical accounts of a number of such problems, but most of them have neglected either stochasticity or volatility, and none of them have explicitly considered the effects of learning the levels of these types of noise. These problems include cue- or feature-selective attention²; the explore-exploit dilemma^90,91; and the partition of experience into latent states, causes or contexts^5,79,92. The current model, or variants of it, is more or less directly applicable to all these problems and should imply predictions about the effects of manipulating either type of noise across many different behaviors.

Methods

Description of the model

Recall that outcome on trial $t$, ${o}_{t}$, in our model depends on three latent variables, the reward rate, stochasticity and volatility. The reward rate on trial $t$, ${x}_{t}$, has Markov-structure dynamics:

$${x}_{t}={x}_{t-1}+{e}_{t},$$

(5)

where ${e}_{t}$ is a (zero-mean) Gaussian noise with variance given by volatility. Therefore, we have:

$$p\left({x}_{t}|{x}_{t-1},{v}_{t}\right)=N\left({x}_{t}|{x}_{t-1},{v}_{t}\right),$$

(6)

where ${v}_{t}$ is volatility. We define the inverse volatility, ${z}_{t}={v}_{t}^{-1}$, which is the preferred formulation here as it has been used in previous studies for its analytical plausibility¹². Outcomes were generated based on the reward rate and stochasticity according to a Gaussian distribution:

$$p\left({o}_{t}|{x}_{t},{s}_{t}\right)=N\left({o}_{t}|{x}_{t},{s}_{t}\right),$$

(7)

where ${s}_{t}$ is the stochasticity with ${y}_{t}={s}_{t}^{-1}$.

For volatility and stochasticity, we assumed a multiplicative noise on their inverse, which is an approach that has been shown to give rise to analytical inference when considered in isolation (but not here)^93,94. Specifically, the dynamics over these variables are given by ${z}_{t}={{\eta }_{v}}^{-1}{z}_{t-1}{\epsilon }_{t},$ where ${0 {\,} < {\,}\eta }_{v} {\,} < {\,}1$ is a constant and ${\epsilon }_{t}$ is a random variable in the unit range with a Beta-distribution $p\left({{\epsilon }_{t}}\right)={\rm {B}}\left({{\epsilon }_{t}},|,0.5{{\eta }_{v}}{(1-{{\eta }_{v}})}^{-1},0.5\right).$ Note that the conditional expectation of ${z}_{t}$ is given by ${z}_{t-1}$, because $E\left[{\epsilon }_{t}\right]={\eta }_{v}$. We assume a similar and independent dynamics for ${y}_{t}$ parametrized by the constant ${\eta }_{s}$: ${y}_{t}={{{\eta }_{s}}^{-1}y}_{t-1}{\varepsilon }_{t}$, in which ${\varepsilon }_{t}$ has a similar distribution to ${\epsilon }_{t}$ parametrized by ${\eta }_{s}$.

In our implementation, we parametrized the model with ${\lambda }_{v}=1-{\eta }_{v}$ and ${\lambda }_{s}=1-{\eta }_{s}$, respectively. This is because these parameters can be interpreted as the update rate for volatility and stochasticity, respectively. In other words, larger values of ${\lambda }_{v}$ and ${\lambda }_{s}$ result in faster update of volatility and stochasticity, respectively. Intuitively, this is because a smaller ${\lambda }_{v}$ increases the mean of ${\epsilon }_{t}$ and results in a larger update of ${z}_{t}$. Since volatility is the inverse of ${z}_{t}$, therefore, smaller ${\lambda }_{v}$ results in slower update of volatility. This has been formally shown in our recent work¹². In addition to these two parameters, this generative process depends on initial value of volatility and stochasticity, ${v}_{0}$ and ${s}_{0}$.

For inference, we employed a Rao-Blackwellised Particle Filtering approach⁴⁹, in which the inference about ${v}_{t}$ and ${s}_{t}$ were made by a particle filter⁵⁰ and, conditional on these, the inference over ${x}_{t}$ was given by the Kalman filter (i.e., Equations (1–4)). The particle filter is a Monte Carlo sequential importance sampling method, which keeps track of a set of particles (i.e., samples). The algorithm performs three steps on each trial. First, in a prediction step, each particle is transitioned to the next step based on the generative process. Second, weights of each particle are updated based on the probability of observed outcome:

$${b}_{t}^{l}\propto N({o}_{t}{{{\rm{|}}}}{m}_{t-1}^{l},{w}_{t-1}^{l}+{v}_{t}^{l}+{s}_{t}^{l}),$$

(8)

where ${b}_{t}^{l}$ is the weight of particle $l$ on trial $t$, ${m}_{t-1}^{l}$ and ${w}_{t-1}^{l}$ are estimated mean and variance by the Kalman filter on the previous trial (Eqs. 1–4), and ${v}_{t}^{l}$ and ${s}_{t}^{l}$ are volatility and stochasticity samples (i.e., the inverse of ${z}_{t}^{l}$ and ${y}_{t}^{l}$). In this step, particles were also resampled using the systematic resampling procedure if the ratio of effective to total particles falls below 0.5. In the third step, the Kalman filter was used to update the mean and variance. In particular, for every particle, Eqs. 1–4 were used to define ${\alpha }_{t}^{l}$ and update ${m}_{t}^{l}$ and ${w}_{t}^{l}$ for every particle. Learning rate and estimated reward rate on trial $t$ was then defined as the weighted average of all particles, in which the weights were given by ${b}_{t}^{l}$. We have used particle filter routines implemented in MATLAB.

Finally, we should note that our results are not dependent on the specific generative process that we have assumed here. In particular, it is possible to define a generative process that diffuses according to Gaussian noise. In such a generative model, random variables related to volatility and stochasticity diffuse according to independent Gaussian random walks:

$$p\left({z}_{t}|{z}_{t-1}\right)=N\left({z}_{t}|{z}_{t-1},{\sigma }_{v}^{2}\right),$$

(9)

$$p\left({y}_{t}|{y}_{t-1}\right)=N\left({y}_{t}|{y}_{t-1},{\sigma }_{s}^{2}\right),$$

(10)

where volatility and stochasticity are respectively defined as ${{v}_{t}}={\exp} \left({{z}_{t}}\right)$ and ${{s}_{t}}={\exp}({{y}_{t}})$, and ${{\sigma }_{v}}$ and ${{\sigma}_{s}}$ are model parameters that play analogous role as ${\lambda }_{v}$ and ${\lambda }_{s}$ above, respectively. The reward rate and outcomes are then generated based on the same process as the previous generative model (Eqs. 5–7). Inference about reward rate also remains the same (Eqs. 1–4). Our simulations show that, as long as the particle filter was used for inference about volatility and stochasticity, such a process can successfully recover true unknown volatility and stochasticity (Supplementary Fig. 3).

Simulation details

In simulations related to Figs. 1–3, timeseries were generated according to the Markov random walk with constant volatility and stochasticity. For these simulations, we assumed ${\lambda }_{v}=0.1$ and ${\lambda }_{s}=0.1$; ${v}_{0}=1$ (average over small and large true volatility) and ${s}_{0}=2$ (average over small and large true stochasticity). For lesioned models in Fig. 3, the corresponding lesion variable was assumed to be fixed at its initial value throughout the task.

For the conditioned suppression experiment presented in Fig. 4, the weak and strong shock was 0.3 and 1, respectively, plus a small noise with variance of 10^-2. The noise for the partial reinforcement experiment was assumed to be 10^-4. 100 trials were used for training. We assumed 5 omission trials for the omission condition of conditioned suppression experiment. Model parameters in Fig. 4 were ${\lambda }_{v}={\lambda }_{s}=0.2$, and ${v}_{0}={s}_{0}=0.1$. For corresponding Supplementary Figs. 4–5, the response probability of the model was calculated based on a softmax with a decision noise parameter of 5.

Reward rate in Fig. 5a was 0.8 in the stable block and switching between 0.25 and 0.75 in the volatile block with the outcome variance of 0.01. For simulations presented in Fig. 5a-d, model parameters were similar to those used for simulations in Fig. 4 and the stochasticity for the lesioned models was assumed to be 0.001 (note that outcomes were not binary in this simulation). Volatile condition in Fig. 5e, f was defined as trials with no contingency switch in their preceding 10 trials, similar to Piray et al.²⁰. For the simulation presented in Fig. 5h, we followed Huang et al.³⁴ who fitted a number of reinforcement learning models to choice data, in which they simplified the task to its core features that are directly related to reinforcement learning. Furthermore, we made a further simplification here by considering only two choices. Probability of reward in tasks by Piray et al. and Huang et al. presented in Fig. 5 are plotted in Supplementary Fig. 6. Outcome variance was assumed to be 0.01. Model parameters were similar to those used for simulations in Fig. 4. The stochasticity for the lesioned models in these two simulations were assumed to be 0.05. For simulating choice, we used the softmax with a decision noise of 3.

In Fig. 6, the task was a probabilistic switching task with stable and volatile blocks similar to the task of Browning et al.¹³. Outcome variance was assumed to be 0.01. The median correlation presented in the inset of Fig. 6b is the Spearman rank correlation across 1000 sets of simulations. For each set, 30 artificial subjects were generated, which only differed in their volatility- and stochasticity-update rate parameters. To have a relatively uniform model trait anxiety (i.e., volatility to stochasticity update rate), every set was further divided to 3 subsets (each containing 10 artificial subjects), in which the mean of model trait anxiety was 0.5, 1, 3, respectively. We further ensured that the model trait anxiety is greater than 0.26 and smaller than 4. These values were chosen to relatively reflect the distribution of trait anxiety in Browning et al.’s¹³ data (Fig. 6a). Furthermore, the volatility update rate was drawn randomly between 0 and 0.2 and the stochasticity update rate was calculated according to the model trait anxiety. A fixed and small initial volatility and stochasticity was used for all artificial subjects (${v}_{0}={s}_{0}=0.001$).

For simulating the experiment in Fig. 7, reward timeseries was generated with a very small outcome variance, 10^-6. Here, the model was trained to predict both the tone (given the light) and the reward (given the light) on every trial. Model parameters were ${\lambda }_{v}={\lambda }_{s}=0.2$, and ${v}_{0}={s}_{0}=0.5$. Volatility was assumed for the lesioned model to be small and fixed (0.25e-6). Figure 7b shows the average learning rate on the last trial of phase 2 (i.e., the first trial of the test) for the first cue across all simulations. Figure 7c–f shows volatility and stochasticity signals for the first cue. For simulation of the reversal task in Fig. 8, a small outcome variance similar, 10^-6, was used for generating outcomes. Model parameters were the same as those used in Fig. 7. Volatility for the lesioned model was assumed to be small and fixed at 0.01. We have used softmax with a decision noise parameter as the choice model. We assumed that the decision noise parameter is 3 and 1 for the control and lesioned animals, respectively. These parameters were used to reproduce the general reduction of performance in lesioned animals, which is independent of the difference between the deterministic vs stochastic task in the two groups explained by our learning model.

For Supplementary Fig. 2, we followed the design of the prediction task by Nassar et al.²⁵. The timeseries were generated according to a hidden reward rate plus a noise term, in which the variance of noise was either one (small) or nine (large). The reward rate was subject to a random jump in the range of 0–10. Change points occurred after at least five trials plus a random draw according to an exponential distribution with a rate of 0.05 (i.e., mean 20). The initial volatility and stochasticity were both assumed to be five (average over small and large true stochasticity). We further assumed that ${\lambda }_{v}=0.4$ and ${\lambda }_{s}=0.2$, which reflects the instructions given to subjects about possible jumps in the underlying reward rate. Simulation and model parameters in Supplementary Figure 1 were the same as those in Fig. 5a. For all simulations, we have assumed initial reward rate prior to be a Gaussian with mean 0 and variance 100 (Figs. 2–3 and Supplementary Figure 3) or 1 (all other simulations). Simulations were repeated sufficiently to have negligible sampling error (i.e. invisible standard error of the mean). Thus, simulations presented in Figs. 1, 2, 3 and Supplementary Fig. 3 were repeated 10,000 times; conditioning simulations presented in Figs. 4 and 7 were repeated 40000 times; and all other simulations were repeated 1000 times. All simulations were conducted with 100 particles.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

Simulation data are publicly available at https://doi.org/10.5281/zenodo.5526668⁹⁵. Data by Piray et al.²⁰ presented in Fig. 5e are publicly available at https://github.com/payampiray/piray_etal_2019_JNeurosci. Data by Browning et al.¹³ plotted in Fig. 6a are publicly available as a Source Data file to the corresponding paper. Source data are provided with this paper.

Code availability

All simulations were conducted using custom code written in MATLAB (2018a). Codes are available at https://doi.org/10.5281/zenodo.5526668⁹⁵.

References

Dayan, P. & Long, T. Statistical Models of Conditioning. In Advances in Neural Information Processing Systems 10 (eds, Jordan, M., Kearns, M. & Solla, S.) 117–123 (MIT Press, 1998).
Dayan, P., Kakade, S. & Montague, P. R. Learning and selective attention. Nat. Neurosci. 3, 1218–1223 (2000).
Article CAS PubMed Google Scholar
Courville, A. C., Daw, N. D. & Touretzky, D. S. Bayesian theories of conditioning in a changing world. Trends Cogn. Sci. (Regul. Ed.) 10, 294–300 (2006).
Article Google Scholar
Daunizeau, J. et al. Observing the observer (I): meta-bayesian models of learning and decision-making. PLoS ONE 5, e15554 (2010).
Article ADS CAS PubMed PubMed Central Google Scholar
Gershman, S. J., Blei, D. M. & Niv, Y. Context, learning, and extinction. Psychol. Rev. 117, 197–209 (2010).
Article PubMed Google Scholar
Behrens, T. E. J., Woolrich, M. W., Walton, M. E. & Rushworth, M. F. S. Learning the value of information in an uncertain world. Nat. Neurosci. 10, 1214–1221 (2007).
Article CAS PubMed Google Scholar
Nassar, M. R. et al. Rational regulation of learning dynamics by pupil-linked arousal systems. Nat. Neurosci. 15, 1040–1046 (2012).
Article CAS PubMed PubMed Central Google Scholar
Iglesias, S. et al. Hierarchical prediction errors in midbrain and basal forebrain during sensory learning. Neuron 80, 519–530 (2013).
Article CAS PubMed Google Scholar
McGuire, J. T., Nassar, M. R., Gold, J. I. & Kable, J. W. Functionally dissociable influences on learning rate in a dynamic environment. Neuron 84, 870–881 (2014).
Article CAS PubMed PubMed Central Google Scholar
Soltani, A. & Izquierdo, A. Adaptive learning under expected and unexpected uncertainty. Nat. Rev. Neurosci. 20, 635–644 (2019).
Article CAS PubMed PubMed Central Google Scholar
Mathys, C., Daunizeau, J., Friston, K. J. & Stephan, K. E. A bayesian foundation for individual learning under uncertainty. Front Hum. Neurosci. 5, 39 (2011).
Article PubMed PubMed Central Google Scholar
Piray, P. & Daw, N. D. A simple model for learning in volatile environments. PLoS Comput. Biol. 16, e1007963 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Browning, M., Behrens, T. E., Jocham, G., O’Reilly, J. X. & Bishop, S. J. Anxious individuals have difficulty learning the causal statistics of aversive environments. Nat. Neurosci. 18, 590–596 (2015).
Article CAS PubMed PubMed Central Google Scholar
Brazil, I. A., Mathys, C. D., Popma, A., Hoppenbrouwers, S. S. & Cohn, M. D. Representational uncertainty in the brain during threat conditioning and the link with psychopathic traits. Biol. Psychiatry Cogn. Neurosci. Neuroimaging 2, 689–695 (2017).
PubMed Google Scholar
Farashahi, S. et al. Metaplasticity as a neural substrate for adaptive learning and choice under uncertainty. Neuron 94, 401–414.e6 (2017).
Article CAS PubMed PubMed Central Google Scholar
Lawson, R. P., Mathys, C. & Rees, G. Adults with autism overestimate the volatility of the sensory environment. Nat. Neurosci. 20, 1293–1299 (2017).
Article CAS PubMed PubMed Central Google Scholar
Powers, A. R., Mathys, C. & Corlett, P. R. Pavlovian conditioning-induced hallucinations result from overweighting of perceptual priors. Science 357, 596–600 (2017).
Article ADS CAS PubMed PubMed Central Google Scholar
Katthagen, T. et al. Modeling subjective relevance in schizophrenia and its relation to aberrant salience. PLoS Comput. Biol. 14, e1006319 (2018).
Article PubMed PubMed Central Google Scholar
Paliwal, S. et al. Subjective estimates of uncertainty during gambling and impulsivity after subthalamic deep brain stimulation for Parkinson’s disease. Sci. Rep. 9, 14795 (2019).
Article ADS PubMed PubMed Central Google Scholar
Piray, P., Ly, V., Roelofs, K., Cools, R. & Toni, I. Emotionally aversive cues suppress neural systems underlying optimal learning in socially anxious individuals. J. Neurosci. 39, 1445–1456 (2019).
Article CAS PubMed PubMed Central Google Scholar
Cole, D. M. et al. Atypical processing of uncertainty in individuals at risk for psychosis. NeuroImage: Clin. 26, 102239 (2020).
Article Google Scholar
Deserno, L. et al. Volatility estimates increase choice switching and relate to prefrontal activity in Schizophrenia. Biol. Psychiatry.: Cogn. Neurosci. Neuroimaging 5, 173–183 (2020).
Google Scholar
Diaconescu, A. O., Wellstein, K. V., Kasper, L., Mathys, C. & Stephan, K. E. Hierarchical Bayesian models of social inference for probing persecutory delusional ideation. J. Abnorm. Psychol. 129, 556–569 (2020).
Article PubMed Google Scholar
Lee, S., Gold, J. I. & Kable, J. W. The human as delta-rule learner. - PsycNET. Decision 7, 55–66 (2020).
Article CAS Google Scholar
Nassar, M. R., Wilson, R. C., Heasly, B. & Gold, J. I. An approximately Bayesian delta-rule model explains the dynamics of belief updating in a changing environment. J. Neurosci. 30, 12366–12378 (2010).
Article CAS PubMed PubMed Central Google Scholar
Pulcu, E. & Browning, M. The Misestimation of Uncertainty in Affective Disorders. Trends Cogn. Sci. 23, 865–875 (2019).
Article PubMed Google Scholar
Diaconescu, A. O. et al. Inferring on the Intentions of Others by Hierarchical Bayesian Learning. PLoS Comput. Biol. 10, e1003810 (2014).
Reed, E. J. et al. Paranoia as a deficit in non-social belief updating. eLife 9, (2020).
Dugas, M. J., Gagnon, F., Ladouceur, R. & Freeston, M. H. Generalized anxiety disorder: a preliminary test of a conceptual model. Behav. Res Ther. 36, 215–226 (1998).
Article CAS PubMed Google Scholar
Ladouceur, R., Gosselin, P. & Dugas, M. J. Experimental manipulation of intolerance of uncertainty: a study of a theoretical model of worry. Behav. Res Ther. 38, 933–941 (2000).
Article CAS PubMed Google Scholar
Aylward, J. et al. Altered learning under uncertainty in unmedicated mood and anxiety disorders. Nat. Hum. Behav. 3, 1116–1123 (2019).
Article PubMed PubMed Central Google Scholar
Gagne, C., Dayan, P. & Bishop, S. J. When planning to survive goes wrong: predicting the future and replaying the past in anxiety and PTSD. Curr. Opin. Behav. Sci. 24, 89–95 (2018).
Article Google Scholar
Harlé, K. M., Guo, D., Zhang, S., Paulus, M. P. & Yu, A. J. Anhedonia and anxiety underlying depressive symptomatology have distinct effects on reward-based decision-making. PLOS ONE 12, e0186473 (2017).
Article PubMed PubMed Central Google Scholar
Huang, H., Thompson, W. & Paulus, M. P. Computational dysfunctions in anxiety: failure to differentiate signal from noise. Biol. Psychiatry 82, 440–446 (2017).
Article PubMed PubMed Central Google Scholar
Huys, Q. J. M., Daw, N. D. & Dayan, P. Depression: a decision-theoretic analysis. Annu. Rev. Neurosci. 38, 1–23 (2015).
Article CAS PubMed Google Scholar
Luhmann, C. C., Ishida, K. & Hajcak, G. Intolerance of uncertainty and decisions about delayed, probabilistic rewards. Behav. Ther. 42, 378–386 (2011).
Article PubMed Google Scholar
Paulus, M. P. & Yu, A. J. Emotion and decision-making: affect-driven belief systems in anxiety and depression. Trends Cogn. Sci. 16, 476–483 (2012).
Article PubMed PubMed Central Google Scholar
Pulcu, E. & Browning, M. Affective bias as a rational response to the statistics of rewards and punishments. eLife 6, e27879 (2017).
Article PubMed PubMed Central Google Scholar
Mackintosh, N. J. A theory of attention: Variations in the associability of stimuli with reinforcement. Psychological Rev. 82, 276–298 (1975).
Article Google Scholar
Pearce, J. M. & Hall, G. A model for Pavlovian learning: variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychol. Rev. 87, 532–552 (1980).
Article CAS PubMed Google Scholar
Kalman, R. E. A New Approach to Linear Filtering and Prediction Problems. Trans. ASME–J. Basic Eng. 82, 35–45 (1960).
Article MathSciNet Google Scholar
Kakade, S. & Dayan, P. Acquisition and extinction in autoshaping. Psychol. Rev. 109, 533–544 (2002).
Article PubMed Google Scholar
Moens, V. & Zénon, A. Learning and forgetting using reinforced Bayesian change detection. PLoS Comput. Biol. 15, e1006713 (2019).
Silvetti, M., Vassena, E., Abrahamse, E. & Verguts, T. Dorsal anterior cingulate-brainstem ensemble as a reinforcement meta-learner. PLOS Computational Biol. 14, e1006370 (2018).
Article ADS Google Scholar
Wilson, R. C., Nassar, M. R. & Gold, J. I. Bayesian on-line learning of the hazard rate in change-point problems. Neural Comput 22, 2452–2476 (2010).
Article PubMed PubMed Central MATH Google Scholar
Griffiths, T. L., Navarro, D. J. & Sanborn, A. N. A More Rational Model of Categorization. Proceedings of the Annual Meeting of the Cognitive Science Society 28, (2006).
Daw, N. D. & Courville, A. C. The rat as particle filter. in Advances in Neural Information Processing Systems 20 (eds. Platt, J. C., Koller, D., Singer, Y. & Roweis, S. T.) 369–376 (Curran Associates, Inc., 2008).
Brown, S. D. & Steyvers, M. Detecting and predicting changes. Cogn. Psychol. 58, 49–67 (2009).
Article PubMed Google Scholar
Doucet, A., Freitas, N. de, Murphy, K. P. & Russell, S. J. Rao-Blackwellised Particle Filtering for Dynamic Bayesian Networks. in Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence 176–183 (Morgan Kaufmann Publishers Inc., 2000).
Doucet, A. & Johansen, A. M. A tutorial on particle filtering and smoothing: fifteen years later (Oxford University Press, 2011).
Behrens, T. E. J., Hunt, L. T., Woolrich, M. W. & Rushworth, M. F. S. Associative learning of social value. Nature 456, 245–249 (2008).
Article ADS CAS PubMed PubMed Central Google Scholar
Nassar, M. R. et al. Age differences in learning emerge from an insufficient representation of uncertainty in older adults. Nat Commun 7, 11609 (2016).
Hall, G. & Pearce, J. M. Restoring the associability of a pre-exposed CS by a surprising event. Q. J. Exp. Psychol. Sect. B 34, 127–140 (1982).
Article Google Scholar
Gibbon, J., Farrell, L., Locurto, C. M., Duncan, H. J. & Terrace, H. S. Partial reinforcement in autoshaping with pigeons. Anim. Learn. Behav. 8, 45–59 (1980).
Article Google Scholar
Rescorla, R. A. Within-subject partial reinforcement extinction effect in autoshaping. Q. J. Exp. Psychol. B: Comp. Physiological Psychol. 52B, 75–87 (1999).
Google Scholar
Haselgrove, M., Aydin, A. & Pearce, J. M. A partial reinforcement extinction effect despite equal rates of reinforcement during Pavlovian conditioning. J. Exp. Psychol. Anim. Behav. Process 30, 240–250 (2004).
Article PubMed Google Scholar
Li, J., Schiller, D., Schoenbaum, G., Phelps, E. A. & Daw, N. D. Differential roles of human striatum and amygdala in associative learning. Nat. Neurosci. 14, 1250–1252 (2011).
Article CAS PubMed PubMed Central Google Scholar
Gallistel, C. R. & Gibbon, J. Time, rate, and conditioning. Psychol. Rev. 107, 289–344 (2000).
Article CAS PubMed Google Scholar
Phelps, E. A., Lempert, K. M. & Sokol-Hessner, P. Emotion and decision making: multiple modulatory neural circuits. Annu. Rev. Neurosci. 37, 263–287 (2014).
Article CAS PubMed Google Scholar
Averbeck, B. B. & Costa, V. D. Motivational neural circuits underlying reinforcement learning. Nat. Neurosci. 20, 505–512 (2017).
Article CAS PubMed Google Scholar
Holland, P. C. & Gallagher, M. Amygdala circuitry in attentional and representational processes. Trends Cogn. Sci. (Regul. Ed.) 3, 65–73 (1999).
Article CAS Google Scholar
Roesch, M. R., Esber, G. R., Li, J., Daw, N. D. & Schoenbaum, G. Surprise! neural correlates of Pearce-Hall and Rescorla-Wagner Coexist within the Brain. Eur. J. Neurosci. 35, 1190–1200 (2012).
Article PubMed PubMed Central Google Scholar
Costa, V. D., Dal Monte, O., Lucas, D. R., Murray, E. A. & Averbeck, B. B. Amygdala and Ventral Striatum Make Distinct Contributions to Reinforcement Learning. Neuron 92, 505–517 (2016).
Article CAS PubMed PubMed Central Google Scholar
Homan, P. et al. Neural computations of threat in the aftermath of combat trauma. Nat. Neurosci. 22, 470–476 (2019).
Article CAS PubMed PubMed Central Google Scholar
Holland, P. C. & Gallagher, M. Amygdala central nucleus lesions disrupt increments, but not decrements, in conditioned stimulus processing. Behav. Neurosci. 107, 246–253 (1993).
Article CAS PubMed Google Scholar
Holland, P. C. & Schiffino, F. L. Mini-review: prediction errors, attention and associative learning. Neurobiol. Learn Mem. 131, 207–215 (2016).
Article PubMed PubMed Central Google Scholar
Hampton, A. N., Adolphs, R., Tyszka, M. J. & O’Doherty, J. P. Contributions of the amygdala to reward expectancy and choice signals in human prefrontal cortex. Neuron 55, 545–555 (2007).
Article CAS PubMed Google Scholar
de Berker, A. O. et al. Computations of uncertainty mediate acute stress responses in humans. Nat. Commun. 7, 10996 (2016).
Article ADS PubMed PubMed Central Google Scholar
Khorsand, P. & Soltani, A. Optimal structure of metaplasticity for adaptive learning. PLoS Comput. Biol. 13, e1005630 (2017).
Article ADS PubMed PubMed Central Google Scholar
Diederen, K. M. J. & Schultz, W. Scaling prediction errors to reward variability benefits error-driven learning in humans. J. Neurophysiol. 114, 1628–1640 (2015).
Article CAS PubMed PubMed Central Google Scholar
Wilson, R. C., Nassar, M. R. & Gold, J. I. A Mixture of Delta-Rules Approximation to Bayesian Inference in Change-Point Problems. PLOS Computational Biol. 9, e1003150 (2013).
Article ADS MathSciNet CAS Google Scholar
Dayan, P. & Yu, A. Uncertainty and Learning. IETE J. Res. 49, 171–181 (2003).
Article Google Scholar
Yu, A. J. & Dayan, P. Uncertainty, neuromodulation, and attention. Neuron 46, 681–692 (2005).
Article CAS PubMed Google Scholar
Hartley, C. A. & Phelps, E. A. Anxiety and decision-making. Biol. Psychiatry 72, 113–118 (2012).
Article PubMed Google Scholar
Beck, A. T. Depression: Causes and Treatment. (University of Pennsylvania Press, 1970).
Duits, P. et al. Updated meta-analysis of classical fear conditioning in the anxiety disorders. Depress Anxiety 32, 239–253 (2015).
Article PubMed Google Scholar
Lissek, S. et al. Classical fear conditioning in the anxiety disorders: a meta-analysis. Behav. Res Ther. 43, 1391–1424 (2005).
Article PubMed Google Scholar
Bouton, M. E. Context and behavioral processes in extinction. Learn Mem. 11, 485–494 (2004).
Article PubMed Google Scholar
Redish, A. D., Jensen, S., Johnson, A. & Kurth-Nelson, Z. Reconciling reinforcement learning models with behavioral extinction and renewal: implications for addiction, relapse, and problem gambling. Psychol. Rev. 114, 784–805 (2007).
Article PubMed Google Scholar
Stephan, K. E., Baldeweg, T. & Friston, K. J. Synaptic plasticity and dysconnection in schizophrenia. Biol. Psychiatry 59, 929–939 (2006).
Article CAS PubMed Google Scholar
Baker, S. C., Konova, A. B., Daw, N. D. & Horga, G. A distinct inferential mechanism for delusions in schizophrenia. Brain 142, 1797–1812 (2019).
Article PubMed PubMed Central Google Scholar
Horga, G. & Abi-Dargham, A. An integrative framework for perceptual disturbances in psychosis. Nat. Rev. Neurosci. 20, 763–778 (2019).
Article CAS PubMed Google Scholar
Wengler, K., Goldberg, A., Chahine, G. & Horga, G. Hallucinations and Delusions Relate to Distinct Hierarchical Alterations in Intrinsic Neural Timescales. Biol. Psychiatry 87, S179–S180 (2020).
Article Google Scholar
Le Pelley, M. E. The role of associative history in models of associative learning: a selective review and a hybrid model. Q J. Exp. Psychol. B 57, 193–243 (2004).
Article PubMed Google Scholar
Haselgrove, M., Esber, G. R., Pearce, J. M. & Jones, P. M. Two kinds of attention in Pavlovian conditioning: evidence for a hybrid model of learning. J. Exp. Psychol. Anim. Behav. Process 36, 456–470 (2010).
Article PubMed Google Scholar
Pearce, J. M. & Mackintosh, N. J. Two theories of attention: a review and a possible integration. in Attention and Associative Learning: From Brain to Behaviour (eds. Mitchell, C. & Le Pelley, M. E.) 11–39 (Oxford University Press, 2010).
Le Pelley, M. E., Mitchell, C. J., Beesley, T., George, D. N. & Wills, A. J. Attention and associative learning in humans: An integrative review. Psychol. Bull. 142, 1111–1140 (2016).
Article PubMed Google Scholar
Marr, D. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. (W. H. Freeman and Company, 1982).
Aponte, E. et al. TAPAS - Translational Algorithms for Psychiatry-Advancing Science. Front. Psychiatry. https://doi.org/10.3389/fpsyt.2021.680811 (2020).
Gittins, J. C. Bandit Processes and Dynamic Allocation Indices. J. R. Stat. Soc. Ser. B (Methodol.) 41, 148–177 (1979).
MathSciNet MATH Google Scholar
Daw, N. D., O’Doherty, J. P., Dayan, P., Seymour, B. & Dolan, R. J. Cortical substrates for exploratory decisions in humans. Nature 441, 876–879 (2006).
Article ADS CAS PubMed PubMed Central Google Scholar
Wilson, R. C., Takahashi, Y. K., Schoenbaum, G. & Niv, Y. Orbitofrontal cortex as a cognitive map of task space. Neuron 81, 267–279 (2014).
Article CAS PubMed PubMed Central Google Scholar
West, M. On Scale Mixtures of Normal Distributions. Biometrika 74, 646–648 (1987).
Article MathSciNet MATH Google Scholar
Gamerman, D., dos Santos, T. R. & Franco, G. C. A Non-Gaussian Family of State-Space Models with Exact Marginal Likelihood. J. Time Ser. Anal. 34, 625–645 (2013).
Article MathSciNet MATH Google Scholar
Piray, P. & Daw, N. D. A model for learning based on the joint estimation of stochasticity and volatility. Zenodo, https://doi.org/10.5281/zenodo.5526668 (2021).

Download references

Acknowledgements

We thank Sam Zorowitz, Peter Dayan, Yoel Sanchez Araujo, and Guillermo Horga for helpful discussions. This work was supported by grants IIS-1822571 from the National Science Foundation, part of the CRNCS program, and 61454 from the John Templeton Foundation.

Author information

Authors and Affiliations

Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ, USA
Payam Piray & Nathaniel D. Daw

Authors

Payam Piray
View author publications
You can also search for this author in PubMed Google Scholar
Nathaniel D. Daw
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

P.P. and N.D.D. designed the study and wrote the manuscript. P.P. performed analyses.

Corresponding author

Correspondence to Payam Piray.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Reporting summary

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Piray, P., Daw, N.D. A model for learning based on the joint estimation of stochasticity and volatility. Nat Commun 12, 6587 (2021). https://doi.org/10.1038/s41467-021-26731-9

Download citation

Received: 23 January 2021
Accepted: 08 October 2021
Published: 15 November 2021
DOI: https://doi.org/10.1038/s41467-021-26731-9

This article is cited by

Specifying the timescale of early life unpredictability helps explain the development of internalising and externalising behaviours
- Bence Csaba Farkas
- Axel Baptista
- Pierre Olivier Jacquet
Scientific Reports (2024)
Understanding the development of reward learning through the lens of meta-learning
- Kate Nussenbaum
- Catherine A. Hartley
Nature Reviews Psychology (2024)
Blocked training facilitates learning of multiple schemas
- Andre O. Beukers
- Silvy H. P. Collin
- Kenneth A. Norman
Communications Psychology (2024)
Curiosity: primate neural circuits for novelty and information seeking
- Ilya E. Monosov
Nature Reviews Neuroscience (2024)
How do animals weigh conflicting information about reward sources over time? Comparing dynamic averaging models
- Jack Van Allsburg
- Timothy A. Shahan
Animal Cognition (2024)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.