Abstract
Previous research has stressed the importance of uncertainty for controlling the speed of learning, and how such control depends on the learner inferring the noise properties of the environment, especially volatility: the speed of change. However, learning rates are jointly determined by the comparison between volatility and a second factor, moment-to-moment stochasticity. Yet much previous research has focused on simplified cases corresponding to estimation of either factor alone. Here, we introduce a learning model in which both factors are learned simultaneously from experience, and use the model to simulate human and animal data across many seemingly disparate neuroscientific and behavioral phenomena. By considering the full problem of joint estimation, we highlight a set of previously unappreciated issues arising from the mutual interdependence of inference about volatility and stochasticity. This interdependence complicates and enriches the interpretation of previous results, such as pathological learning in individuals with anxiety and following amygdala damage.
Introduction
Among the successes of computational neuroscience is a level-spanning account of learning and conditioning, which has grounded biological plasticity mechanisms (specifically, error-driven updating) in terms of a normative analysis of the problem faced by the organism^{1,2,3,4,5}. These models recast learning as statistical inference: using experience to estimate the amount of some outcome (e.g., food) expected on average following some cue or action. This is an important subproblem of reinforcement learning, which uses such value estimates to guide choice. The statistical framing has motivated an influential program of investigating the brain’s mechanisms for tracking uncertainty about its beliefs, and how these impact learning^{6,7,8,9,10}.
Uncertainty, in turn, depends upon the noise properties of the values being learned, including both the degree of stochasticity in their measurement (observation noise, the variance of which we call stochasticity) and how quickly or how often they change (process noise, the variance of which is known as volatility). In general, then, a statistically efficient learner must estimate not only the primary quantity of interest (e.g., the action value) but also parameters describing its noise properties. This perspective has inspired a series of hierarchical Bayesian inference models, which extend inference to either volatility or stochasticity, though typically while treating the other noise parameter as fixed and known to the model.
For volatility, a particularly influential series of theories extends a baseline model known as the Kalman filter to incorporate volatility estimation (but conditional on known stochasticity)^{6,11,12}. All else equal, when volatility is higher, the organism is more uncertain about the cue’s value (because the true value will on average have fluctuated more following each observation), and so the learning rate (the reliance on each new outcome) should be higher. A series of experiments have reported behavioral and neural signatures of these effects of volatility enhancing learning rate, and also their disruption in relation to psychiatric symptoms^{6,8,10,13,14,15,16,17,18,19,20,21,22,23}. Conversely, stochasticity also affects the learning rate, but in the opposite direction: all else equal, when individual outcomes are more stochastic (larger stochasticity), they are less informative about the cue’s true value and the learning rate, in turn, should be smaller. Here again, experiments confirm that people adjust their learning rates in the predicted direction^{24,25}, and this behavior has been captured by a model that estimates stochasticity (but treating the process noise parameter, in this case a hazard rate, as known).
Altogether, this work has led to a strong argument that the brain’s mechanisms for tracking uncertainty, and the inference of the noise parameters that govern it, are crucial to healthy and disordered learning^{26}. Although these components seem individually well understood, in this article we argue that important insights are revealed by considering in greater detail the full problem facing the learner: simultaneously estimating both volatility and stochasticity during learning. By introducing a model that performs such joint estimation, and studying its behavior in reinforcement learning tasks, we show that because of the interrelationship between these variables, a full account of any of them interacts with the other in consequential ways.
The key issue is that although the learner’s estimates of volatility and stochasticity have opposite effects on learning rates, from the learner’s perspective both manifest similarly, as noisier, less reliably predictable outcomes. The observation that experienced noise can, to a first approximation, be explained by either volatility or stochasticity (and that these effects might be confused, either by experimenters or by learners) has implications. First, previous work apparently showing variation in volatility processing in different groups, such as various psychiatric patients^{14,16,17,19,21,22,27,28} (using a model and tasks that do not vary stochasticity), might instead reflect misidentified abnormalities in processing stochasticity. We suggest that future research should test both dimensions of learning explicitly. Second, from the perspective of a learner inferring volatility and stochasticity, these factors should compete or trade off against one another to best explain experienced noise. This means that any dysfunction or damage that impairs detection of stochasticity should lead to a compensatory increase in inferred volatility, and vice versa: a classic pattern known in Bayesian inference as a failure of “explaining away.”
We argue that such compensatory tradeoffs may be apparent both in anxiety disorders and following damage to amygdala. Intolerance of uncertainty is thought to be a critical component of anxiety and a crucial risk factor for developing anxiety disorders^{29,30}. Although there has been recent interest in operationalizing this idea by connecting it to statistical learning models and tasks^{26,31,32,33,34,35,36,37,38}, we and others have focused on apparent abnormalities in processing volatility^{13,20,26}. The current model suggests a different interpretation, in which anxiety primarily disrupts inference about stochasticity, but with the additional result that the learner misinterprets noise due to stochasticity as a signal of change, i.e., volatility. We argue that the complementary pattern of explaining away, in which a failure to detect volatility leads to change misattributed to stochasticity, can be appreciated in studies of the amygdala’s role in modulating learning rates. In particular, our model suggests that a specific involvement of amygdala in volatility (and the explaining away pattern) explains effects of amygdala damage better than an involvement in learning rates more generally. These sorts of reciprocal interactions also give rise to a richer and subtler set of possible patterns of dysfunction that may help to understand a wide range of other neurological and psychiatric disorders, such as schizophrenia, in which there has been a tendency to study altered processing of uncertainty narrowly in the context of volatility.
The model also sheds light on experimental phenomena of learning rates in conditioning, and on two classic descriptive theories of conditioning in psychology that have been interpreted as predecessors of the statistical accounts^{39,40}. We suggest that seemingly contradictory effects of noise in different experiments, associated with these two theories, can be identified with effects on learning rates of inferred stochasticity vs. volatility, respectively. On this view, different effects will dominate in different experiments depending on which parameter the pattern of the noise suggests.
In the remainder of this article, we present i) a probabilistic model for the joint estimation of volatility and stochasticity from experience; and ii) volatility- and stochasticity-lesioned models in which the corresponding module is damaged. These models highlight the mutual interdependence of inference about volatility and stochasticity and show how the interdependence leads the model to predict paradoxical compensatory behaviors if inference about either factor is damaged. We use these lesioned models in a series of simulation experiments to explain aspects of pathological behavior observed in anxiety disorders and following amygdala damage.
Results
Model
We begin with the Kalman filter, which describes statistically optimal learning from data produced according to a specific form of noisy generation process. The model assumes that the agent must draw inferences (e.g., about true reward rates) from observations (individual reward amounts) that are corrupted by two distinct sources of noise: process noise or volatility and outcome noise or stochasticity (Fig. 1a, b). Volatility captures the speed by which the true value being estimated changes from trial to trial (modeled as Gaussian diffusion); stochasticity describes additional measurement noise in the observation of each outcome around its true value (modeled as Gaussian noise on each trial).
For this data generating process, if the true values of volatility and stochasticity, \({v}_{t}\) and \({s}_{t}\) are known, then optimal inference about the underlying reward rate is tractable using a specific application of Bayes rule, here called the Kalman filter^{41}. The Kalman filter represents its beliefs about the reward rate at each step as a Gaussian distribution with a mean, \({m}_{t}\), and variance (i.e., uncertainty about the true value), \({w}_{t}\). The update, on every trial, is driven by a prediction error signal, \({\delta }_{t}\), and learning rate, \({\alpha }_{t}\). This leads to simple update rules following observation of outcome \({o}_{t}\):
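For reference, the update rules that the text cites as Eqs. 1–4 can be written in the notation defined above as the standard scalar Kalman filter. This is a reconstruction from the surrounding definitions (prediction error, learning rate, mean update, variance update), not a verbatim copy of the original display, and the exact placement of the trial index on \({v}_{t}\) and \({s}_{t}\) is an assumption:

\[
\begin{aligned}
\delta_t &= o_t - m_t && (1)\\
\alpha_t &= \frac{w_t + v_t}{w_t + v_t + s_t} && (2)\\
m_{t+1} &= m_t + \alpha_t\,\delta_t && (3)\\
w_{t+1} &= (1 - \alpha_t)\,(w_t + v_t) && (4)
\end{aligned}
\]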
This derivation thus provides a rationale for the error-driven update (Eq. 3) prominent in neuroscience and psychology^{42}, and adds to these a principled account of the learning rate, \({\alpha }_{t}\), which on this view should depend (Eq. 2) on the agent’s uncertainty and the noise characteristics of the environment. In particular, Eq. 2 shows that the learning rate is increasing and decreasing, respectively, with volatility and stochasticity. This is because higher volatility increases the chance that the true value will have changed since last observed (increasing the need to rely on the new observation), but higher stochasticity decreases the informativeness of the new observation relative to previous beliefs.
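The opposite effects of volatility and stochasticity on the learning rate can be checked numerically. The following is a minimal sketch, assuming the standard scalar Kalman filter form for the update rules cited as Eqs. 1–4; the function names `kalman_step` and `steady_state_learning_rate` are illustrative and do not come from the original work.

```python
def kalman_step(m, w, o, v, s):
    """One trial of a scalar Kalman filter with known volatility v and stochasticity s.

    Assumes the standard form: m is the belief mean, w its variance, o the outcome.
    """
    delta = o - m                    # prediction error
    alpha = (w + v) / (w + v + s)    # learning rate
    m_new = m + alpha * delta        # error-driven update of the mean
    w_new = (1.0 - alpha) * (w + v)  # updated uncertainty about the reward rate
    return m_new, w_new, alpha


def steady_state_learning_rate(v, s, n_iter=1000):
    """Iterate the variance recursion until the learning rate settles.

    The variance and learning rate do not depend on the outcomes themselves,
    so a dummy outcome of 0.0 is passed on every trial.
    """
    w, alpha = 1.0, 0.0
    for _ in range(n_iter):
        _, w, alpha = kalman_step(0.0, w, 0.0, v, s)
    return alpha


if __name__ == "__main__":
    for v, s in [(0.1, 1.0), (1.0, 1.0), (0.1, 5.0)]:
        print(f"v={v}, s={s}: learning rate {steady_state_learning_rate(v, s):.3f}")
```

With stochasticity held fixed, raising volatility raises the asymptotic learning rate; with volatility held fixed, raising stochasticity lowers it.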
This observation launched a line of research focused on elucidating and testing the prediction that organisms adopt higher learning rates when volatility is higher^{6}. But the very premise of these experiments violates the simplifying assumption of the Kalman filter—that volatility is fixed and known to the agent. To handle this situation, new models were developed^{11,12} that generalize the Kalman filter to incorporate learning the volatility \({v}_{t}\) as well, arising from Bayesian inference in a hierarchical generative model in which the true \({v}_{t}\) is also changing. In this case, exact inference is no longer tractable, but approximate inference is possible and typically incorporates Eqs. 1–4 as a subprocess.
This line of work on volatility estimation inherited from the Kalman filter the view of stochasticity as fixed and known. Of course, in general, all the same considerations apply to stochasticity as well: it must be learned from experience, may be changing, and its value impacts learning rate. Indeed, estimating both parameters is critical for efficient learning, because they have opposite effects on learning rate: whereas volatility increases the learning rate, stochasticity reduces it (Fig. 1c). Although some algorithms have been explored for learning when both noise parameters are unknown^{43,44,45}, the main other application of this type of model in neuroscience has relied on a different simplification^{25}, which estimates the stochasticity while treating the hazard rate (the analogue of volatility for changepoint problems) as fixed and known to the model.
Learning these two parameters simultaneously is more difficult because, from the perspective of the agent, larger values of either volatility or stochasticity result in more surprising observations: i.e., larger outcome variance (Fig. 1e). However, there is a subtle and critical difference between the effects of these parameters on generated outcomes: whereas larger volatility increases the autocorrelation between outcomes (i.e., covariation between the outcomes on consecutive trials), larger stochasticity reduces it (Fig. 1f). This is the key point that makes it possible to dissociate and infer these two terms while only observing outcomes.
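The dissociation described here can be illustrated by simulating the generative process directly. The sketch below assumes a Gaussian random walk for the reward rate and Gaussian observation noise, as in the model description; the specific parameter values, trial counts, and function names are hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_outcomes(v, s, n_trials, rng):
    """Reward rate diffuses with variance v per trial; outcomes add noise of variance s."""
    x, outcomes = 0.0, []
    for _ in range(n_trials):
        x += rng.gauss(0.0, v ** 0.5)
        outcomes.append(x + rng.gauss(0.0, s ** 0.5))
    return outcomes


def lag1_autocorrelation(y):
    """Sample autocorrelation between outcomes on consecutive trials."""
    mean = statistics.fmean(y)
    num = sum((a - mean) * (b - mean) for a, b in zip(y[:-1], y[1:]))
    den = sum((a - mean) ** 2 for a in y)
    return num / den


def summarize(v, s, n_trials=100, n_runs=200, seed=0):
    """Average outcome variance and lag-1 autocorrelation over simulated runs."""
    rng = random.Random(seed)
    variances, autocorrs = [], []
    for _ in range(n_runs):
        y = simulate_outcomes(v, s, n_trials, rng)
        variances.append(statistics.pvariance(y))
        autocorrs.append(lag1_autocorrelation(y))
    return statistics.fmean(variances), statistics.fmean(autocorrs)


if __name__ == "__main__":
    for label, v, s in [("baseline", 0.1, 1.0),
                        ("higher volatility", 1.0, 1.0),
                        ("higher stochasticity", 0.1, 5.0)]:
        var, ac = summarize(v, s)
        print(f"{label}: variance {var:.2f}, lag-1 autocorrelation {ac:.2f}")
```

Relative to the baseline, both manipulations increase the outcome variance, but they move the autocorrelation in opposite directions, which is the signal a learner can use to tell them apart.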
We developed a probabilistic model for learning under these circumstances. The data generation process arises from a further hierarchical generalization of these models (specifically the generative model used in our recent work^{12}), in which the true value of stochasticity is unknown and changing, as are the true reward rate and volatility (Fig. 1d). The goal of the learner is to estimate the true reward rate from observations, which necessitates inferring volatility and stochasticity as well.
As with models of volatility, exact inference for this generative process is intractable. Furthermore, in our experience this problem is also relatively challenging to handle with variational inference, the family of approximate inference techniques used previously (see Discussion). Thus, we have instead used a different standard approximation approach that has also been popular in psychology and neuroscience, Monte Carlo sampling^{3,46,47,48}. In particular, we use particle filtering to track \({v}_{t}\) and \({s}_{t}\) based on data^{49,50}. Our method exploits the fact that given a sample of volatility and stochasticity, inference for the reward rate is tractable and is given by Eqs. 1–4, in which \({s}_{t}\) and \({v}_{t}\) are replaced by their corresponding samples (see Methods; this combination of sequential sampling with exact inference for a subproblem is known as Rao-Blackwellized particle filtering).
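The general structure of such a filter can be sketched as follows. This is not the authors' implementation (the actual generative dynamics and parameters are given in the Methods): as a loudly stated assumption, volatility and stochasticity here diffuse as Gaussian random walks in log space, and the function name `rbpf` and its parameters are hypothetical. Each particle samples the two noise parameters and carries an exact Kalman belief about the reward rate conditional on those samples.

```python
import math
import random


def rbpf(outcomes, n_particles=200, step=0.1, seed=0):
    """Rao-Blackwellized particle filter sketch.

    ASSUMPTION (illustration only): log-volatility and log-stochasticity follow
    Gaussian random walks with standard deviation `step` per trial.
    Returns the posterior-mean reward-rate estimate and the weighted-average
    learning rate on each trial.
    """
    rng = random.Random(seed)
    particles = [
        {"log_v": rng.gauss(0.0, 1.0), "log_s": rng.gauss(0.0, 1.0), "m": 0.0, "w": 1.0}
        for _ in range(n_particles)
    ]
    estimates, learning_rates = [], []
    for o in outcomes:
        weights, alphas = [], []
        for p in particles:
            # Sampled part: propagate the noise parameters.
            p["log_v"] += rng.gauss(0.0, step)
            p["log_s"] += rng.gauss(0.0, step)
            v, s = math.exp(p["log_v"]), math.exp(p["log_s"])
            # Weight each particle by the Gaussian predictive likelihood of the outcome.
            pred_var = p["w"] + v + s
            delta = o - p["m"]
            weights.append(
                math.exp(-0.5 * delta * delta / pred_var)
                / math.sqrt(2.0 * math.pi * pred_var)
            )
            # Exact part: Kalman update of the reward-rate belief given (v, s).
            alpha = (p["w"] + v) / pred_var
            p["m"] += alpha * delta
            p["w"] = (1.0 - alpha) * (p["w"] + v)
            alphas.append(alpha)
        total = sum(weights)
        if total > 0.0:
            weights = [wt / total for wt in weights]
        else:  # all likelihoods underflowed; fall back to uniform weights
            weights = [1.0 / n_particles] * n_particles
        estimates.append(sum(wt * p["m"] for wt, p in zip(weights, particles)))
        learning_rates.append(sum(wt * a for wt, a in zip(weights, alphas)))
        # Resample particles in proportion to their weights.
        particles = [dict(p) for p in rng.choices(particles, weights=weights, k=n_particles)]
    return estimates, learning_rates
```

Because the reward-rate belief is handled analytically within each particle, sampling is only needed for the two noise parameters, which is what keeps the approach practical with a modest number of particles.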
Learning under volatility and stochasticity
We now consider the implications of this model for learning under volatility and stochasticity.
A series of studies has used two-level manipulations (high vs. low volatility blockwise) to investigate the prediction that learning rates should increase under high volatility^{6,13,38,51}. Here volatility has been operationalized by frequent or infrequent reversals (Fig. 2a), rather than the smoother Gaussian diffusion that the volatility-augmented Kalman filter models formally assume. Nevertheless, applied to this type of task, these models detect higher volatility in the frequent-reversal blocks, and increase their learning rates accordingly^{6,11,12}. The current model (which effectively incorporates the others as a special case) achieves the same blockwise result when stochasticity is held fixed across both blocks (Supplementary Fig. 1).
In the preceding line of studies, stochasticity was not manipulated. (Indeed, it was not even independently manipulable because rewards were binary, and the variance of binomial outcomes is determined only by the mean.) However, analogous effects of stochasticity have been seen in another line of studies^{7,9,25,52}. In these studies, Nassar and colleagues studied learning rates in a task in which subjects had to predict a value from observations in which the true value was corrupted, blockwise, by different levels of additive Gaussian noise (i.e., stochasticity) and occasionally “jumped” with a constant hazard rate, analogous to volatility. The feature of these results most relevant to the current model is that participants’ learning rate decreases as the noise level increases (see also^{24}). This effect cannot be explained by models that only consider volatility, and in fact, those models make opposite predictions because they take increased noise as evidence of volatility increase. The current model, however, produces the same blockwise effect as humans: because it correctly infers the change in stochasticity, its learning rate is lower, on average, for higher levels of noise (Supplementary Fig. 2). Although we do not intend the current model as a detailed account of how people solve this class of tasks (which is based on a somewhat different generative dynamics), the model can also reproduce other more fine-grained aspects of human behavior in this task, particularly increases in learning rate following switches and scaling of learning rate with the magnitude of error (Supplementary Fig. 2).
Although these two lines of studies, considered together, separately demonstrate the two types of effects on learning rates that we stress, neither has manipulated stochasticity alongside volatility (though see also^{24}). Furthermore, learning of the noise hyperparameters in these studies has largely been explicitly modeled only for either parameter conditional on the other being known. We next consider a variant of this type of task, elaborated to include a 2 × 2 factorial manipulation of both stochasticity and volatility (Fig. 2; we also substitute smooth diffusion for reversals). Here, both parameters are constant within the task, but they are unknown to the model. A series of outcomes was generated in which the hidden reward rate changes according to a random walk and the learner observes outcomes that are noisily generated around the reward rate.
Figure 2 shows the model’s learning rates and how these follow from its inferences of volatility and stochasticity. As above, the model increases its learning rate in the higher volatility conditions but as expected it also decreases it in the higher stochasticity conditions (Fig. 2a). These effects on learning rate arise, in turn (via Eq. 2) because the model is able to correctly estimate the various combinations of volatility and stochasticity from the data (Fig. 2b, c).
Our model thus suggests a general program of augmenting the standard two-level volatility manipulation by crossing it with a second manipulation, of stochasticity, and predicts that higher stochasticity should decrease learning rate, separate from volatility effects.
Interactions between volatility and stochasticity
The previous results highlight an important implication of the current model: that inferences about volatility and stochasticity are mutually interdependent. These interrelationships immediately imply a general interpretational issue for experiments that manipulate only one of these noise parameters, and analyze data using a model that attributes all dynamic learning rate effects to one of them. But the details of the interdependence are themselves informative. From the learner’s perspective, a challenging problem (simplified away in many of the previous models) is to distinguish volatility from stochasticity when both are unknown, because both of them increase the noisiness of observations. Disentangling their respective contributions requires trading off two opposing explanations for the pattern of observations, a process known in Bayesian probability theory as explaining away. Thus, models that neglect stochasticity tend to misidentify stochasticity as volatility and inappropriately modulate learning.
Intriguingly, this situation might in principle arise in neurological damage and psychiatric disorders, if they selectively impact inference about volatility or stochasticity. In that case, the model predicts a characteristic pattern of compensation, whereby learning rate modulation is not merely impaired but reversed, reflecting the substitution of volatility for stochasticity or vice versa: a failure of explaining away. Fig. 3 shows this phenomenon in the 2 × 2 design of Fig. 2, with two characteristic lesion models. The key point here is that a lesioned model that does not consider one factor (e.g., stochasticity) inevitably makes systematically incorrect inferences about the other factor too. Importantly, previous models that only consider volatility are analogous to the stochasticity lesion model (Fig. 3b) and, therefore, make systematically erroneous inferences about volatility (Fig. 3g) and misadjust learning rate if stochasticity is changing (Fig. 3e). This set of lesioned models provides a rich potential framework for understanding pathological learning in psychiatric and neurologic disorders. Later we show that stochasticity lesion and volatility lesion models explain deficits in learning observed in anxiety and following amygdala damage, respectively. But first, we apply the healthy model to reinterpret some longstanding issues about learning rates in animal conditioning.
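The logic of this compensation can be illustrated without the full model. The sketch below swaps in a simple method-of-moments estimator in place of the particle filter, so it is not the authors' lesion model: for a random-walk reward rate observed with noise, successive outcome differences have variance v + 2s and lag-1 covariance −s, so an intact estimator can separate the two, whereas a lesioned estimator that fixes one parameter must attribute all remaining noise to the other. Function names and the fixed values are hypothetical.

```python
import random
import statistics


def simulate_outcomes(v, s, n_trials, seed=0):
    """Random-walk reward rate (variance v per trial) observed with noise variance s."""
    rng = random.Random(seed)
    x, outcomes = 0.0, []
    for _ in range(n_trials):
        x += rng.gauss(0.0, v ** 0.5)
        outcomes.append(x + rng.gauss(0.0, s ** 0.5))
    return outcomes


def difference_moments(outcomes):
    """Variance and lag-1 covariance of successive outcome differences."""
    d = [b - a for a, b in zip(outcomes[:-1], outcomes[1:])]
    mean = statistics.fmean(d)
    var = statistics.fmean((x - mean) ** 2 for x in d)
    cov = statistics.fmean((a - mean) * (b - mean) for a, b in zip(d[:-1], d[1:]))
    return var, cov


def estimate_intact(outcomes):
    """Use both moments: s from the negative covariance, v from what is left."""
    var, cov = difference_moments(outcomes)
    s_hat = max(-cov, 0.0)
    v_hat = max(var - 2.0 * s_hat, 0.0)
    return v_hat, s_hat


def estimate_stochasticity_lesioned(outcomes, s_fixed=0.01):
    """Stochasticity assumed small and constant: all excess noise goes to volatility."""
    var, _ = difference_moments(outcomes)
    return max(var - 2.0 * s_fixed, 0.0), s_fixed


def estimate_volatility_lesioned(outcomes, v_fixed=0.01):
    """Volatility assumed small and constant: all excess noise goes to stochasticity."""
    var, _ = difference_moments(outcomes)
    return v_fixed, max((var - v_fixed) / 2.0, 0.0)


if __name__ == "__main__":
    noisy = simulate_outcomes(v=0.1, s=1.0, n_trials=5000, seed=1)
    print("intact:", estimate_intact(noisy))
    print("stochasticity lesion:", estimate_stochasticity_lesioned(noisy))
```

In a stable but noisy environment, the stochasticity-lesioned estimator reports inflated volatility; the volatility-lesioned estimator shows the mirror-image error in a volatile environment.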
Stochasticity vs. volatility in Pavlovian learning
Learning rates and their dependence upon previous experience have also been extensively studied in Pavlovian conditioning. In this respect, a distinction emerges between two seemingly contradictory lines of theory and experiment, those of Mackintosh^{39} vs. Pearce and Hall^{40}. Both of these theories concern how the history of experiences with some cue drives animals to devote more or less “attention” to it. Attention is envisioned to affect several phenomena, including not only rates of learning about the cue but also other aspects of its processing, such as competition between multiple stimuli presented in compound. Here, to most clearly examine the relationship with the research and models discussed above, we focus specifically on learning rates for a single cue.
The two lines of models start with opposing core intuitions. Mackintosh^{39} argues that animals should pay more attention to (e.g., learn faster about) cues that have in the past been more reliable predictors of outcomes. Pearce and Hall^{40} argue for the opposite: faster learning about cues that have previously been accompanied by surprising outcomes, i.e., those that have been less reliably predictive.
Indeed, different experiments, as discussed below, support either view. For our purposes, we can view these experiments as involving two phases: a pretraining phase that manipulates stochasticity or surprise, followed by a retraining phase to test how this affects the speed of subsequent (re)learning. In terms of our model, we can interpret the pretraining phase as establishing inferences about stochasticity and volatility, which then (depending on their balance) govern learning rate during retraining. On this view, noisier pretraining might, depending on the pattern of noise, lead to either higher volatility and higher learning rates (consistent with Pearce-Hall) or higher stochasticity and lower learning rates (consistent with Mackintosh).
First consider volatility. It has been argued that the Pearce-Hall^{40} logic is formalized by volatility-learning models^{2,3,12}. In these models, surprising outcomes during pretraining increase inferred volatility and thus speed subsequent relearning. Hall and Pearce^{40,53} pretrained rats with a tone stimulus predicting a moderate shock. In the retraining phase, the intensity of the shock was increased. Critically, one group of rats experienced a few surprising “omission” trials at the end of the pretraining phase, in which the tone stimulus was presented with no shock. The speed of learning was substantially increased following the omission trials compared with a control group that experienced no omission in pretraining. Figure 4a–d shows a simulation of this experiment from the current model, showing that the omission trials lead to increased volatility and faster learning. Note that the history-dependence of learning rates in this type of experiment also rejects simpler models like the Kalman filter, in which volatility (and stochasticity) are taken as fixed; for the Kalman filter, learning rate depends only on the number of pretraining trials but not the particular pattern of observed outcomes. The response probability of the model thus shows the same pattern as response rate for rats (Supplementary Fig. 4).
Next, consider stochasticity. Perhaps the best example of Mackintosh’s^{39} principle in terms of learning rates for a single cue is the “partial reinforcement extinction effect”^{54,55,56}. Here, for pretraining, a cue is reinforced either on every trial or instead (“partial reinforcement”) on only a fraction of trials (Fig. 4e). The number of times that the learner encounters the stimulus is the same for both conditions, but the outcome is noisier for the partially reinforced stimulus. The retraining phase consists of extinction (i.e., fully unreinforced presentations of the cue), which occurs faster for fully reinforced cues even though they had been paired with more reinforcers initially. Our model explains this finding (Fig. 4f), because it infers larger stochasticity in the partially reinforced condition, leading to slower learning (Fig. 4g, h). Notably, this type of finding cannot be explained by models that learn only about volatility^{6,11,12}. In general, this class of models mistakes partial reinforcement for increased volatility (rather than increased stochasticity), and incorrectly predicts faster learning.
Note the subtle difference between the two experiments of Fig. 4. The surprising omission experiment involves stable pretraining prior to omission, then an abrupt shift, whereas pretraining in the partial reinforcement experiment is stochastic, but uniformly so. Accordingly, though both pretraining phases involve increased noise (relative to their controls), the model interprets the pattern of this noise as more likely reflecting either volatility or stochasticity, respectively, with opposite effects on learning rate. Overall, then, these experiments support the current model’s suggestion that organisms learn about stochasticity in addition to volatility. Conversely, the models help to clarify and reconcile the seemingly opposing theory and experiments of Mackintosh and Pearce-Hall, at least with respect to learning rates for individual cues. Indeed, although previous work has noted the relationship between Pearce-Hall surprise, uncertainty, and learning rates^{2,3,6,12,20,57}, the current modeling significantly clarifies this mapping by identifying it more specifically with volatility, as contrasted against simultaneous inference about stochasticity. Meanwhile, although our basic statistical interpretation of the partial reinforcement extinction effect has been noted before (e.g., by Gallistel and Gibbon^{58}), to our knowledge these previous explanations have not reconciled it with the volatility/Pearce-Hall phenomena. Instead, previous work attempting to map Mackintosh’s^{39} principle onto statistical models (and distinguish it from Pearce-Hall-like effects) has focused on attention and uncertainty affecting cue combination rather than learning rates^{2}, which is a complementary but separate idea.
Anxiety and inference about stochasticity vs. volatility
Lesioned models like the ones in Fig. 3 are potentially useful for understanding learning deficits in psychiatric disorders, for example anxiety disorders, which have recently been studied in the context of volatility and its effects on learning rate^{13,20}. These studies have shown that people with anxiety are less sensitive to volatility manipulations in probabilistic learning tasks similar to Fig. 2. Besides learning rates, an analogous insensitivity to volatility has been observed in pupil dilation^{13} and in neural activity in the dorsal anterior cingulate cortex, a region that covaried with learning rate in controls^{20}.
These results have been interpreted in relation to the more general idea that intolerance of uncertainty is a key foundation of anxiety; accordingly, fully understanding them requires taking account of multiple sources of uncertainty^{26}, including both the volatility and stochasticity. Nevertheless, the primary interpretation of these types of results has been that observed abnormalities are rooted in volatility estimation per se^{13,20,26}. Our current model suggests an alternative explanation: that the core underlying deficit is actually with stochasticity, and apparent disturbances in volatility processing are secondary to this, due to their interrelationship.
In particular, these effects and a number of others are well explained by the stochasticity lesion model of Fig. 3b, i.e., by assuming that people with anxiety have a core deficit in estimating stochasticity, and instead treat it as small and constant. As shown in Fig. 5b, this model shows insensitivity to volatility manipulation, but in the model this is because volatility is misestimated nearer ceiling due to underestimation of stochasticity. This, in turn, substantially dampens further adaptation of learning rate in blocks when volatility actually increases. The elevated learning rate across all blocks leads to hypersensitivity to noise, which prevents individuals with anxiety from benefitting from stability, as has been observed empirically^{20}. In particular, Piray et al.^{20} studied learning in individuals low or high in trait social anxiety using a switching probabilistic task (Supplementary Fig. 6) in which each trial started with a socially threatening cue (angry face image). Individuals with high trait anxiety performed worse than controls particularly in stable trials, whereas their performance was generally matched with controls in volatile trials^{20} (Fig. 5e). The model shows similar behavior (Fig. 5f).
One key prediction of our model, which differs from a volatility-specific account, is that the learning rate is generally higher in people with anxiety regardless of volatility manipulation, or even in tasks that do not manipulate volatility. Browning et al.^{13} do not find evidence to support this prediction: they do not find a significant overall effect of anxiety on learning rate. Of course, it is important not to interpret null results as evidence in favor of the null hypothesis, since a failure to reject the null hypothesis may reflect insufficient power to detect a true effect. Indeed, in Browning’s^{13} data, while the effect of anxiety on learning rate was not significant overall or in either condition, the point estimate was largest (r(28) = 0.26, p = 0.16) in the stable condition, which is also the block in which the model predicts the effect should be statistically strongest (because baseline learning rates, absent any effect of anxiety, are lower).
Importantly, other, larger studies provide positive statistical support for the prediction of elevated learning rate with anxiety^{31,33,34}. Note that in delta-rule models, behavior under higher learning rates is closer to win-stay/lose-shift: higher learning rates weight the most recent outcome more heavily, and full win-stay/lose-shift (dependence only on the most recent outcome) is equivalent to a learning rate of 1. Such a strategy has itself been linked to anxiety^{33,34}. A notable observation was made in a large (n = 122) study by Huang et al.^{34}, who found that people with anxiety show higher win-stay/lose-shift and that this effect is driven by higher lose-shift. Figure 5g, h shows results of simulating the proposed model in a task similar to Huang et al.^{34} (Supplementary Fig. 6). The model shows the same pattern of behavior, with the additional modulation by win vs. loss captured because any loss is seen as evidence for volatility, which results in a higher learning rate and a contingency switch. The effect is much less salient for win trials because prediction errors are relatively small in those trials, which substantially dampens any effect of learning rate. Across all trials, the stochasticity lesion model shows higher learning rate, similar to what Huang et al.^{34} found by fitting reinforcement learning models to choice data.
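The equivalence between a learning rate of 1 and win-stay/lose-shift can be made concrete with a small example. The sketch below assumes a two-option task with anticorrelated binary outcomes (exactly one option is correct on each trial) and a greedy choice rule; these assumptions, and the function names, are illustrative and are not taken from the cited studies.

```python
def delta_rule_choices(correct_options, alpha, m0=0.5):
    """Greedy choices from a delta-rule learner.

    m is the estimated probability that option 1 is the correct one
    (ASSUMPTION: outcomes are anticorrelated, so one estimate suffices).
    """
    m, choices = m0, []
    for correct in correct_options:
        choices.append(1 if m >= 0.5 else 0)
        m += alpha * (correct - m)  # error-driven update toward the correct option
    return choices


def win_stay_lose_shift_choices(correct_options, first_choice=1):
    """Repeat the previous choice after a win, switch after a loss."""
    choices = [first_choice]
    for correct in correct_options[:-1]:
        prev = choices[-1]
        choices.append(prev if prev == correct else 1 - prev)
    return choices


if __name__ == "__main__":
    seq = [1, 1, 0, 1, 0, 0, 0, 1, 1, 0]  # which option was correct on each trial
    print(delta_rule_choices(seq, alpha=1.0))
    print(win_stay_lose_shift_choices(seq))
    print(delta_rule_choices(seq, alpha=0.2))
```

With a learning rate of 1 the estimate simply equals the last outcome, so the two choice sequences coincide; with a lower learning rate the learner integrates over several trials and does not switch after every loss.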
Finally, the lesion model is an extreme case in which a hypothetical stochasticity module is completely eliminated. But this general approach can be extended to less extreme cases in which one module of the model (e.g., stochasticity) has some relative disadvantage in explaining noise. In terms of our model, this can be achieved by having a higher update rate parameter for volatility relative to that for stochasticity. These are the two main parameters of the model that one can use to explain individual differences across people. For example, the ratio of the volatility update rate to the stochasticity update rate can be used to capture continuous individual variation in trait anxiety. In this case, the stochasticity lesion model of Fig. 3b is an extreme case of this approach in which the stochasticity update rate is zero (thus the ratio of volatility to stochasticity update rate is infinitely large). We have exploited this approach to simulate a result from Browning et al.^{13} concerning graded individual differences in anxiety’s effect on learning rate adjustment. In particular, they report (and the model captures; Fig. 6) a negative correlation between relative learning rate (volatile minus stable) and trait anxiety in the probabilistic switching task with stable and volatile blocks.
Amygdala damage and inference about volatility vs. stochasticity
The opposite pattern of compensatory effects on inference is visible in the effects of amygdala damage on learning. The amygdala plays an important role in associative learning^{59,60}. Although some researchers have emphasized a role of the amygdala as a site of association between conditioned and unconditioned stimuli in conditioning per se, other authors (drawing on evidence from human neuroimaging work, single-cell recordings, and lesion studies) have proposed that the amygdala is involved in a circuit for controlling or adjusting learning rates^{57,60,61,62,63,64}. Most informative from the perspective of our model are lesion studies in rats^{61,65,66}, which we interpret as supporting an involvement specifically in processing of volatility, rather than learning rates or uncertainty more generally. These experiments examine a surprise-induced upshift in learning rate similar to the Pearce-Hall experiment from Fig. 4. Lesions to the central nucleus of the amygdala attenuate this effect, suggesting a role in volatility processing. But an important detail of these results with respect to our model’s predictions is that the effect is not merely attenuated but reversed. This reciprocal effect supports perhaps the most central feature and prediction of our model: that volatility trades off against a (presumably anatomically separate) system for stochasticity estimation.
Figure 7 shows the serial prediction task used in these studies and its results in more detail. Rats performed a prediction task in two phases. A group of rats in the “consistent” condition performed the same prediction task in both phases. The “shift” group, in contrast, experienced a sudden change in the contingency in the second phase. Whereas the control rats showed an elevation of learning rate in the shift condition, manifested by an elevation of food-seeking behavior in the very first trial of the test, the amygdala-lesioned rats showed the opposite pattern. Lesioned rats showed a significantly smaller learning rate in the shift condition compared with the consistent one, a reversal of the surprise-induced upshift.
We simulated the model in this experiment. To model a hypothetical effect of amygdala lesion on volatility inference, we assumed that lesioned rats treat volatility as small and constant. As shown in Fig. 7, the model shows an elevated learning rate in the shift condition for the control rats, which is again due to increases in inferred volatility after the contingency shift. For the lesioned model, however, surprise is misattributed to the stochasticity term as an increase in inferred volatility cannot explain away surprising observations (because it was held fixed). Therefore, the contingency shift inevitably increases stochasticity and thereby decreases the learning rate. Notably, the compensatory reversal in this experiment cannot be explained using models that do not consider both the volatility and stochasticity terms.
A similar pattern of effects of amygdala lesions, consistent with our theory, is seen in an experiment on nonhuman primates. In a recent report, Costa et al.^{63} found that amygdala lesions in monkeys disrupt reversal learning with deterministic contingencies more so than reversal learning with stochastic contingencies. This is striking since deterministic reversal learning tasks are much easier. As in the previous experiment, our model explains this finding because large surprises caused by the contingency reversal are misattributed to stochasticity in lesioned animals (because volatility was held fixed), while control animals correctly attribute them to the volatility term (Fig. 8; see Supplementary Fig. 7 for performance of the model and Supplementary Fig. 8 for simulation of the model in all probabilistic schedules tested by Costa et al.^{63}). This effect is particularly large in the deterministic case because the environment is very predictable before the reversal, and therefore the reversal causes larger surprises than in the stochastic case. Similar results have been reported in a study of human subjects with focal bilateral amygdala lesions^{67}, in which patients tended to show greater deficits in deterministic than in stochastic reversal learning. Again, these experimental findings are not explained by a Kalman filter or by models that consider only the volatility term.
Overall, then, these experiments support the current model’s picture of dueling influences of stochasticity and volatility. Furthermore, the current model helps to clarify the precise role of the amygdala in this type of learning, relating it specifically to volatility-mediated adjustments.
Discussion
A central question in decision neuroscience is how the brain learns from the consequences of choices given that these can be highly noisy. To do so effectively requires simultaneously learning about the characteristics of the noise, as has been emphasized most strongly in a prominent line of work on how the brain tracks the volatility of the environment. Here we revisit this problem for the more realistic case when both volatility and a second noise parameter, stochasticity, must be simultaneously estimated.
While various experiments have, mostly separately, shown that humans can adjust learning rates in response to manipulations of either type of noise, models of how they do so have focused primarily on estimating either parameter while taking the other as known. This skirts the more difficult problem of distinguishing types of noise. To solve this problem and investigate its consequences for learning, we built a probabilistic model for learning in uncertain environments that tracks volatility and stochasticity simultaneously. Using this model to simulate a number of experiments across conditioning, psychiatry and lesion studies, we show a consistent theme whereby the interdependence of inference about these two noise parameters gives rise to patterns of effects that could not be appreciated in previous models that considered estimating either type of noise separately.
The importance of dissociating these forms of noise, and some aspects of their interaction, have been noted previously. For instance, Pulcu and Browning^{26} emphasize the inadequacy of existing experiments for dissociating volatility vs. stochasticity learning, and raise the possibility that, in principle, people might confuse them. In Nassar et al.’s study^{25}, the volatility-like hazard rate parameter (though viewed from the model’s perspective as fixed and known) is fit as a per-subject free parameter construed as an individual difference. The empirical and model-fitting results showcase a dependence of the (inferred) stochasticity parameter upon the (fit/known) hazard rate, consistent with the bidirectional pattern of interdependence we posit. Building on all these ideas, we build and simulate a model to showcase the potential interdependence between these two types of inference across a range of situations. One important caveat, given the range of applications we consider, is that we abstract away details of the many individual studies to emphasize their parallelism with respect to our main point of interest. Thus, for instance, we neglect valence-dependent modulation of learning, which is likely an additional dimension important both in anxiety^{20,26,31,38} and in studies of the amygdala^{63}. Relatedly, as our goal is to showcase the range of situations in which parallel issues may arise, we acknowledge that different explanations may exist for many individual results.
Our work builds most directly on a rich line of theoretical and experimental work on the relationship between volatility and learning rates^{6,8,13,15,27,68,69}. There have been numerous reports of volatility effects on healthy and disordered behavioral and neural responses, often using a two-level manipulation of volatility like that from Fig. 5a^{6,8,10,13,14,15,16,17,18,19,20,21,22,23,38}. Our modeling suggests that it will be informative to drill deeper into these effects by augmenting this task to cross this manipulation with stochasticity, so as to differentiate these two potential contributors more clearly^{24}. For example, in tasks that manipulate (and models that consider) only volatility, it can be seen from Eqs. (1–4) that the time series of several quantities all covary, including the estimated volatility \({v}_{t}\), the posterior uncertainty \({w}_{t}\), and the learning rate \({\alpha }_{t}\). It can therefore be difficult in general to distinguish which of these variables is really driving tantalizing neural correlates related to these processes, for instance in amygdala and dorsal anterior cingulate cortex^{6,57}. The inclusion of stochasticity (which increases uncertainty but decreases learning rate) would help to drive these apart.
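The opposite effects of the two noise types on learning rate, despite their common effect on uncertainty, can be illustrated by iterating a scalar Kalman filter to its steady state. The sketch below assumes the standard recursions for a random walk observed in Gaussian noise, \(\alpha =(w+v)/(w+v+s)\) and \(w\leftarrow (1-\alpha )(w+v)\), taken here to correspond to Eqs. (1–4). The noise values and the function name are arbitrary illustrations.

```python
def steady_state(v, s, n_iter=1000):
    """Iterate the scalar Kalman recursions to convergence for fixed noise levels.

    v: volatility (process-noise variance)
    s: stochasticity (observation-noise variance)
    Returns (learning rate, posterior variance).
    """
    w = 1.0      # arbitrary initial posterior variance
    alpha = 0.0
    for _ in range(n_iter):
        alpha = (w + v) / (w + v + s)   # Kalman gain, i.e. the learning rate
        w = (1.0 - alpha) * (w + v)     # posterior variance after the update
    return alpha, w

base = steady_state(v=0.1, s=1.0)
more_volatile = steady_state(v=0.4, s=1.0)
more_stochastic = steady_state(v=0.1, s=4.0)
print(base, more_volatile, more_stochastic)
```

Under these assumptions, raising volatility increases both the steady-state learning rate and the posterior variance, whereas raising stochasticity increases the posterior variance but lowers the learning rate.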
Another related set of learning tasks considered prediction of continuous outcomes corrupted by stochasticity, i.e., additive Gaussian noise^{7,9,25,45,52}, which could provide another foundation for factorial manipulations of the sort we propose. Indeed, a number of these studies (complementary to the volatility studies) included multiple levels of stochasticity and showed learning rate effects^{7,9,25,52,70}. The models used in these studies have largely used a complementary simplification to the volatility one: they estimate stochasticity, but conditional on a known value for the hazard rate (equivalent to volatility). Interestingly, rather than overall adjustment to noise statistics, these studies more explicitly emphasized the detection of discrete changes in the environment and the resulting local adjustments of the learning rate. From a modeling perspective, inference under change at discrete changepoints (occurring at some hazard rate) raises issues quite analogous to change due to more gradual diffusion (with some volatility). Thus, in practice it has been common and effective to apply models for one sort of change to tasks actually involving the other^{6,11,12}, a substitution also in part licensed by approximate models of changepoint detection that (as with the volatility models and the Kalman filter for continuous change) also reduce learning to error-driven updates with a time-varying learning rate^{25,71}. Thus, although we build the current work on a generative model with continuous change rather than abrupt changepoints, we do not mean this as a substantive claim, as we expect our main substantive points (concerning the inference about noise vs. change hyperparameters) would play out analogously in other variants. In any case, research into the neural substrates of changepoint detection is highly relevant to the change problem conceived in terms of volatility as well (see^{10} for a recent review).
The current framework’s tendency to elide the distinction between discrete and continuous change (but distinguish both from stochasticity) is also the basis of an important, but subtle, distinction from another prominent dichotomy previously proposed: that between “expected” and “unexpected” types of uncertainty^{72,73}. While it might appear that these categories correspond, respectively, to stochasticity and volatility as we define them, that is not actually the case. Formally, this is because the Dayan and Yu model (in its most detailed form, Dayan and Yu^{72}) arises from a Kalman filter augmented with additional discrete changepoints: i.e., both diffusion and jumps. The focus of that work was distinguishing the special effects of surprising jumps (“unexpected uncertainty”), which were hypothesized to recruit a specialized neural interrupt system. Meanwhile, all other uncertainty arising in the baseline Kalman filter (i.e., that from both stochasticity and volatility; the posterior variance \({w}_{t}\) in Eq. 4) is lumped together under “expected uncertainty.” That said (although we see this as a misreading of the earlier work), our impression is that later authors’ use of these terms actually tends to comport more with our distinction than with the original definition^{26}, i.e., to take unexpected and expected uncertainty as synonymous with volatility and stochasticity as we define them. In any case, Yu and Dayan did not consider the problem considered here, of estimating the noise hyperparameters for learning under uncertainty.
The most important feature of our model is the competition it induces between the volatility and stochasticity to “explain away” surprising observations. This leads to a predicted signature of the model in cases of lesion or damage affecting inference about either type of noise: disruption causing neglect of one of the terms leads to overestimation of the other term. For example, if a module responsible for volatility learning were disrupted, the model would overestimate stochasticity, because surprising observations that are due to volatility would be misattributed to the stochasticity. This allowed us to revisit the role of amygdala in associative learning and explain some puzzling findings about its contributions to reversal learning.
Similar explanations grounded in the current model may also be relevant to a number of psychiatric disorders. Abnormalities in uncertainty and inference, broadly, have been hypothesized to play a role in numerous disorders, including especially anxiety and schizophrenia. More specifically, abnormalities in volatility-related learning adjustments have been reported in patients or people reporting symptoms of several mental illnesses^{13,14,16,17,18,19,20,21,22,23,38}. The current model provides a more detailed potential framework for better dissecting these effects, though this will ideally require a new generation of experiments manipulating both factors.
In the present work, we have developed these ideas mostly in terms of pathological decision making in anxiety, which is one of the areas where earlier work on volatility estimation has been strongest and where further refinement using our theory seems most promising^{32,35,37,74}. We considered an account by which individuals with anxiety systematically misidentify outcomes occurring due to chance (stochasticity) as instead a signal of change (volatility)^{34}. This account offers a contrary interpretation of a pattern of effects that had been taken to indicate that volatility sensitivity is instead deficient in anxiety^{26}. Although some null effects in the study of Browning et al.^{13} do not support this account, we view it as an overall better account of the pattern of data across several studies^{20,31,33,34}. Our account is also broadly consistent with studies suggesting that individuals with anxiety might feel overwhelmed when faced with uncertainty^{36} and fail to make use of long-term statistical regularities^{34}. These are also hypothesized to be related to symptoms of excessive worry^{29,30}. Misestimating stochasticity, more so than volatility, also seems consonant with the idea that individuals with anxiety tend to fail to discount negative outcomes occurring by chance (i.e., stochasticity) and instead favor alternative explanations like self-blame^{75}. This hypothesis is also consistent with the observation that acquisition of fear conditioning tends to be enhanced in individuals with anxiety^{76,77}. Finally, although a simple increase in learning rate seems harder to reconcile with generally slower extinction of Pavlovian fear learning in anxiety^{76}, this probably reflects the well-known fact that extinction is not simply unlearning of the original associations, but instead is dominated by additional processes^{78,79}. These include, in particular, statistical inference about latent contexts^{5}, which is likely to be affected by both stochasticity and volatility in ways that should be explored in future work.
More generally, this modeling approach, which quantifies misattribution of stochasticity to volatility and vice versa, might be useful for understanding various other brain disorders that are thought to influence processing of uncertainty and have largely been studied in the context of volatility in the past decade^{14,16,17,19,21,22,27,28}. As another example, positive symptoms in schizophrenia have been argued to result from some alterations in prior vs. likelihood processing, perhaps driven by abnormal attribution of uncertainty (or precision) to top-down expectations^{80}. But different such symptoms (e.g., hallucinations vs. delusions) manifest in different patients. One reason may be that these relate to disruption at different levels of a perceptual-inferential hierarchy, i.e., with hallucination vs. delusion reflecting involvement of less or more abstract inferential levels, respectively^{81,82,83}. In this respect, the current model may provide a simple and direct comparative test, since stochasticity enters at the perceptual, or outcome, level (potentially associated with hallucination) but volatility acts at the more abstract level of the latent reward (and may be associated with delusion; see Fig. 1).
Our work also touches upon a historical debate in the associative learning literature about the role of outcome stochasticity (i.e., in our terms, noise) in learning. One class of theories, most prominently represented by Mackintosh^{39}, proposes that attention is preferentially allocated to cues that are most reliably predictive of outcomes, whereas Pearce and Hall^{62} suggest the opposite: that attention is attracted to surprising misprediction. We address only a subset of the experimental phenomena involved in this debate (those involving learning rates for cues presented alone), but for this subset we offer a very clear resolution of the apparent conflict. Our approach and goals also differ from classic work in this area. A number of important models of attention in psychology also attempt to reconcile these theories by providing more phenomenological models that hybridize the two theories to account for various and often paradoxical experimental findings^{84,85,86,87}. Our goal is different and is more descended from a tradition of normative theories that provide a computational understanding of psychological phenomena from first principles by first addressing the computational problem that the corresponding neural system has evolved to solve^{2,88}.
Any probabilistic model relies on a set of explicit assumptions about how observations have been generated, i.e., a generative model, and also an inference procedure to estimate the hidden parameters that are not directly observable. Such inference algorithms typically reflect some approximation strategy because exact inference is not possible for most important problems, including our generative model (Fig. 1). In previous work in this area, we and others have relied on variational approaches to approximate inference, which factor difficult inference problems into smaller tractable ones and approximate the answer as though they were independent^{11,12}. Interestingly, although one of the most promising successes of this approach in neuroscience has been in hierarchical Kalman filters with volatility inference, we found it difficult to develop an effective variational filter for the current problem, in which stochasticity is unknown. The core problem, in our hands, was that effective explaining away between the two noise types was difficult to achieve using simplified variational posteriors that omitted aspects of their mutual dependency.
Interestingly, there are other algorithms that, in principle, address similar learning problems. These include using an explicitly variational approach extending the HGF (code is publicly available as hgf_jget in the TAPAS toolbox^{89}, but has not been documented or tested in published articles), augmenting the variational HGF with mixture models^{43}, an analogous simplified learning rule based more on neural considerations^{44}, and an exact model for tracking hazard rates under a particular case of changepoint detection^{45}. While these have not yet been applied to the full range of problems we investigate here, we suspect that future work investigating the approximate approaches will find challenges in explaining away. In any case, in the current modeling, we have adopted a different estimation method based on Monte Carlo sampling, in particular a variant of particle filtering that preserves many of the advantages of variational methods by incorporating exact conditional inference for a subset of variables^{49}. The inference model employed here combines Kalman filtering for estimation of reward rate^{41} conditional on the volatility and stochasticity, with particle filtering for inference about these^{50}. One drawback of the particle filter, however, is that it requires tracking a number of samples on every trial. In practice, we found that a handful (e.g., 100) of particles results in a sufficiently good approximation.
Finally, in this study, we only modeled the effects of volatility and stochasticity on learning rate. However, uncertainty affects many different problems beyond learning rate, and a full account of how subjects infer volatility and stochasticity (and how these, in turn, affect uncertainty) may have ramifications for many other behaviors. There have been important statistical accounts of a number of such problems, but most of them have neglected either stochasticity or volatility, and none of them have explicitly considered the effects of learning the levels of these types of noise. These problems include cue- or feature-selective attention^{2}; the explore-exploit dilemma^{90,91}; and the partition of experience into latent states, causes or contexts^{5,79,92}. The current model, or variants of it, is more or less directly applicable to all these problems and should imply predictions about the effects of manipulating either type of noise across many different behaviors.
Methods
Description of the model
Recall that the outcome on trial \(t\), \({o}_{t}\), in our model depends on three latent variables: the reward rate, stochasticity and volatility. The reward rate on trial \(t\), \({x}_{t}\), has Markov-structure dynamics:
\[{x}_{t}={x}_{t-1}+{e}_{t}, \qquad (5)\]
where \({e}_{t}\) is (zero-mean) Gaussian noise with variance given by volatility. Therefore, we have:
\[p\left({x}_{t}\mid {x}_{t-1},{v}_{t}\right)=N\left({x}_{t}\mid {x}_{t-1},{v}_{t}\right), \qquad (6)\]
where \({v}_{t}\) is volatility. We define the inverse volatility, \({z}_{t}={v}_{t}^{-1}\), which is the preferred formulation here as it has been used in previous studies for its analytical plausibility^{12}. Outcomes were generated based on the reward rate and stochasticity according to a Gaussian distribution:
\[p\left({o}_{t}\mid {x}_{t},{s}_{t}\right)=N\left({o}_{t}\mid {x}_{t},{s}_{t}\right), \qquad (7)\]
where \({s}_{t}\) is the stochasticity with \({y}_{t}={s}_{t}^{-1}\).
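As an illustration of this generative process with constant noise parameters, the following sketch draws a reward-rate random walk and noisy outcomes. It assumes that the reward rate changes by a zero-mean Gaussian step whose variance is the volatility and that each outcome is the reward rate plus zero-mean Gaussian noise whose variance is the stochasticity. The function name, the initial reward rate `x0` and the parameter values are hypothetical, and this is not the simulation code used for the figures.

```python
import random

def generate_outcomes(n_trials, volatility, stochasticity, x0=0.0, seed=0):
    """Simulate a latent reward rate and observed outcomes.

    volatility: variance of the random-walk step of the reward rate
    stochasticity: variance of the observation noise
    Returns (reward rates, outcomes), one entry per trial.
    """
    rng = random.Random(seed)
    x = x0
    rates, outcomes = [], []
    for _ in range(n_trials):
        # Process noise: the reward rate diffuses with variance = volatility.
        x += rng.gauss(0.0, volatility ** 0.5)
        # Observation noise: the outcome scatters around x with variance = stochasticity.
        o = x + rng.gauss(0.0, stochasticity ** 0.5)
        rates.append(x)
        outcomes.append(o)
    return rates, outcomes

rates, outcomes = generate_outcomes(100, volatility=0.5, stochasticity=2.0)
print(rates[:3], outcomes[:3])
```

Setting `volatility` to zero yields a constant reward rate, and setting `stochasticity` to zero makes the outcomes equal to the reward rate, which gives two quick sanity checks on the sketch.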
For volatility and stochasticity, we assumed multiplicative noise on their inverses, an approach that has been shown to give rise to analytical inference when considered in isolation (but not here)^{93,94}. Specifically, the dynamics over these variables are given by \({z}_{t}={\eta }_{v}^{-1}{z}_{t-1}{\epsilon }_{t}\), where \(0 < {\eta }_{v} < 1\) is a constant and \({\epsilon }_{t}\) is a random variable in the unit range with a Beta distribution, \(p\left({\epsilon }_{t}\right)={\rm{B}}\left({\epsilon }_{t}\mid 0.5{\eta }_{v}{(1-{\eta }_{v})}^{-1},0.5\right)\). Note that the conditional expectation of \({z}_{t}\) is given by \({z}_{t-1}\), because \(E\left[{\epsilon }_{t}\right]={\eta }_{v}\). We assume similar and independent dynamics for \({y}_{t}\), parametrized by the constant \({\eta }_{s}\): \({y}_{t}={\eta }_{s}^{-1}{y}_{t-1}{\varepsilon }_{t}\), in which \({\varepsilon }_{t}\) has a distribution similar to that of \({\epsilon }_{t}\), parametrized by \({\eta }_{s}\).
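The property that the multiplicative noise leaves the conditional expectation unchanged can be checked numerically. The sketch below assumes the Beta shape parameters are \(0.5\eta /(1-\eta )\) and \(0.5\), so that the mean of the noise equals \(\eta \). The helper names and the values of `eta_v` and `z_prev` are hypothetical.

```python
import random

def beta_shape(eta):
    """First Beta shape parameter; with second parameter 0.5 the mean equals eta."""
    return 0.5 * eta / (1.0 - eta)

def step_inverse(z_prev, eta, rng):
    """One multiplicative-noise step for an inverse noise parameter (z or y)."""
    epsilon = rng.betavariate(beta_shape(eta), 0.5)
    return z_prev * epsilon / eta

rng = random.Random(1)
eta_v = 0.9      # corresponds to an update rate of 1 - eta_v = 0.1
z_prev = 2.0     # arbitrary previous inverse volatility
draws = [step_inverse(z_prev, eta_v, rng) for _ in range(100_000)]
mean_z = sum(draws) / len(draws)
print(mean_z)    # close to z_prev, because the mean of epsilon is eta_v
```

Because the noise is bounded by 1, a single step can increase the inverse parameter by at most a factor of \(1/\eta \), whereas decreases are unbounded.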
In our implementation, we parametrized the model with \({\lambda }_{v}=1-{\eta }_{v}\) and \({\lambda }_{s}=1-{\eta }_{s}\). This is because these parameters can be interpreted as the update rates for volatility and stochasticity, respectively. In other words, larger values of \({\lambda }_{v}\) and \({\lambda }_{s}\) result in faster updates of volatility and stochasticity, respectively. Intuitively, this is because a smaller \({\lambda }_{v}\) increases the mean of \({\epsilon }_{t}\) and results in a larger update of \({z}_{t}\). Since volatility is the inverse of \({z}_{t}\), a smaller \({\lambda }_{v}\) results in a slower update of volatility. This has been formally shown in our recent work^{12}. In addition to these two parameters, this generative process depends on the initial values of volatility and stochasticity, \({v}_{0}\) and \({s}_{0}\).
For inference, we employed a Rao-Blackwellised particle filtering approach^{49}, in which the inference about \({v}_{t}\) and \({s}_{t}\) was made by a particle filter^{50} and, conditional on these, the inference over \({x}_{t}\) was given by the Kalman filter (i.e., Eqs. 1–4). The particle filter is a Monte Carlo sequential importance sampling method, which keeps track of a set of particles (i.e., samples). The algorithm performs three steps on each trial. First, in a prediction step, each particle is transitioned to the next step based on the generative process. Second, the weight of each particle is updated based on the probability of the observed outcome:
\[{b}_{t}^{l}\propto N\left({o}_{t}\mid {m}_{t-1}^{l},\,{w}_{t-1}^{l}+{v}_{t}^{l}+{s}_{t}^{l}\right), \qquad (8)\]
where \({b}_{t}^{l}\) is the weight of particle \(l\) on trial \(t\), \({m}_{t-1}^{l}\) and \({w}_{t-1}^{l}\) are the mean and variance estimated by the Kalman filter on the previous trial (Eqs. 1–4), and \({v}_{t}^{l}\) and \({s}_{t}^{l}\) are volatility and stochasticity samples (i.e., the inverses of \({z}_{t}^{l}\) and \({y}_{t}^{l}\)). In this step, particles were also resampled using the systematic resampling procedure if the ratio of effective to total particles fell below 0.5. In the third step, the Kalman filter was used to update the mean and variance. In particular, for every particle, Eqs. 1–4 were used to define \({\alpha }_{t}^{l}\) and update \({m}_{t}^{l}\) and \({w}_{t}^{l}\). The learning rate and estimated reward rate on trial \(t\) were then defined as the weighted average over all particles, in which the weights were given by \({b}_{t}^{l}\). We have used particle filter routines implemented in MATLAB.
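The three steps above can be sketched in a few dozen lines. This is an illustrative re-implementation under stated assumptions, not the MATLAB routines used for the reported simulations: it assumes the scalar Kalman recursions \(\alpha =(w+v)/(w+v+s)\), \(m\leftarrow m+\alpha (o-m)\) and \(w\leftarrow (1-\alpha )(w+v)\) for Eqs. 1–4, multiplies each weight by its value carried over from the previous trial (the usual sequential importance sampling convention), and uses hypothetical function names and default parameter values. The update rates must lie strictly between 0 and 1.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def systematic_resample(weights, rng):
    """Draw len(weights) particle indices using one shared uniform offset."""
    n = len(weights)
    offset = rng.random() / n
    indices, i, cumulative = [], 0, weights[0]
    for k in range(n):
        u = offset + k / n
        while u > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        indices.append(i)
    return indices

def rbpf(outcomes, lambda_v=0.1, lambda_s=0.1, v0=1.0, s0=1.0,
         m0=0.0, w0=1.0, n_particles=100, seed=0):
    """Return per-trial (learning rate, reward-rate estimate, volatility, stochasticity)."""
    rng = random.Random(seed)
    n = n_particles
    eta_v, eta_s = 1.0 - lambda_v, 1.0 - lambda_s
    a_v, a_s = 0.5 * eta_v / lambda_v, 0.5 * eta_s / lambda_s  # Beta shape parameters
    z = [1.0 / v0] * n   # inverse volatility samples
    y = [1.0 / s0] * n   # inverse stochasticity samples
    m = [m0] * n         # Kalman means, one per particle
    w = [w0] * n         # Kalman variances, one per particle
    b = [1.0 / n] * n    # particle weights
    trace = []
    for o in outcomes:
        # Step 1 (prediction): move each particle through the multiplicative noise model.
        z = [zi * rng.betavariate(a_v, 0.5) / eta_v for zi in z]
        y = [yi * rng.betavariate(a_s, 0.5) / eta_s for yi in y]
        v = [1.0 / zi for zi in z]
        s = [1.0 / yi for yi in y]
        # Step 2 (weighting): predictive likelihood of the outcome under each particle.
        # The tiny constant guards against all weights underflowing to zero.
        b = [b[l] * (normal_pdf(o, m[l], w[l] + v[l] + s[l]) + 1e-300) for l in range(n)]
        total = sum(b)
        b = [bl / total for bl in b]
        effective = 1.0 / sum(bl * bl for bl in b)
        if effective / n < 0.5:
            keep = systematic_resample(b, rng)
            z, y, v, s, m, w = ([arr[i] for i in keep] for arr in (z, y, v, s, m, w))
            b = [1.0 / n] * n
        # Step 3 (Kalman update): exact conditional inference for the reward rate.
        alpha = [(w[l] + v[l]) / (w[l] + v[l] + s[l]) for l in range(n)]
        m = [m[l] + alpha[l] * (o - m[l]) for l in range(n)]
        w = [(1.0 - alpha[l]) * (w[l] + v[l]) for l in range(n)]
        # Report weighted averages over particles.
        trace.append(tuple(sum(bl * x for bl, x in zip(b, values))
                           for values in (alpha, m, v, s)))
    return trace

trace = rbpf([1.0] * 50, seed=3)
print(trace[-1])
```

Each particle carries its own Kalman mean and variance, so only the two noise parameters are sampled; this is what makes the filter Rao-Blackwellised.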
Finally, we should note that our results are not dependent on the specific generative process that we have assumed here. In particular, it is possible to define a generative process that diffuses according to Gaussian noise. In such a generative model, random variables related to volatility and stochasticity diffuse according to independent Gaussian random walks:
\[p\left({z}_{t}\mid {z}_{t-1}\right)=N\left({z}_{t}\mid {z}_{t-1},{\sigma }_{v}^{2}\right), \qquad (9)\]
\[p\left({y}_{t}\mid {y}_{t-1}\right)=N\left({y}_{t}\mid {y}_{t-1},{\sigma }_{s}^{2}\right), \qquad (10)\]
where volatility and stochasticity are respectively defined as \({v}_{t}=\exp \left({z}_{t}\right)\) and \({s}_{t}=\exp \left({y}_{t}\right)\), and \({\sigma }_{v}\) and \({\sigma }_{s}\) are model parameters that play roles analogous to those of \({\lambda }_{v}\) and \({\lambda }_{s}\) above, respectively. The reward rate and outcomes are then generated based on the same process as in the previous generative model (Eqs. 5–7). Inference about reward rate also remains the same (Eqs. 1–4). Our simulations show that, as long as the particle filter is used for inference about volatility and stochasticity, such a process can successfully recover the true unknown volatility and stochasticity (Supplementary Fig. 3).
Simulation details
In simulations related to Figs. 1–3, time series were generated according to the Markov random walk with constant volatility and stochasticity. For these simulations, we assumed \({\lambda }_{v}=0.1\) and \({\lambda }_{s}=0.1\); \({v}_{0}=1\) (average over small and large true volatility) and \({s}_{0}=2\) (average over small and large true stochasticity). For lesioned models in Fig. 3, the corresponding lesion variable was assumed to be fixed at its initial value throughout the task.
For the conditioned suppression experiment presented in Fig. 4, the weak and strong shocks were 0.3 and 1, respectively, plus a small noise with variance of 10^{-2}. The noise variance for the partial reinforcement experiment was assumed to be 10^{-4}. 100 trials were used for training. We assumed 5 omission trials for the omission condition of the conditioned suppression experiment. Model parameters in Fig. 4 were \({\lambda }_{v}={\lambda }_{s}=0.2\), and \({v}_{0}={s}_{0}=0.1\). For the corresponding Supplementary Figs. 4–5, the response probability of the model was calculated based on a softmax with a decision noise parameter of 5.
The reward rate in Fig. 5a was 0.8 in the stable block and switched between 0.25 and 0.75 in the volatile block, with an outcome variance of 0.01. For simulations presented in Fig. 5a–d, model parameters were similar to those used for simulations in Fig. 4 and the stochasticity for the lesioned models was assumed to be 0.001 (note that outcomes were not binary in this simulation). Volatile condition in Fig. 5e, f was defined as trials with no contingency switch in their preceding 10 trials, similar to Piray et al.^{20}. For the simulation presented in Fig. 5h, we followed Huang et al.^{34}, who fitted a number of reinforcement learning models to choice data, in which they simplified the task to its core features that are directly related to reinforcement learning. We made a further simplification here by considering only two choices. The probabilities of reward in the tasks of Piray et al. and Huang et al. presented in Fig. 5 are plotted in Supplementary Fig. 6. Outcome variance was assumed to be 0.01. Model parameters were similar to those used for simulations in Fig. 4. The stochasticity for the lesioned models in these two simulations was assumed to be 0.05. For simulating choice, we used the softmax with a decision noise of 3.
In Fig. 6, the task was a probabilistic switching task with stable and volatile blocks similar to the task of Browning et al.^{13}. Outcome variance was assumed to be 0.01. The median correlation presented in the inset of Fig. 6b is the Spearman rank correlation across 1000 sets of simulations. For each set, 30 artificial subjects were generated, which differed only in their volatility and stochasticity update rate parameters. To obtain a relatively uniform distribution of model trait anxiety (i.e., the ratio of volatility to stochasticity update rate), every set was further divided into 3 subsets (each containing 10 artificial subjects), in which the mean of model trait anxiety was 0.5, 1, and 3, respectively. We further ensured that the model trait anxiety was greater than 0.26 and smaller than 4. These values were chosen to roughly reflect the distribution of trait anxiety in Browning et al.’s^{13} data (Fig. 6a). Furthermore, the volatility update rate was drawn randomly between 0 and 0.2 and the stochasticity update rate was calculated according to the model trait anxiety. A fixed and small initial volatility and stochasticity was used for all artificial subjects (\({v}_{0}={s}_{0}=0.001\)).
For simulating the experiment in Fig. 7, the reward time series was generated with a very small outcome variance, 10^{-6}. Here, the model was trained to predict both the tone (given the light) and the reward (given the light) on every trial. Model parameters were \({\lambda }_{v}={\lambda }_{s}=0.2\), and \({v}_{0}={s}_{0}=0.5\). Volatility for the lesioned model was assumed to be small and fixed (0.25 × 10^{-6}). Figure 7b shows the average learning rate on the last trial of phase 2 (i.e., the first trial of the test) for the first cue across all simulations. Figure 7c–f shows volatility and stochasticity signals for the first cue. For the simulation of the reversal task in Fig. 8, a similarly small outcome variance, 10^{-6}, was used for generating outcomes. Model parameters were the same as those used in Fig. 7. Volatility for the lesioned model was assumed to be small and fixed at 0.01. We used the softmax with a decision noise parameter as the choice model. We assumed that the decision noise parameter was 3 and 1 for the control and lesioned animals, respectively. These parameters were used to reproduce the general reduction of performance in lesioned animals, which is independent of the difference between the deterministic and stochastic tasks in the two groups that is explained by our learning model.
For Supplementary Fig. 2, we followed the design of the prediction task by Nassar et al.^{25}. The time series were generated according to a hidden reward rate plus a noise term, in which the variance of the noise was either one (small) or nine (large). The reward rate was subject to a random jump in the range of 0–10. Change points occurred after at least five trials plus a random draw according to an exponential distribution with a rate of 0.05 (i.e., mean 20). The initial volatility and stochasticity were both assumed to be five (the average over small and large true stochasticity). We further assumed that \({\lambda }_{v}=0.4\) and \({\lambda }_{s}=0.2\), which reflects the instructions given to subjects about possible jumps in the underlying reward rate. Simulation and model parameters in Supplementary Fig. 1 were the same as those in Fig. 5a. For all simulations, we assumed the initial reward rate prior to be a Gaussian with mean 0 and variance 100 (Figs. 2–3 and Supplementary Fig. 3) or 1 (all other simulations). Simulations were repeated sufficiently often to have negligible sampling error (i.e., an invisible standard error of the mean). Thus, simulations presented in Figs. 1, 2, 3 and Supplementary Fig. 3 were repeated 10,000 times; conditioning simulations presented in Figs. 4 and 7 were repeated 40,000 times; and all other simulations were repeated 1000 times. All simulations were conducted with 100 particles.
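The change-point time series described for Supplementary Fig. 2 can be sketched as follows. This is an illustrative reading of the text rather than the authors' generator. Two details are assumptions: a jump is implemented as a fresh uniform draw of the reward rate from [0, 10], and the exponential draw is rounded down to a whole number of trials. The function name `generate_changepoint_series` is hypothetical.

```python
import random


def generate_changepoint_series(n_trials, noise_variance, seed=0):
    """Generate outcomes as a piecewise-constant reward rate plus noise.

    noise_variance is 1 (small) or 9 (large) in the text.
    Returns (outcomes, means), two lists of length n_trials.
    """
    rng = random.Random(seed)
    noise_sd = noise_variance ** 0.5

    def draw_run_length():
        # At least five trials plus an exponential draw with rate 0.05
        # (mean 20); truncation to an integer is an assumption.
        return 5 + int(rng.expovariate(0.05))

    mean = rng.uniform(0.0, 10.0)  # assumed: reward rate uniform in 0-10
    run_left = draw_run_length()
    outcomes, means = [], []
    for _ in range(n_trials):
        if run_left == 0:
            # Change point: the hidden reward rate jumps to a new value.
            mean = rng.uniform(0.0, 10.0)
            run_left = draw_run_length()
        means.append(mean)
        outcomes.append(mean + rng.gauss(0.0, noise_sd))
        run_left -= 1
    return outcomes, means
```

Calling `generate_changepoint_series(500, 9.0)` gives a large-noise series of 500 trials. Fixing `seed` makes a run reproducible.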
Reporting Summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability
Simulation data are publicly available at https://doi.org/10.5281/zenodo.5526668^{95}. Data by Piray et al.^{20} presented in Fig. 5e are publicly available at https://github.com/payampiray/piray_etal_2019_JNeurosci. Data by Browning et al.^{13} plotted in Fig. 6a are publicly available as a Source Data file to the corresponding paper. Source data are provided with this paper.
Code availability
All simulations were conducted using custom code written in MATLAB (2018a). The code is available at https://doi.org/10.5281/zenodo.5526668^{95}.
References
Dayan, P. & Long, T. Statistical Models of Conditioning. In Advances in Neural Information Processing Systems 10 (eds Jordan, M., Kearns, M. & Solla, S.) 117–123 (MIT Press, 1998).
Dayan, P., Kakade, S. & Montague, P. R. Learning and selective attention. Nat. Neurosci. 3, 1218–1223 (2000).
Courville, A. C., Daw, N. D. & Touretzky, D. S. Bayesian theories of conditioning in a changing world. Trends Cogn. Sci. (Regul. Ed.) 10, 294–300 (2006).
Daunizeau, J. et al. Observing the observer (I): meta-Bayesian models of learning and decision-making. PLoS ONE 5, e15554 (2010).
Gershman, S. J., Blei, D. M. & Niv, Y. Context, learning, and extinction. Psychol. Rev. 117, 197–209 (2010).
Behrens, T. E. J., Woolrich, M. W., Walton, M. E. & Rushworth, M. F. S. Learning the value of information in an uncertain world. Nat. Neurosci. 10, 1214–1221 (2007).
Nassar, M. R. et al. Rational regulation of learning dynamics by pupil-linked arousal systems. Nat. Neurosci. 15, 1040–1046 (2012).
Iglesias, S. et al. Hierarchical prediction errors in midbrain and basal forebrain during sensory learning. Neuron 80, 519–530 (2013).
McGuire, J. T., Nassar, M. R., Gold, J. I. & Kable, J. W. Functionally dissociable influences on learning rate in a dynamic environment. Neuron 84, 870–881 (2014).
Soltani, A. & Izquierdo, A. Adaptive learning under expected and unexpected uncertainty. Nat. Rev. Neurosci. 20, 635–644 (2019).
Mathys, C., Daunizeau, J., Friston, K. J. & Stephan, K. E. A Bayesian foundation for individual learning under uncertainty. Front. Hum. Neurosci. 5, 39 (2011).
Piray, P. & Daw, N. D. A simple model for learning in volatile environments. PLoS Comput. Biol. 16, e1007963 (2020).
Browning, M., Behrens, T. E., Jocham, G., O’Reilly, J. X. & Bishop, S. J. Anxious individuals have difficulty learning the causal statistics of aversive environments. Nat. Neurosci. 18, 590–596 (2015).
Brazil, I. A., Mathys, C. D., Popma, A., Hoppenbrouwers, S. S. & Cohn, M. D. Representational uncertainty in the brain during threat conditioning and the link with psychopathic traits. Biol. Psychiatry Cogn. Neurosci. Neuroimaging 2, 689–695 (2017).
Farashahi, S. et al. Metaplasticity as a neural substrate for adaptive learning and choice under uncertainty. Neuron 94, 401–414.e6 (2017).
Lawson, R. P., Mathys, C. & Rees, G. Adults with autism overestimate the volatility of the sensory environment. Nat. Neurosci. 20, 1293–1299 (2017).
Powers, A. R., Mathys, C. & Corlett, P. R. Pavlovian conditioning-induced hallucinations result from overweighting of perceptual priors. Science 357, 596–600 (2017).
Katthagen, T. et al. Modeling subjective relevance in schizophrenia and its relation to aberrant salience. PLoS Comput. Biol. 14, e1006319 (2018).
Paliwal, S. et al. Subjective estimates of uncertainty during gambling and impulsivity after subthalamic deep brain stimulation for Parkinson’s disease. Sci. Rep. 9, 14795 (2019).
Piray, P., Ly, V., Roelofs, K., Cools, R. & Toni, I. Emotionally aversive cues suppress neural systems underlying optimal learning in socially anxious individuals. J. Neurosci. 39, 1445–1456 (2019).
Cole, D. M. et al. Atypical processing of uncertainty in individuals at risk for psychosis. NeuroImage: Clin. 26, 102239 (2020).
Deserno, L. et al. Volatility estimates increase choice switching and relate to prefrontal activity in Schizophrenia. Biol. Psychiatry Cogn. Neurosci. Neuroimaging 5, 173–183 (2020).
Diaconescu, A. O., Wellstein, K. V., Kasper, L., Mathys, C. & Stephan, K. E. Hierarchical Bayesian models of social inference for probing persecutory delusional ideation. J. Abnorm. Psychol. 129, 556–569 (2020).
Lee, S., Gold, J. I. & Kable, J. W. The human as delta-rule learner. Decision 7, 55–66 (2020).
Nassar, M. R., Wilson, R. C., Heasly, B. & Gold, J. I. An approximately Bayesian delta-rule model explains the dynamics of belief updating in a changing environment. J. Neurosci. 30, 12366–12378 (2010).
Pulcu, E. & Browning, M. The Misestimation of Uncertainty in Affective Disorders. Trends Cogn. Sci. 23, 865–875 (2019).
Diaconescu, A. O. et al. Inferring on the Intentions of Others by Hierarchical Bayesian Learning. PLoS Comput. Biol. 10, e1003810 (2014).
Reed, E. J. et al. Paranoia as a deficit in nonsocial belief updating. eLife 9, (2020).
Dugas, M. J., Gagnon, F., Ladouceur, R. & Freeston, M. H. Generalized anxiety disorder: a preliminary test of a conceptual model. Behav. Res Ther. 36, 215–226 (1998).
Ladouceur, R., Gosselin, P. & Dugas, M. J. Experimental manipulation of intolerance of uncertainty: a study of a theoretical model of worry. Behav. Res Ther. 38, 933–941 (2000).
Aylward, J. et al. Altered learning under uncertainty in unmedicated mood and anxiety disorders. Nat. Hum. Behav. 3, 1116–1123 (2019).
Gagne, C., Dayan, P. & Bishop, S. J. When planning to survive goes wrong: predicting the future and replaying the past in anxiety and PTSD. Curr. Opin. Behav. Sci. 24, 89–95 (2018).
Harlé, K. M., Guo, D., Zhang, S., Paulus, M. P. & Yu, A. J. Anhedonia and anxiety underlying depressive symptomatology have distinct effects on reward-based decision-making. PLoS ONE 12, e0186473 (2017).
Huang, H., Thompson, W. & Paulus, M. P. Computational dysfunctions in anxiety: failure to differentiate signal from noise. Biol. Psychiatry 82, 440–446 (2017).
Huys, Q. J. M., Daw, N. D. & Dayan, P. Depression: a decision-theoretic analysis. Annu. Rev. Neurosci. 38, 1–23 (2015).
Luhmann, C. C., Ishida, K. & Hajcak, G. Intolerance of uncertainty and decisions about delayed, probabilistic rewards. Behav. Ther. 42, 378–386 (2011).
Paulus, M. P. & Yu, A. J. Emotion and decision-making: affect-driven belief systems in anxiety and depression. Trends Cogn. Sci. 16, 476–483 (2012).
Pulcu, E. & Browning, M. Affective bias as a rational response to the statistics of rewards and punishments. eLife 6, e27879 (2017).
Mackintosh, N. J. A theory of attention: Variations in the associability of stimuli with reinforcement. Psychol. Rev. 82, 276–298 (1975).
Pearce, J. M. & Hall, G. A model for Pavlovian learning: variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychol. Rev. 87, 532–552 (1980).
Kalman, R. E. A New Approach to Linear Filtering and Prediction Problems. Trans. ASME–J. Basic Eng. 82, 35–45 (1960).
Kakade, S. & Dayan, P. Acquisition and extinction in autoshaping. Psychol. Rev. 109, 533–544 (2002).
Moens, V. & Zénon, A. Learning and forgetting using reinforced Bayesian change detection. PLoS Comput. Biol. 15, e1006713 (2019).
Silvetti, M., Vassena, E., Abrahamse, E. & Verguts, T. Dorsal anterior cingulate-brainstem ensemble as a reinforcement meta-learner. PLoS Comput. Biol. 14, e1006370 (2018).
Wilson, R. C., Nassar, M. R. & Gold, J. I. Bayesian online learning of the hazard rate in change-point problems. Neural Comput. 22, 2452–2476 (2010).
Griffiths, T. L., Navarro, D. J. & Sanborn, A. N. A More Rational Model of Categorization. Proceedings of the Annual Meeting of the Cognitive Science Society 28, (2006).
Daw, N. D. & Courville, A. C. The rat as particle filter. in Advances in Neural Information Processing Systems 20 (eds. Platt, J. C., Koller, D., Singer, Y. & Roweis, S. T.) 369–376 (Curran Associates, Inc., 2008).
Brown, S. D. & Steyvers, M. Detecting and predicting changes. Cogn. Psychol. 58, 49–67 (2009).
Doucet, A., Freitas, N. de, Murphy, K. P. & Russell, S. J. Rao-Blackwellised Particle Filtering for Dynamic Bayesian Networks. in Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence 176–183 (Morgan Kaufmann Publishers Inc., 2000).
Doucet, A. & Johansen, A. M. A tutorial on particle filtering and smoothing: fifteen years later (Oxford University Press, 2011).
Behrens, T. E. J., Hunt, L. T., Woolrich, M. W. & Rushworth, M. F. S. Associative learning of social value. Nature 456, 245–249 (2008).
Nassar, M. R. et al. Age differences in learning emerge from an insufficient representation of uncertainty in older adults. Nat. Commun. 7, 11609 (2016).
Hall, G. & Pearce, J. M. Restoring the associability of a pre-exposed CS by a surprising event. Q. J. Exp. Psychol. Sect. B 34, 127–140 (1982).
Gibbon, J., Farrell, L., Locurto, C. M., Duncan, H. J. & Terrace, H. S. Partial reinforcement in autoshaping with pigeons. Anim. Learn. Behav. 8, 45–59 (1980).
Rescorla, R. A. Within-subject partial reinforcement extinction effect in autoshaping. Q. J. Exp. Psychol. B: Comp. Physiological Psychol. 52B, 75–87 (1999).
Haselgrove, M., Aydin, A. & Pearce, J. M. A partial reinforcement extinction effect despite equal rates of reinforcement during Pavlovian conditioning. J. Exp. Psychol. Anim. Behav. Process 30, 240–250 (2004).
Li, J., Schiller, D., Schoenbaum, G., Phelps, E. A. & Daw, N. D. Differential roles of human striatum and amygdala in associative learning. Nat. Neurosci. 14, 1250–1252 (2011).
Gallistel, C. R. & Gibbon, J. Time, rate, and conditioning. Psychol. Rev. 107, 289–344 (2000).
Phelps, E. A., Lempert, K. M. & Sokol-Hessner, P. Emotion and decision making: multiple modulatory neural circuits. Annu. Rev. Neurosci. 37, 263–287 (2014).
Averbeck, B. B. & Costa, V. D. Motivational neural circuits underlying reinforcement learning. Nat. Neurosci. 20, 505–512 (2017).
Holland, P. C. & Gallagher, M. Amygdala circuitry in attentional and representational processes. Trends Cogn. Sci. (Regul. Ed.) 3, 65–73 (1999).
Roesch, M. R., Esber, G. R., Li, J., Daw, N. D. & Schoenbaum, G. Surprise! Neural correlates of Pearce-Hall and Rescorla-Wagner coexist within the brain. Eur. J. Neurosci. 35, 1190–1200 (2012).
Costa, V. D., Dal Monte, O., Lucas, D. R., Murray, E. A. & Averbeck, B. B. Amygdala and Ventral Striatum Make Distinct Contributions to Reinforcement Learning. Neuron 92, 505–517 (2016).
Homan, P. et al. Neural computations of threat in the aftermath of combat trauma. Nat. Neurosci. 22, 470–476 (2019).
Holland, P. C. & Gallagher, M. Amygdala central nucleus lesions disrupt increments, but not decrements, in conditioned stimulus processing. Behav. Neurosci. 107, 246–253 (1993).
Holland, P. C. & Schiffino, F. L. Mini-review: prediction errors, attention and associative learning. Neurobiol. Learn. Mem. 131, 207–215 (2016).
Hampton, A. N., Adolphs, R., Tyszka, M. J. & O’Doherty, J. P. Contributions of the amygdala to reward expectancy and choice signals in human prefrontal cortex. Neuron 55, 545–555 (2007).
de Berker, A. O. et al. Computations of uncertainty mediate acute stress responses in humans. Nat. Commun. 7, 10996 (2016).
Khorsand, P. & Soltani, A. Optimal structure of metaplasticity for adaptive learning. PLoS Comput. Biol. 13, e1005630 (2017).
Diederen, K. M. J. & Schultz, W. Scaling prediction errors to reward variability benefits error-driven learning in humans. J. Neurophysiol. 114, 1628–1640 (2015).
Wilson, R. C., Nassar, M. R. & Gold, J. I. A Mixture of Delta-Rules Approximation to Bayesian Inference in Change-Point Problems. PLoS Comput. Biol. 9, e1003150 (2013).
Dayan, P. & Yu, A. Uncertainty and Learning. IETE J. Res. 49, 171–181 (2003).
Yu, A. J. & Dayan, P. Uncertainty, neuromodulation, and attention. Neuron 46, 681–692 (2005).
Hartley, C. A. & Phelps, E. A. Anxiety and decision-making. Biol. Psychiatry 72, 113–118 (2012).
Beck, A. T. Depression: Causes and Treatment. (University of Pennsylvania Press, 1970).
Duits, P. et al. Updated meta-analysis of classical fear conditioning in the anxiety disorders. Depress. Anxiety 32, 239–253 (2015).
Lissek, S. et al. Classical fear conditioning in the anxiety disorders: a meta-analysis. Behav. Res. Ther. 43, 1391–1424 (2005).
Bouton, M. E. Context and behavioral processes in extinction. Learn Mem. 11, 485–494 (2004).
Redish, A. D., Jensen, S., Johnson, A. & Kurth-Nelson, Z. Reconciling reinforcement learning models with behavioral extinction and renewal: implications for addiction, relapse, and problem gambling. Psychol. Rev. 114, 784–805 (2007).
Stephan, K. E., Baldeweg, T. & Friston, K. J. Synaptic plasticity and dysconnection in schizophrenia. Biol. Psychiatry 59, 929–939 (2006).
Baker, S. C., Konova, A. B., Daw, N. D. & Horga, G. A distinct inferential mechanism for delusions in schizophrenia. Brain 142, 1797–1812 (2019).
Horga, G. & Abi-Dargham, A. An integrative framework for perceptual disturbances in psychosis. Nat. Rev. Neurosci. 20, 763–778 (2019).
Wengler, K., Goldberg, A., Chahine, G. & Horga, G. Hallucinations and Delusions Relate to Distinct Hierarchical Alterations in Intrinsic Neural Timescales. Biol. Psychiatry 87, S179–S180 (2020).
Le Pelley, M. E. The role of associative history in models of associative learning: a selective review and a hybrid model. Q J. Exp. Psychol. B 57, 193–243 (2004).
Haselgrove, M., Esber, G. R., Pearce, J. M. & Jones, P. M. Two kinds of attention in Pavlovian conditioning: evidence for a hybrid model of learning. J. Exp. Psychol. Anim. Behav. Process 36, 456–470 (2010).
Pearce, J. M. & Mackintosh, N. J. Two theories of attention: a review and a possible integration. in Attention and Associative Learning: From Brain to Behaviour (eds. Mitchell, C. & Le Pelley, M. E.) 11–39 (Oxford University Press, 2010).
Le Pelley, M. E., Mitchell, C. J., Beesley, T., George, D. N. & Wills, A. J. Attention and associative learning in humans: An integrative review. Psychol. Bull. 142, 1111–1140 (2016).
Marr, D. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. (W. H. Freeman and Company, 1982).
Aponte, E. et al. TAPAS: Translational Algorithms for Psychiatry-Advancing Science. Front. Psychiatry. https://doi.org/10.3389/fpsyt.2021.680811 (2020).
Gittins, J. C. Bandit Processes and Dynamic Allocation Indices. J. R. Stat. Soc. Ser. B (Methodol.) 41, 148–177 (1979).
Daw, N. D., O’Doherty, J. P., Dayan, P., Seymour, B. & Dolan, R. J. Cortical substrates for exploratory decisions in humans. Nature 441, 876–879 (2006).
Wilson, R. C., Takahashi, Y. K., Schoenbaum, G. & Niv, Y. Orbitofrontal cortex as a cognitive map of task space. Neuron 81, 267–279 (2014).
West, M. On Scale Mixtures of Normal Distributions. Biometrika 74, 646–648 (1987).
Gamerman, D., dos Santos, T. R. & Franco, G. C. A Non-Gaussian Family of State-Space Models with Exact Marginal Likelihood. J. Time Ser. Anal. 34, 625–645 (2013).
Piray, P. & Daw, N. D. A model for learning based on the joint estimation of stochasticity and volatility. Zenodo, https://doi.org/10.5281/zenodo.5526668 (2021).
Acknowledgements
We thank Sam Zorowitz, Peter Dayan, Yoel Sanchez Araujo, and Guillermo Horga for helpful discussions. This work was supported by grants IIS-1822571 from the National Science Foundation, part of the CRCNS program, and 61454 from the John Templeton Foundation.
Author information
Contributions
P.P. and N.D.D. designed the study and wrote the manuscript. P.P. performed analyses.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Piray, P. & Daw, N. D. A model for learning based on the joint estimation of stochasticity and volatility. Nat. Commun. 12, 6587 (2021). https://doi.org/10.1038/s41467-021-26731-9