A model for learning based on the joint estimation of stochasticity and volatility

Previous research has stressed the importance of uncertainty for controlling the speed of learning, and how such control depends on the learner inferring the noise properties of the environment, especially volatility: the speed of change. However, learning rates are jointly determined by the comparison between volatility and a second factor, moment-to-moment stochasticity. Yet much previous research has focused on simplified cases corresponding to estimation of either factor alone. Here, we introduce a learning model, in which both factors are learned simultaneously from experience, and use the model to simulate human and animal data across many seemingly disparate neuroscientific and behavioral phenomena. By considering the full problem of joint estimation, we highlight a set of previously unappreciated issues, arising from the mutual interdependence of inference about volatility and stochasticity. This interdependence complicates and enriches the interpretation of previous results, such as pathological learning in individuals with anxiety and following amygdala damage.


Introduction
Among the successes of computational neuroscience is a level-spanning account of learning and conditioning, which has grounded biological plasticity mechanisms (specifically, error-driven updating) in terms of a normative analysis of the problem faced by the organism (Courville et al., 2006; Daunizeau et al., 2010; Dayan and Long, 1998; Dayan et al., 2000; Gershman et al., 2010). These models recast learning as statistical inference: using experience to estimate the amount of some outcome (e.g., food) expected on average following some cue or action. This is an important subproblem of reinforcement learning, which uses such value estimates to guide choice. The statistical framing has motivated an influential program of investigating the brain's mechanisms for tracking uncertainty about its beliefs, and how these impact learning (Behrens et al., 2007; Iglesias et al., 2013; McGuire et al., 2014; Nassar et al., 2012; Soltani and Izquierdo, 2019). This program has shed particular light on the principles governing the rate of learning at each step: that is, the degree to which one should update one's beliefs in light of each new outcome. The statistical analysis implies that the learning rate, as in all Bayesian cue combination problems, should depend on how uncertain those (prior) beliefs were, balanced against the statistical noisiness of the new evidence (the likelihood). Much previous work has focused on the role of the prior uncertainty in this learning rate control. In this article, we focus attention on the second, relatively neglected half of this comparison, to ask how organisms learn to assess the noisiness of outcomes, which we call unpredictability, during learning.
The connection between uncertainty and learning rate has inspired an influential series of hierarchical Bayesian inference models, elaborating a baseline model known as the Kalman filter (Behrens et al., 2007; Mathys et al., 2011; Piray and Daw, 2020). A key feature of these models is that learning occurs over multiple trials, so the updated posterior at one step becomes the prior at the next, with the important additional twist that uncertainty increases at each step due to the possibility that the true value has changed. The hierarchical models extend the Kalman filter to also estimate the rate of such change, called volatility. All else equal, when volatility is higher, the organism is more uncertain about the cue's value (because the true value will on average have fluctuated more following each observation), and so the learning rate (the reliance on each new outcome) should be higher. A series of experiments have reported behavioral and neural signatures of these volatility effects on learning rate, and also their disruption in relation to psychiatric symptoms (Behrens et al., 2007; Brazil et al., 2017; Browning et al., 2015; Cole et al., 2020; Deserno et al., 2020; Diaconescu et al., 2020; Farashahi et al., 2017; Iglesias et al., 2013; Katthagen et al., 2018; Lawson et al., 2017; Paliwal et al., 2019; Piray et al., 2019; Powers et al., 2017; Soltani and Izquierdo, 2019). However, volatility is only one of two noise parameters in the underlying Kalman filter; the second is unpredictability, which controls how noisy each individual outcome is (the width of the likelihood). This type of noise also affects the learning rate, but in the opposite direction: all else equal, when individual outcomes are more unpredictable, they are less informative about the cue's true value, and the learning rate, in turn, should be smaller.
Although unpredictability therefore plays an equally important and symmetric role to volatility, the line of research that extends the Kalman filter hierarchically to investigate how organisms estimate volatility has simply assumed that unpredictability is fixed and known. Another related line of studies has shown that humans are sensitive to unpredictability when it is manipulated, but has not specifically modeled how people estimate it from experience (McGuire et al., 2014; Nassar et al., 2010, 2012). Accordingly, the potential effects of unpredictability on learning and choice have been largely overlooked, and in particular the question of how the brain estimates unpredictability (and thus also volatility and uncertainty, which in the general case depend on it) remains unaddressed. In this study, we fill this gap by developing a hierarchical model that simultaneously infers both quantities during learning. We then study its behavior in reinforcement learning tasks and its relationship to previous studies.
The model sheds new light on experimental phenomena of learning rates in conditioning, and on two classic descriptive theories of conditioning in psychology that have been interpreted as predecessors of the statistical accounts. The theories of Mackintosh (Mackintosh, 1975) vs. Pearce and Hall (Pearce and Hall, 1980) claim, in apparent contradiction, that animals pay, respectively, either more or less attention to cues that are more reliably predictive of outcomes. Although these two theories are described loosely in terms of "attention," one of the key operational counterparts of such attention (and the one relevant to the current theory) is faster learning about those cues. Indeed, different experimental procedures seem to produce either faster or slower learning when outcomes are noisier, supporting either theory. This has led to long-lasting discussions and further descriptive models of attention hybridizing the two theories (Le Pelley, 2004; Le Pelley et al., 2016; Pearce and Mackintosh, 2010). Previous work with the statistical models has also proposed to resolve this conundrum by identifying the theories with different aspects of the broad term "attention," in particular by arguing that Mackintosh's logic (less attention to poor predictors) applies not to learning rates but instead when making predictions on the basis of multiple, competing cues (Dayan et al., 2000). Here, motivated by the work on volatility, we focus strictly on learning rates for a single cue or action. In this case, we argue, seemingly contradictory Mackintosh- vs. Pearce-Hall-type effects (slower or faster learning under noise) can be explained by learning rate modulation due to inferred unpredictability vs. volatility, respectively. On this view, different effects will dominate in different experiments depending on which parameter the pattern of the noise suggests. Indeed, a core question raised by our analysis is how the brain manages to distinguish volatility from unpredictability, and when it might confuse them. Although the learner's estimates of them play opposite roles in learning (and thus, telling them apart is crucial to appropriately control learning), from the learner's perspective they both similarly manifest in noisier, less reliably predictable outcomes. Thus the model highlights that jointly estimating both volatility and unpredictability poses a difficult computational challenge, because inferences about them are interdependent. We argue that dissociating them is statistically possible, because they have distinguishable effects on the more detailed temporal pattern of the noise, specifically the residual autocorrelation in the outcomes (covariation on consecutive trials). Whereas unpredictability decreases autocorrelation, volatility increases it. Because this difference plays out over multiple trials, detecting it poses a challenge for many approximate inference techniques, which rely on simplifying the relationship between different observables. We therefore provide a novel process-level model, based on a well-known sequential sampling procedure, to learn volatility and unpredictability simultaneously from data.
The observation that observed noise can, to a first approximation, be explained by either volatility or unpredictability has novel implications. First, previous work apparently showing variation in volatility processing in different groups, such as patients (using a model and tasks that do not vary unpredictability), might instead reflect misidentified abnormalities in processing unpredictability. We suggest that future research should test both dimensions of learning explicitly. Furthermore, from the perspective of a learner inferring volatility and unpredictability, these factors should compete or trade off against one another to best explain observed noise. This means that any dysfunction or damage that impairs detection of unpredictability should lead to a compensatory increase in inferred volatility, and vice versa: a classic pattern known in Bayesian inference as a failure of "explaining away." We argue that this may be the case both in some types of anxiety disorders and following damage to the amygdala. Intolerance of uncertainty is thought to be a critical component of anxiety and a crucial risk factor for developing anxiety disorders (Dugas et al., 1998; Ladouceur et al., 2000). Although there has been recent interest in operationalizing this idea by connecting it to statistical learning models and tasks (Aylward et al., 2019; Gagne et al., 2018; Harlé et al., 2017; Huang et al., 2017; Huys et al., 2015; Luhmann et al., 2011; Paulus and Yu, 2012; Pulcu and Browning, 2017, 2019), we and others have focused on apparent abnormalities in processing volatility (Browning et al., 2015; Piray et al., 2019). The current model suggests a different interpretation, in which anxiety primarily disrupts inference about unpredictability, but with the additional result that the learner misinterprets noise due to unpredictability as a signal of change, i.e., volatility. We argue that the complementary pattern of explaining away, in which a failure to detect volatility leads to change misattributed to unpredictability, can be appreciated in studies of the amygdala's role in modulating learning rates. In particular, our model suggests that a specific involvement of the amygdala in volatility (and the explaining-away pattern) explains effects of amygdala damage better than an involvement in learning rates more generally. These sorts of reciprocal interactions also give rise to a richer and subtler set of possible patterns of dysfunction that may help to understand a wide range of other neurological and psychiatric disorders, such as schizophrenia, in which there has been a tendency to study altered processing of uncertainty narrowly in the context of volatility.

Model
We begin with the Kalman filter, which describes statistically optimal learning from data produced according to a specific form of noisy generation process. The model assumes that the agent must draw inferences (e.g., about true reward rates) from observations (individual reward amounts) that are corrupted by two distinct sources of noise: process noise or volatility, and outcome noise or unpredictability (Figure 1ab). Volatility captures the speed with which the true value being estimated changes from trial to trial (modeled as Gaussian diffusion); unpredictability describes additional measurement noise in the observation of each outcome around its true value (modeled as Gaussian noise on each trial).
For this data generating process, if the true values (i.e., variances) of volatility and unpredictability, v and s, are known, then optimal inference about the underlying reward rate is tractable using a specific application of Bayes rule, here called the Kalman filter (Kalman, 1960). The Kalman filter represents its beliefs about the reward rate at each step as a Gaussian distribution with a mean, m_t, and variance (i.e., uncertainty about the true value), w_t. The update, on every trial, is driven by a prediction error signal, δ_t, and learning rate, α_t. This leads to simple update rules following observation of outcome o_t:

δ_t = o_t − m_t	(Equation 1)
α_t = (w_t + v) / (w_t + v + s)	(Equation 2)
m_{t+1} = m_t + α_t δ_t	(Equation 3)
w_{t+1} = (1 − α_t)(w_t + v)	(Equation 4)

This derivation thus provides a rationale for the error-driven update prominent in neuroscience and psychology (Kakade and Dayan, 2002), and adds to these a principled account of the learning rate, α_t, which on this view should depend (Eq. 2) on the agent's uncertainty and the noise characteristics of the environment. In particular, Eq. 2 shows that the learning rate increases with volatility and decreases with unpredictability. This is because higher volatility increases the chance that the true value will have changed since last observed (increasing the need to rely on the new observation), but higher unpredictability decreases the informativeness of the new observation relative to previous beliefs.
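As a concrete illustration, the four update rules can be run as a short simulation. This is a minimal sketch of our own (the function name and parameter values are illustrative, not from the paper), with v denoting the volatility variance and s the unpredictability variance:

```python
def kalman_step(m, w, o, v, s):
    """One Kalman-filter update with known volatility v and unpredictability s.

    m: current mean belief about the reward rate
    w: current variance (uncertainty) of that belief
    o: outcome observed on this trial
    """
    delta = o - m                     # Equation 1: prediction error
    alpha = (w + v) / (w + v + s)     # Equation 2: learning rate
    m_next = m + alpha * delta        # Equation 3: mean update
    w_next = (1 - alpha) * (w + v)    # Equation 4: uncertainty update
    return m_next, w_next, alpha

# The learning rate rises with volatility and falls with unpredictability:
_, _, lr_high_vol = kalman_step(m=0.0, w=1.0, o=1.0, v=2.0, s=1.0)
_, _, lr_low_vol  = kalman_step(m=0.0, w=1.0, o=1.0, v=0.1, s=1.0)
_, _, lr_high_unp = kalman_step(m=0.0, w=1.0, o=1.0, v=0.1, s=4.0)
assert lr_high_vol > lr_low_vol > lr_high_unp
```

In the full hierarchical problem considered next, v and s are not known and must themselves be inferred.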
This observation launched a line of research focused on elucidating and testing the prediction that organisms adopt higher learning rates when volatility is higher (Behrens et al., 2007). But the very premise of these experiments violates the simplifying assumption of the Kalman filter: that volatility is fixed and known to the agent. To handle this situation, new models were developed (Mathys et al., 2011; Piray and Daw, 2020) that generalize the Kalman filter to incorporate learning the volatility v_t as well, arising from Bayesian inference in a hierarchical generative model in which the true v_t is also changing. In this case, exact inference is no longer tractable, but approximate inference is possible and typically incorporates Eqs. 1-4 as a subprocess.
This line of work on volatility estimation inherited from the Kalman filter the view of unpredictability as fixed and known. The premise of the current article is that all the same considerations apply to unpredictability as well: it must be learned from experience, may be changing, and its value impacts the learning rate. Indeed, learning both parameters is critical for efficient learning, because they have opposite effects on the learning rate: whereas volatility increases the learning rate, unpredictability reduces it (Figure 1c).
Learning these two parameters simultaneously is difficult because, from the perspective of the agent, larger values of either volatility or unpredictability result in more surprising observations: i.e., larger outcome variance (Figure 1e). However, there is a subtle and critical difference between the effects of these parameters on generated outcomes: whereas larger volatility increases the autocorrelation between outcomes (i.e., covariation between outcomes on consecutive trials), unpredictability reduces the autocorrelation (Figure 1f). This is the key point that makes it possible to dissociate and infer these two terms while only observing outcomes.

Figure 1 (caption, panels d-f): The observed outcome is given by the true reward rate, x_t, plus some noise whose variance is given by the true unpredictability, s_t. The reward rate itself depends on its value on the previous trial plus some noise whose variance is given by the true volatility, v_t. Both volatility and unpredictability are dynamic and probabilistic Markovian variables generated by their value on the previous trial multiplied by some independent noise. The multiplicative noise ensures that these variables are positive on every trial. See Methods for the details of the model. e-f) It is possible to infer both volatility and unpredictability based on observed outcomes, because these parameters have dissociable statistical signatures: although both of them increase variance (e), they have opposite effects on autocorrelation (f), with volatility increasing autocorrelation and unpredictability tending to reduce it. Here, 1-step autocorrelation (i.e., correlation between trial t and trial t−1) was computed for 100 time-series generated with parameters defined in b and c. Error bars are standard error of the mean. Small and large parameters for volatility were 0.1 and 0.4, and for unpredictability were 1 and 4, respectively.
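The opposite autocorrelation signatures can be checked with a small simulation (a sketch of our own; the small/large parameter values match those quoted for the figure, but everything else, including holding the hyperparameters fixed within a run, is a simplifying assumption):

```python
import numpy as np

def mean_autocorr(v, s, n_trials=200, n_sims=100, seed=0):
    """Average 1-step autocorrelation of outcomes o_t = x_t + noise,
    where x_t is a Gaussian random walk with per-step variance v
    and the outcome noise has variance s (both held fixed here)."""
    rng = np.random.default_rng(seed)
    acs = []
    for _ in range(n_sims):
        x = np.cumsum(rng.normal(0.0, np.sqrt(v), n_trials))  # true reward rate
        o = x + rng.normal(0.0, np.sqrt(s), n_trials)         # observed outcomes
        acs.append(np.corrcoef(o[1:], o[:-1])[0, 1])
    return float(np.mean(acs))

base     = mean_autocorr(v=0.1, s=1.0)
high_vol = mean_autocorr(v=0.4, s=1.0)   # larger volatility
high_unp = mean_autocorr(v=0.1, s=4.0)   # larger unpredictability
assert high_vol > base > high_unp        # opposite effects on autocorrelation
```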
We developed a probabilistic model for learning under these circumstances. The data generation process arises from a further hierarchical generalization of these models (especially the generative model used in our recent work (Piray and Daw, 2020)), in which the true value of unpredictability s_t is unknown and changing, as are the true reward rate and volatility (Figure 1d). The goal of the learner is to estimate the true reward rate from observations, which necessitates inferring volatility and unpredictability as well.
As with models of volatility, exact inference for this generative process is intractable. Furthermore, this problem is also challenging to handle with variational inference, the family of approximate inference techniques used previously (see Discussion). Thus, we have instead used a different standard approximation approach that has also been popular in psychology and neuroscience, Monte Carlo sampling (Brown and Steyvers, 2009; Courville et al., 2006; Daw and Courville, 2008; Griffiths et al., 2006).
In particular, we use particle filtering to track v_t and s_t based on data (Doucet and Johansen, 2011; Doucet et al., 2000). Our method exploits the fact that given a sample of volatility and unpredictability, inference for the reward rate is tractable and is given by Eqs. 1-4, in which v and s are replaced by their corresponding samples (see Methods).
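The following is a simplified Rao-Blackwellized particle-filter sketch in this spirit: each particle carries samples of volatility and unpredictability, paired with an exact Kalman update for the reward rate conditional on those samples. The priors, the multiplicative jitter on the samples, and all numerical choices here are illustrative assumptions of ours, not the authors' exact algorithm (see their Methods):

```python
import numpy as np

def particle_filter(outcomes, n_particles=2000, jitter=0.05, seed=0):
    """Jointly track volatility (v) and unpredictability (s) with particles;
    conditional on each particle's samples, the reward rate is updated
    exactly by the Kalman equations. Sketch only."""
    rng = np.random.default_rng(seed)
    v = np.exp(rng.normal(0.0, 1.0, n_particles))  # lognormal prior on volatility
    s = np.exp(rng.normal(0.0, 1.0, n_particles))  # lognormal prior on unpredictability
    m = np.zeros(n_particles)                      # per-particle Kalman mean
    w = np.ones(n_particles)                       # per-particle Kalman variance
    v_hat, s_hat = [], []
    for o in outcomes:
        # propagate hyperparameter samples with small multiplicative noise
        v = v * np.exp(rng.normal(0.0, jitter, n_particles))
        s = s * np.exp(rng.normal(0.0, jitter, n_particles))
        # weight each particle by its predictive likelihood N(o; m, w + v + s)
        pred_var = w + v + s
        logw = -0.5 * (np.log(2 * np.pi * pred_var) + (o - m) ** 2 / pred_var)
        weights = np.exp(logw - logw.max())
        weights /= weights.sum()
        v_hat.append(weights @ v)
        s_hat.append(weights @ s)
        # exact (Rao-Blackwellized) Kalman update per particle
        alpha = (w + v) / pred_var
        m = m + alpha * (o - m)
        w = (1 - alpha) * (w + v)
        # resample to avoid weight degeneracy
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        v, s, m, w = v[idx], s[idx], m[idx], w[idx]
    return np.array(v_hat), np.array(s_hat)

# A volatile environment and a merely noisy one are told apart:
rng = np.random.default_rng(1)
x = np.cumsum(rng.normal(0.0, np.sqrt(2.0), 300))
o_volatile = x + rng.normal(0.0, np.sqrt(0.1), 300)   # drifting value, low noise
o_noisy = 5.0 + rng.normal(0.0, np.sqrt(2.0), 300)    # stable value, high noise
v_a, s_a = particle_filter(o_volatile)
v_b, s_b = particle_filter(o_noisy)
assert v_a[-50:].mean() > v_b[-50:].mean()   # drift attributed to volatility
assert s_b[-50:].mean() > s_a[-50:].mean()   # noise attributed to unpredictability
```

The key design point, as in the text, is that sampling is needed only for the two hyperparameters; the reward rate itself is integrated out analytically via Eqs. 1-4.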

Learning under volatility and unpredictability
We now consider the implications of this model for learning under volatility and unpredictability.
A series of studies has used two-level manipulations (high vs. low volatility blockwise) to investigate the prediction that learning rates should increase under high volatility (Behrens et al., 2007, 2008; Browning et al., 2015; Pulcu and Browning, 2017). Here volatility has been operationalized by frequent or infrequent reversals (Figure 2a), rather than the smoother Gaussian diffusion that the volatility-augmented Kalman filter models formally assume. Nevertheless, applied to this type of task, these models detect higher volatility in the frequent-reversal blocks, and increase their learning rates accordingly (Behrens et al., 2007; Mathys et al., 2011; Piray and Daw, 2020). The current model (which effectively incorporates the others as a special case) achieves the same result (Figure 2). In the preceding line of studies, unpredictability was not manipulated. (Indeed, it was not even independently manipulable, because rewards were binary, and the variance of binomial outcomes is determined only by the mean.) However, analogous effects of unpredictability have been seen in another line of studies (Nassar et al., 2010, 2012, 2016). In these studies, Nassar and colleagues studied learning rates in a task in which subjects had to predict a value from observations in which the true value was corrupted by different levels of additive Gaussian noise (i.e., unpredictability) and occasionally "jumped" with a constant hazard rate, analogous to volatility. Important for our purposes here is that these studies have shown that participants' learning rate decreases with increases in the noise level. This effect cannot be explained by models that only consider volatility; in fact, those models make opposite predictions, because they take increased noise as evidence of a volatility increase. The current model, however, produces the same effect as humans: because it correctly infers the change in unpredictability, its learning rate is lower, on average, for higher levels of noise (Figure 3). Note that, taken together, these two lines of studies demonstrate both types of effects on learning rates that we stress; however, neither line of work has manipulated unpredictability alongside volatility. Furthermore, learning of the noise hyperparameters has only been explicitly modeled for volatility alone. We next consider a variant of this type of task, elaborated to include a 2x2 factorial manipulation of unpredictability alongside volatility (Figure 4; we also substitute smooth diffusion for reversals). Here both parameters are constant within the task, but they are unknown to the model. A series of outcomes was generated from a process in which the hidden reward rate changes according to a random walk and the learner observes outcomes that are noisily generated according to the reward rate.
Figure 4 shows the model's learning rates and how these follow from its inferences of volatility and unpredictability. As above, the model increases its learning rate in the higher volatility conditions; but, as expected, it also decreases it in the higher unpredictability conditions (Figure 4a). These effects on learning rate arise, in turn (via Eq. 2), because the model is able to correctly estimate the various combinations of volatility and unpredictability from the data (Figure 4bc).
Our model thus suggests a general program of augmenting the standard 2-level volatility manipulation by crossing it with a second manipulation, of unpredictability, and predicts that higher unpredictability should decrease the learning rate, separate from volatility effects. Next we apply this model to reinterpret some issues about learning rates in animal conditioning, and then in psychiatric and neurological disorders.

Figure 4 (caption fragment): Furthermore, these parameters have opposite effects on learning rate: in contrast to volatility, higher unpredictability reduces the learning rate. b) Estimated volatility captures variations in true volatility (small: 0.5; large: 1.5). c) Estimated unpredictability captures variations in the true unpredictability (small: 2; large: 6). In (a-c), average learning rate, estimated volatility and unpredictability in the last 20 trials were plotted over all simulations (100 simulations).

Unpredictability vs. volatility in Pavlovian learning
Learning rates and their dependence upon previous experience have also been extensively studied in Pavlovian conditioning. In this respect, a distinction emerges between two seemingly contradictory lines of theory and experiment, those of Mackintosh (1975) vs. Pearce and Hall (1980). Both of these theories concern how the history of experiences with some cue drives animals to devote more or less "attention" to it. Attention is envisioned to affect several phenomena, including not just rates of learning about the cue, but also other aspects of its processing, such as competition between multiple stimuli presented in compound. Here, to most clearly examine the relationship with the research and models discussed above, we focus specifically on learning rates for a single cue.
The two lines of models start with opposing core intuitions. Mackintosh (1975) argues that animals should pay more attention to (e.g., learn faster about) cues that have in the past been more reliable predictors of outcomes. Pearce and Hall (1980) argue for the opposite: faster learning about cues that have previously been accompanied by surprising outcomes, i.e., those that have been less reliably predictive.
Indeed, different experiments, as discussed below, support either view. For our purposes, we can view these experiments as involving two phases: a pretraining phase that manipulates unpredictability or surprise, followed by a retraining phase to test how this affects the speed of subsequent (re)learning. In terms of our model, we can interpret the pretraining phase as establishing inferences about unpredictability and volatility, which then (depending on their balance) govern the learning rate during retraining. On this view, noisier pretraining might, depending on the pattern of noise, lead to either higher volatility and higher learning rates (consistent with Pearce-Hall) or higher unpredictability and lower learning rates (consistent with Mackintosh).
First consider volatility. It has been argued that the Pearce-Hall (1980) logic is formalized by volatility-learning models (Courville et al., 2006; Dayan et al., 2000; Piray and Daw, 2020). In these models, surprising outcomes during pretraining increase inferred volatility and thus speed subsequent relearning.
Hall and Pearce (Hall and Pearce, 1982; Pearce and Hall, 1980) pretrained rats with a tone stimulus predicting a moderate shock. In the retraining phase, the intensity of the shock was increased. Critically, one group of rats experienced a few surprising "omission" trials at the end of the pretraining phase, in which the tone stimulus was presented with no shock. The speed of learning was substantially increased following the omission trials compared with a control group that experienced no omission in pretraining.
Figure 5 shows a simulation of this experiment from the current model, showing that the omission trials lead to increased volatility and faster learning. Note that the history-dependence of learning rates in this type of experiment also rejects simpler models like the Kalman filter, in which volatility (and unpredictability) are taken as fixed; for the Kalman filter, the learning rate depends only on the number of pretraining trials, but not the particular pattern of observed outcomes.

Figure 5 (caption fragment): ...(Hall and Pearce, 1982), in which they found that the omission group showed a higher speed of learning than the control group. b) Median learning rate over the first trial of the retraining. The learning rate is larger for the omission group due to increases of volatility (c), while unpredictability is similar for both groups. Error bars reflect standard error of the median over 100 simulations.
Next, consider unpredictability. Perhaps the best example of Mackintosh's (1975) principle in terms of learning rates for a single cue is the "partial reinforcement extinction effect" (Gibbon et al., 1980; Haselgrove et al., 2004; Rescorla, 1999). Here, for pretraining, a cue is reinforced either on every trial or instead ("partial reinforcement") on only a fraction of trials. The number of times that the learner encounters the stimulus is the same for both conditions, but the outcome is noisier for the partially reinforced stimulus. The retraining phase consists of extinction (i.e., fully unreinforced presentations of the cue), which occurs faster for fully reinforced cues even though they had been paired with more reinforcers initially. Our model explains this finding (Figure 6), because it infers larger unpredictability in the partially reinforced condition, leading to slower learning. Notably, this type of finding cannot be explained by models that learn only about volatility (Behrens et al., 2007; Mathys et al., 2011; Piray and Daw, 2020). In general, this class of models mistakes partial reinforcement for increased volatility (rather than increased unpredictability), and incorrectly predicts faster learning.

Figure 6 (caption fragment): Error bars reflect standard error of the mean over 100 simulations and are, for some parameters, too small to be visible.
Note the subtle difference between the experiments of Figures 5 and 6. The surprising-omission experiment involves stable pretraining prior to omission, then an abrupt shift, whereas pretraining in the partial reinforcement experiment is stochastic, but uniformly so. Accordingly, though both pretraining phases involve increased noise (relative to their controls), the model interprets the pattern of this noise as more likely reflecting either volatility or unpredictability, respectively, with opposite effects on learning rate. Overall, then, these experiments support the current model's suggestion that organisms learn about unpredictability in addition to volatility. Conversely, the model helps to clarify and reconcile the seemingly opposing theory and experiments of Mackintosh and Pearce-Hall, at least with respect to learning rates for individual cues. Indeed, although previous work has noted the relationship between Pearce-Hall surprise, uncertainty, and learning rates (Behrens et al., 2007; Dayan et al., 2000; Li et al., 2011; Piray and Daw, 2020; Piray et al., 2019), the current modeling significantly clarifies this mapping by identifying it more specifically with volatility, as contrasted against simultaneous inference about unpredictability.
Relatedly, previous work attempting to map Mackintosh's (1975) principle onto statistical models has focused on cue combination rather than learning rates (Dayan et al., 2000), which is a complementary but separate idea.

Interactions between volatility and unpredictability
The previous results highlight an important implication of the current model: that inferences about volatility and unpredictability are mutually interdependent. From the learner's perspective, both volatility and unpredictability increase the noisiness of observations, and disentangling their respective contributions requires trading off two opposing explanations for the pattern of observations, a process known in Bayesian probability theory as explaining away. Thus, models that neglect unpredictability tend to misidentify unpredictability as volatility and inappropriately modulate learning. Intriguingly, this situation might in principle arise in neurological damage and psychiatric disorders, if they selectively impact inference about volatility or unpredictability. In that case, the model predicts a characteristic pattern of compensation, whereby learning rate modulation is not merely impaired but reversed, reflecting the substitution of volatility for unpredictability or vice versa: a failure of explaining away. Figure 7 shows this phenomenon in the 2x2 design of Figure 4. The key point here is that a lesioned model that does not consider one factor (e.g., unpredictability) inevitably makes systematically incorrect inferences about the other factor too. Importantly, previous models that only consider volatility are analogous to the unpredictability-lesion model (Figure 7c) and, therefore, make systematically erroneous inferences about volatility (Figure 7h) and misadjust the learning rate if unpredictability is changing (Figure 7f). This set of lesioned models provides a rich framework for understanding pathological learning in psychiatric and neurological disorders. We show below that the volatility-lesion and unpredictability-lesion models explain deficits in learning observed in amygdala damage and anxiety, respectively.
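The explaining-away pattern can be demonstrated with an even simpler toy analysis of our own (a grid search, not the paper's lesioned particle filters): because the Kalman predictive likelihood is exact for any fixed (v, s) pair, we can score a grid of candidate pairs on data from a stable but unpredictable environment. A "lesioned" model that clamps unpredictability at a small value is forced to attribute the outcome noise to volatility:

```python
import numpy as np

def kalman_loglik(outcomes, v, s):
    """Exact predictive log-likelihood of a Kalman filter with fixed
    volatility v and unpredictability s."""
    m, w, ll = 0.0, 1.0, 0.0
    for o in outcomes:
        pred_var = w + v + s
        ll += -0.5 * (np.log(2 * np.pi * pred_var) + (o - m) ** 2 / pred_var)
        alpha = (w + v) / pred_var
        m += alpha * (o - m)
        w = (1 - alpha) * (w + v)
    return ll

rng = np.random.default_rng(0)
# stable environment with high unpredictability: true v ~ 0, true s = 4
outcomes = rng.normal(0.0, 2.0, 300)

grid = np.linspace(0.05, 6.0, 40)
# intact model: fit both v and s
ll_full = np.array([[kalman_loglik(outcomes, v, s) for s in grid] for v in grid])
i, j = np.unravel_index(ll_full.argmax(), ll_full.shape)
v_full, s_full = grid[i], grid[j]
# unpredictability-lesioned model: s clamped near zero, only v is fit
ll_lesion = np.array([kalman_loglik(outcomes, v, 0.05) for v in grid])
v_lesion = grid[ll_lesion.argmax()]

assert v_lesion > v_full   # the lesioned model misattributes noise to volatility
```

The intact model correctly assigns the outcome variance to unpredictability, while the lesioned model inflates its volatility estimate: a failure of explaining away in miniature.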
The predicted pattern of compensatory effects on inference is evident in the effects of amygdala damage on learning. The amygdala is known to be critical for associative learning (Averbeck and Costa, 2017; Phelps et al., 2014). Based on evidence from human neuroimaging work, single-cell recording in nonhuman primates, and lesion studies in rats, different authors have proposed that the amygdala is involved in controlling learning rates (Averbeck and Costa, 2017; Costa et al., 2016; Holland and Gallagher, 1999; Homan et al., 2019; Li et al., 2011; Roesch et al., 2012). Most informative from the perspective of our model are lesion studies in rats (Holland and Gallagher, 1993, 1999) that support an involvement specifically in processing of volatility, rather than learning rates or uncertainty more generally. These experiments examine a surprise-induced upshift in learning rate similar to the Pearce-Hall experiment from Figure 5. Lesions to the central nucleus of the amygdala attenuate this effect, suggesting a role in volatility processing. But in fact, the effect is not merely attenuated but reversed, consistent with our theory's predictions, in which volatility trades off against a (presumably anatomically separate) system for unpredictability estimation.

Figure 7 (caption fragment): ...4 for the healthy and lesioned models. For both lesion models, lesioning does not merely abolish the corresponding effect on learning rate, but reverses it. Thus, the volatility-lesion model shows a reduced learning rate with increases in volatility (e), and the unpredictability-lesion model shows an elevated learning rate with increases in unpredictability (f). This is due to misattribution of the noise due to the lesioned factor to the remaining module. g) The volatility-lesion model makes erroneous inferences about unpredictability and increases its unpredictability estimate in more volatile environments. h) The unpredictability-lesion model makes erroneous inferences about volatility and increases its volatility estimate in more unpredictable environments. In fact, neither lesion model is able to distinguish between volatility and unpredictability, and therefore both show a similar pattern for the remaining module. Small and large true volatility are 0.5 and 1.5, respectively. Small and large true unpredictability are 1 and 3, respectively.
Figure 8 shows their serial prediction task and results in more detail. Rats performed a prediction task in two phases. A group of rats in the 'consistent' condition performed the same prediction task in both phases. The 'shift' group, in contrast, experienced a sudden change in the contingency in the second phase. Whereas the control rats showed an elevation of learning rate in the shift condition, manifested by elevated food-seeking behavior on the very first trial of the test, the amygdala-lesioned rats showed the opposite pattern. Lesioned rats showed a significantly smaller learning rate in the shift condition than in the consistent one: a reversal of the surprise-induced upshift.
Figure 8. In the 'shift' condition, rats were trained on the same light-tone partial reinforcement schedule in the first phase, but the schedule shifted to a different one in the shorter second phase, in which rats received light-tone-reward on half of trials and light-nothing on the other half. b) Empirical data showed that while the contingency shift facilitates learning in the control rats, it disrupts performance in lesioned rats. c) Learning rate on the last trial of the second phase shows the same pattern. This is because the shift increases volatility for the control rats (d) but not for the lesioned rats (e). In contrast, the contingency shift increases unpredictability for the lesioned rats substantially more than for the control rats, which results in a reduced learning rate for the lesioned animals (f-g). The gray line shows the starting trial of the second phase. Data in (b) were originally reported in Holland and Gallagher (1993) and reproduced here from Holland and Schiffino (2016). Error bars reflect standard error of the mean over 100 simulations.
We simulated the model in this experiment. To model a hypothetical effect of amygdala lesion on volatility inference, we assumed that lesioned rats treat volatility as small and constant. As shown in Figure 8, the model shows an elevated learning rate in the shift condition for the control rats, which is again due to increases in inferred volatility after the contingency shift. For the lesioned model, however, surprise is misattributed to the unpredictability term, because an increase in inferred volatility cannot explain away surprising observations (volatility was held fixed). Therefore, the contingency shift inevitably increases unpredictability and thereby decreases the learning rate. Notably, the compensatory reversal in this experiment cannot be explained by models that do not consider both the volatility and unpredictability terms.
A similar pattern of effects of amygdala lesions, consistent with our theory, is seen in an experiment on nonhuman primates (Figure 9). Costa et al. (2016) found that amygdala lesions in monkeys disrupt reversal learning with deterministic contingencies more than reversal learning with stochastic contingencies. This is striking, since deterministic reversal learning tasks are much easier. As in the previous experiment, our model explains this finding because large surprises caused by the contingency reversal are misattributed to unpredictability in lesioned animals (because volatility was held fixed), while control animals correctly attribute them to the volatility term. This effect is particularly large in the deterministic case because the environment is very predictable before the reversal, so the reversal causes larger surprises than in the stochastic case. Similar findings were reported in a study of human subjects with focal bilateral amygdala lesions (Hampton et al., 2007), in which patients tended to show larger deficits in deterministic than in stochastic reversal learning. Again, these experimental findings are not explained by a Kalman filter or by models that only consider the volatility term.
Overall, then, these experiments support the current model's picture of dueling influences of unpredictability and volatility. Furthermore, the current model helps to clarify the precise role of the amygdala in this type of learning, relating it specifically to volatility-mediated adjustments.
Figure 9. The probabilistic reversal learning task of Costa et al. (2016). The task consists of 80 trials, in which animals chose one of two stimuli by making a saccade to it and fixating on the chosen cue. A probabilistic reward was given following a correct choice. The stimulus-reward contingency was reversed in the middle of the task (on a random trial between trials 30-50). The task included different schedules, but we focus here on 60%/40% (stochastic) and 100%/0% (deterministic), which show the clearest difference in the empirical data. b) Performance of animals in this task. In addition to the generally reduced performance of the lesioned animals, their performance was substantially more disrupted in the deterministic than in the stochastic reversal. c) Performance of the model in this task shows the same pattern. d-i) Learning rate, volatility and unpredictability signals for the deterministic (d-f) and stochastic (g-i) tasks. Solid and dashed lines correspond to the acquisition and reversal phases, respectively. Deterministic reversal increases the learning rate in the control animals due to increases in volatility, but not in the lesioned monkeys, in which it reduces the learning rate due to the increase in unpredictability. The reversal in the stochastic task has very small effects on these signals, because unpredictability is relatively large during both acquisition and reversal. Error bars reflect standard error of the mean over 100 simulations.
Lesioned models like the ones in Figure 7 might also be useful for understanding learning deficits in psychiatric disorders, for example anxiety disorders, which have recently been studied in the context of volatility and its effects on learning rate (Browning et al., 2015; Piray et al., 2019). These studies have shown that anxious people are less sensitive to volatility manipulations in probabilistic learning tasks similar to that of Figure 2. Besides learning rates, an analogous insensitivity to volatility has been observed in pupil dilation (Browning et al., 2015) and in neural activity in the dorsal anterior cingulate cortex, a region that covaried with learning rate in controls (Piray et al., 2019).
These results have been interpreted in relation to the more general idea that intolerance of uncertainty is a key foundation of anxiety; accordingly, we argue that fully understanding them requires taking account of unpredictability, which is prima facie an even more basic source of uncertainty. In particular, these effects and a number of others are well explained by the unpredictability lesion model of Figure 7c, i.e. by assuming that anxious people have a core deficit in estimating unpredictability, and instead treat it as small and constant. As shown in Figure 10, this model shows insensitivity to the volatility manipulation, but that is actually because volatility is misestimated nearer ceiling due to underestimation of unpredictability. This, in turn, substantially dampens further adaptation of the learning rate in blocks when volatility actually increases. The elevated learning rate across all blocks leads to hypersensitivity to noise, which prevents anxious individuals from benefitting from stability, as has been observed empirically (Piray et al., 2019). For the same reason, the model also predicts that anxious people show elevated learning rates in tasks with probabilistic (i.e. unpredictable) outcomes (Figure 10e-f), even when there is no switch in the cue-outcome contingency (Aylward et al., 2019; Huang et al., 2017). In the extreme case, this effect would pin the learning rate near one, which (since only the most recent outcome is then considered) is equivalent to a win-stay/lose-shift strategy. Such a strategy has itself been linked to anxiety (Harlé et al., 2017; Huang et al., 2017).
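The equivalence between a learning rate of one and win-stay/lose-shift can be verified with a two-line delta rule (an illustrative sketch; the function and variable names are ours):

```python
def delta_rule(outcomes, alpha):
    """Standard delta rule: the estimate moves toward each outcome by alpha."""
    x, estimates = 0.5, []
    for o in outcomes:
        x = x + alpha * (o - x)   # error-driven update
        estimates.append(x)
    return estimates

outcomes = [1, 0, 0, 1, 1]
# With alpha = 1 the estimate always equals the most recent outcome,
# so any choice rule based on it reduces to win-stay/lose-shift.
assert delta_rule(outcomes, alpha=1.0) == outcomes
```

With any alpha below one, by contrast, the estimate averages over past outcomes, and single noisy outcomes no longer dictate the next choice.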
Figure 10.The unpredictability lesion model shows a pattern of learning deficits associated with anxiety.
Behavior of the lesioned model as a model of anxiety, in which unpredictability is assumed to be small and constant, is shown alongside the control model. a-d) Behavior of the models in the switching task of Figure 2. An example of estimated reward shows that the anxious model is more sensitive to noisy outcomes (a), which dramatically reduces the sensitivity of the learning rate to the volatility manipulation in this task (b). This, however, is primarily due to the inability to make inferences about unpredictability, which leads to misestimation of volatility (c-d). e-f) Behavior of the models in a probabilistic learning task with no contingency switch. Even in this simpler task, the anxious model shows an elevated learning rate and elevated win-stay/lose-shift behavior, which is again due to misattribution of noise to volatility.

Discussion
A central question in decision neuroscience is how the brain learns from the consequences of choices given that these can be highly noisy. To do so effectively requires simultaneously learning about the characteristics of the noise, as has been studied in a prominent line of work on how the brain tracks the volatility of the environment. Here we argue that work on this topic has largely neglected a key distinction between two importantly different types of noise: volatility and unpredictability. A statistically efficient agent should behave very differently when faced with these two types of noise. Previous work has focused on the prediction that volatility should increase the learning rate, because it signals that previous observations are less likely to remain relevant (Behrens et al., 2007; Iglesias et al., 2013). But another superficially similar type of noise, unpredictability, should instead decrease the learning rate, because it signals that individual outcomes are noisy and must be averaged over time to reveal the underlying reward rate. Efficient learning requires taking both these effects into account by estimating both types of noise and adjusting learning rates accordingly. While various experiments have, separately, shown that humans can adjust learning rates in response to manipulations of either type of noise, models of how they do so have focused primarily on estimating volatility alone, thereby skirting the difficult problem of distinguishing types of noise. To solve this problem, we built a novel probabilistic model for learning in uncertain environments that tracks volatility and unpredictability simultaneously. Computationally, this is possible because these two types of noise have dissociable observable signatures despite having similar effects on outcome variance. In particular, whereas volatility increases outcome autocorrelation (i.e. covariation on consecutive trials), unpredictability decreases it.
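The dissociable signatures can be checked with a small simulation. For simplicity, the sketch below holds both noise parameters constant and measures the lag-1 autocorrelation of trial-to-trial outcome changes, a stationary variant of the autocorrelation signature described above (for changes, the expected value is -u / (v + 2u), so volatility pushes it up and unpredictability pushes it down). The generative assumptions follow the sketch in Figure 1; all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_outcomes(v, u, n=20_000):
    """Latent reward rate diffuses with variance v (volatility);
    outcomes add observation noise with variance u (unpredictability)."""
    x = np.cumsum(rng.normal(0.0, np.sqrt(v), n))   # random-walk reward rate
    return x + rng.normal(0.0, np.sqrt(u), n)       # noisy observed outcomes

def lag1_autocorr_of_changes(o):
    """Lag-1 autocorrelation of trial-to-trial outcome changes."""
    d = np.diff(o)
    return np.corrcoef(d[:-1], d[1:])[0, 1]

high_vol = lag1_autocorr_of_changes(generate_outcomes(v=1.0, u=0.1))
high_unp = lag1_autocorr_of_changes(generate_outcomes(v=0.1, u=1.0))
assert high_vol > high_unp  # volatility raises autocorrelation, unpredictability lowers it
```

The separation is large relative to sampling noise at this series length, which is what makes the two noise sources identifiable in principle.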
Our work builds directly on a rich line of theoretical and experimental work on the relationship between volatility and learning rates (Behrens et al., 2007; de Berker et al., 2016; Browning et al., 2015; Diaconescu et al., 2014; Farashahi et al., 2017; Iglesias et al., 2013; Khorsand and Soltani, 2017). There have been numerous reports of volatility effects on healthy and disordered behavioral and neural responses, often using a two-level manipulation of volatility like that of Figure 2 (Behrens et al., 2007; Brazil et al., 2017; Browning et al., 2015; Cole et al., 2020; Deserno et al., 2020; Diaconescu et al., 2020; Farashahi et al., 2017; Iglesias et al., 2013; Katthagen et al., 2018; Lawson et al., 2017; Paliwal et al., 2019; Piray et al., 2019; Powers et al., 2017; Pulcu and Browning, 2017; Soltani and Izquierdo, 2019). Our modeling suggests that it will be informative to drill deeper into these effects by augmenting this task to cross this manipulation with unpredictability, so as more clearly to differentiate these two potential contributors. For example, in tasks that manipulate (and models that consider) only volatility, it can be seen from Equations 1-4 that the timeseries of several quantities all covary together, including the estimated volatility v_t, the posterior uncertainty w_t, and the learning rate α_t. It can therefore be difficult in general to distinguish which of these variables is really driving tantalizing neural correlates related to these processes, for instance in amygdala and dorsal anterior cingulate cortex (Behrens et al., 2007; Li et al., 2011). The inclusion of unpredictability (which increases uncertainty but decreases learning rate) would help to drive these apart.
Indeed, another related set of learning tasks considered prediction of continuous outcomes corrupted by unpredictability, i.e. additive Gaussian noise (McGuire et al., 2014; Nassar et al., 2010, 2012, 2016), which could provide another foundation for factorial manipulations of the sort we propose. Indeed, a number of these studies (complementary to the volatility studies) included multiple levels of unpredictability and showed learning rate effects (Diederen and Schultz, 2015; McGuire et al., 2014; Nassar et al., 2010, 2012, 2016), though the accompanying models did not address our main question of how subjects estimate the noise hyperparameters. Interestingly, these studies more explicitly emphasized the detection of discrete changes in the environment and the resulting adjustment of the learning rate. From a modeling perspective, inference under change at discrete changepoints (occurring at some hazard rate) raises issues quite analogous to change due to more gradual diffusion (with some volatility). Thus, in practice it has been common and effective to apply models for one sort of change to tasks actually involving the other (Behrens et al., 2007; Mathys et al., 2011; Piray and Daw, 2020). Because there is not an especially well isolated functional distinction, research into the neural substrates of changepoint detection is highly relevant to the change problem conceived in terms of volatility as well (see Soltani and Izquierdo (2019) for a recent review).
The current framework's tendency to combine discrete and continuous change (but distinguish both from unpredictability) is also the basis of an important, but subtle, distinction from another prominent dichotomy previously proposed: that between "expected" and "unexpected" types of uncertainty (Dayan and Yu, 2003; Yu and Dayan, 2005). While it might appear that these categories correspond, respectively, to unpredictability and volatility as we define them, that is not actually the case. Formally, this is because the Yu and Dayan model arises from a Kalman filter augmented with additional discrete changepoints: i.e. both diffusion and jumps. The focus of that work was distinguishing the special effects of surprising jumps ("unexpected uncertainty"), which were hypothesized to recruit a specialized neural interrupt system. Meanwhile, all other uncertainty arising in the baseline Kalman filter (i.e., that from both unpredictability and volatility; the posterior variance w_t in Eq. 4) is lumped together under "expected uncertainty." That said, our impression is that later authors' use of these terms actually tends to comport more with our distinction than with the original definition (Pulcu and Browning, 2019).
An interesting feature of our model is the competition it induces between volatility and unpredictability to "explain away" surprising observations. This leads to a predicted signature of the model in cases of lesion or damage affecting inference about either type of noise: disruption causing neglect of one of the terms leads to overestimation of the other term. For example, if a module responsible for volatility learning were disrupted, the model would overestimate unpredictability, because surprising observations that are due to volatility would be misattributed to unpredictability. This allowed us to revisit the role of the amygdala in associative learning and explain some puzzling findings about its contributions to reversal learning.
Similar explanations grounded in the current model may also be relevant to a number of psychiatric disorders. Abnormalities in uncertainty and inference, broadly, have been hypothesized to play a role in numerous disorders, including especially anxiety and schizophrenia. More specifically, abnormalities in volatility-related learning adjustments have been reported in patients or people reporting symptoms of several mental illnesses (Brazil et al., 2017; Browning et al., 2015; Cole et al., 2020; Deserno et al., 2020; Diaconescu et al., 2020; Katthagen et al., 2018; Lawson et al., 2017; Paliwal et al., 2019; Piray et al., 2019; Powers et al., 2017; Pulcu and Browning, 2017). The current model provides a more detailed framework for better dissecting these effects, though this will ideally require a new generation of experiments manipulating both factors.
In the present work, we have developed these ideas mostly in terms of pathological decision making in anxiety, which is one of the areas where earlier work on volatility estimation has been strongest and where further refinement using our theory seems most promising (Gagne et al., 2018; Hartley and Phelps, 2012; Huys et al., 2015; Paulus and Yu, 2012). We considered an account by which anxious individuals systematically misidentify outcomes occurring due to chance (unpredictability) as instead a signal of change (volatility) (Huang et al., 2017). This account is also broadly consistent with studies suggesting that anxious individuals might feel overwhelmed when faced with uncertainty (Luhmann et al., 2011) and fail to make use of long-term statistical regularities (Huang et al., 2017). These failures are also hypothesized to be related to symptoms of excessive worry (Dugas et al., 1998; Ladouceur et al., 2000). Misestimating unpredictability, more so than volatility, also seems consonant with the idea that anxious individuals tend to fail to discount negative outcomes occurring by chance (i.e., unpredictability) and instead favor alternative explanations like self-blame (Beck, 1970).
More generally, this modeling approach, which quantifies misattribution of unpredictability to volatility and vice versa, might be useful for understanding various other brain disorders that are thought to influence processing of uncertainty and have largely been studied in the context of volatility in the past decade (Brazil et al., 2017; Cole et al., 2020; Deserno et al., 2020; Diaconescu et al., 2014; Lawson et al., 2017; Paliwal et al., 2019; Powers et al., 2017; Reed et al., 2020). As another example, positive symptoms in schizophrenia have been argued to result from alterations in prior vs. likelihood processing, perhaps driven by abnormal attribution of uncertainty (or precision) to top-down expectations (Stephan et al., 2006). But different such symptoms (e.g. hallucinations vs. delusions) manifest in different patients. One reason may be that these relate to disruption at different levels of a perceptual-inferential hierarchy, with hallucination vs. delusion reflecting involvement of more vs. less abstract inferential levels, respectively (Baker et al., 2019; Horga and Abi-Dargham, 2019; Wengler et al., 2020). In this respect, the current model may provide a simple and direct comparative test, since unpredictability enters at the perceptual, or outcome, level (potentially associated with hallucination) whereas volatility acts at the more abstract level of the latent reward (and may be associated with delusion; see Figure 1).
Our work touches upon a historical debate in the associative learning literature about the role of outcome unpredictability (i.e., in our terms, noise) in learning. One class of theories, most prominently represented by Mackintosh (1975), proposes that attention is preferentially allocated to cues that are most reliably predictive of outcomes, whereas Pearce and Hall (1980) suggest the opposite: that attention is attracted to surprising misprediction. We address only a subset of the experimental phenomena involved in this debate (those involving learning rates for cues presented alone), but for this subset we offer a very clear resolution of the apparent conflict. Our approach and goals also differ from classic work in this area. A number of important models of attention in psychology also attempt to reconcile these theories by providing more phenomenological models that hybridize the two theories to account for varied and often paradoxical experimental work (Haselgrove et al., 2010; Le Pelley, 2004; Le Pelley et al., 2016; Pearce and Mackintosh, 2010). Our goal is different and descends more from a tradition of normative theories that seek a computational understanding of psychological phenomena from first principles, by first addressing what computational problem the corresponding neural system evolved to solve (Dayan et al., 2000; Marr, 1982).
Any probabilistic model relies on a set of explicit assumptions about how observations have been generated, i.e. a generative model, and also an inference procedure to estimate the hidden parameters that are not directly observable. Such inference algorithms typically reflect some approximation strategy, because exact inference is not possible for most important problems, including our generative model (Figure 1). In previous work in this area, we and others have relied on variational approaches to approximate inference, which factor difficult inference problems into smaller tractable ones and approximate the answer as though they were independent (Mathys et al., 2011; Piray and Daw, 2020). Interestingly, although one of the most promising successes of this approach in neuroscience has been in hierarchical Kalman filters with volatility inference, we found it difficult to develop an effective variational filter for the current problem, in which unpredictability is unknown. The core problem, in our hands, was that effective explaining away between the two noise types was difficult to achieve using simplified variational posteriors that omitted aspects of their mutual dependency (simulations not shown).
Interestingly, there are two other algorithms that, in principle, address similar learning problems, either using an explicitly variational approach extending the HGF (code exists as hgf_jget in the TAPAS toolbox (Aponte et al., 2020), but has not been documented or tested in published articles; Mathys, personal communication) or an analogous simplified learning rule based more on neural considerations (Silvetti et al., 2018). While these have not yet been applied to the full range of problems we investigate here, we suspect that future work comparing these approaches will find explaining away to be a key challenge. In any case, in the current modeling, we have adopted a different method based on Monte Carlo sampling, in particular a variant of particle filtering that preserves many of the advantages of variational methods by incorporating exact conditional inference for a subset of variables (Doucet et al., 2000). The inference model employed here combines Kalman filtering for estimation of the reward rate (Kalman, 1960), conditional on the volatility and unpredictability, with particle filtering for inference about these (Doucet and Johansen, 2011). One drawback of the particle filter, however, is that it requires tracking a number of samples on every trial. In practice, we found that a handful (e.g., 100) of particles results in a sufficiently good approximation.
Finally, in this study, we only modeled the effects of volatility and unpredictability on the learning rate. However, uncertainty affects many different problems beyond learning rate, and a full account of how subjects infer volatility and unpredictability (and how these, in turn, affect uncertainty) may have ramifications for many other behaviors. Thus, there have been important statistical accounts of a number of such problems, but most of them have neglected either unpredictability or volatility, and none of them have explicitly considered the effects of learning the levels of these types of noise. These problems include cue- or feature-selective attention (Dayan et al., 2000); the explore-exploit dilemma (Daw et al., 2006; Gittins, 1979); and the partition of experience into latent states, causes or contexts (Gershman et al., 2010; Redish et al., 2007; Wilson et al., 2014). The current model, or variants of it, is more or less directly applicable to all these problems and should imply new predictions about the effects of manipulating either type of noise across many different behaviors.
respectively. In other words, larger values of λ_v and λ_u result in faster updates of volatility and unpredictability, respectively. Intuitively, this is because a smaller λ_v increases the mean of the inverse-volatility variable z_t and results in a larger update of z_t; since volatility is the inverse of z_t, a smaller λ_v therefore results in a slower update of volatility. This has been formally shown in our recent work (Piray and Daw, 2020). In addition to these two parameters, this generative process depends on the initial values of volatility and unpredictability, v_0 and u_0.
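The generative role of multiplicative noise in keeping the volatility and unpredictability variables positive (Figure 1d) can be illustrated with a short sketch. The log-normal form of the noise and the spread parameter lam are our illustrative choices; the paper's exact noise distribution may differ:

```python
import numpy as np

rng = np.random.default_rng(2)

def positive_noise_process(init, lam, n):
    """Positive Markov process: each step multiplies the previous value
    by independent positive (here log-normal) noise, so the trajectory
    can never cross zero; larger lam gives faster drift."""
    z = np.empty(n)
    z[0] = init
    for t in range(1, n):
        z[t] = z[t - 1] * np.exp(lam * rng.normal())
    return z

v = positive_noise_process(0.5, lam=0.1, n=1000)
assert np.all(v > 0)  # multiplicative noise keeps volatility positive throughout
```

An additive-noise random walk, by contrast, would eventually produce negative (meaningless) variance values.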
For inference, we employed a Rao-Blackwellised particle filtering approach (Doucet et al., 2000), in which inference about the volatility v_t and unpredictability u_t was made by a particle filter (Doucet and Johansen, 2011) and, conditional on these, inference over the reward rate x_t was given by the Kalman filter (Eqs 1-4). The particle filter is a Monte Carlo sequential importance sampling method, which keeps track of a set of weighted particles (i.e. samples). The weight of each particle depends on the probability of the observed outcome under that particle's volatility and unpredictability hypotheses. For every particle i, Eqs 1-4 were used to compute the learning rate and update the Kalman mean and variance, x_t^[i] and w_t^[i]. The learning rate and estimated reward rate on trial t were then defined as the weighted average over all particles, with weights q_t^[i]. Particles were resampled if the ratio of effective to total particles fell below 0.5. We used particle filter routines implemented in MATLAB.
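A simplified sketch of this scheme (our own simplification, with illustrative names and proposal distribution, in Python rather than the MATLAB routines used in the paper) is:

```python
import numpy as np

rng = np.random.default_rng(1)

def rbpf(outcomes, n_particles=100, lam=0.1):
    """Sketch of a Rao-Blackwellised particle filter: each particle
    carries a (volatility v, unpredictability u) hypothesis that drifts
    multiplicatively; conditional on it, a Kalman filter tracks the
    reward rate exactly. Returns the weighted-average reward estimate."""
    n = n_particles
    v = np.full(n, 1.0)      # per-particle volatility hypothesis
    u = np.full(n, 1.0)      # per-particle unpredictability hypothesis
    x = np.zeros(n)          # Kalman mean, conditional on (v, u)
    w = np.ones(n)           # Kalman variance, conditional on (v, u)
    logq = np.zeros(n)       # log importance weights
    estimates = []
    for o in outcomes:
        # propagate noise hypotheses with positive multiplicative noise
        v = v * np.exp(lam * rng.normal(size=n))
        u = u * np.exp(lam * rng.normal(size=n))
        # weight by the Kalman predictive likelihood of the outcome
        pred_var = w + v + u
        logq += -0.5 * np.log(2 * np.pi * pred_var) - 0.5 * (o - x) ** 2 / pred_var
        # exact conditional Kalman update (learning rate = Kalman gain)
        k = (w + v) / pred_var
        x = x + k * (o - x)
        w = (1 - k) * (w + v)
        q = np.exp(logq - logq.max())
        q = q / q.sum()
        estimates.append(float(q @ x))
        # resample when the effective sample size falls below half
        if 1.0 / np.sum(q ** 2) < n / 2:
            idx = rng.choice(n, size=n, p=q)
            v, u, x, w, logq = v[idx], u[idx], x[idx], w[idx], np.zeros(n)
    return np.array(estimates)

# Usage: track a constant reward rate of 5 observed with noise.
obs = 5.0 + 0.5 * np.random.default_rng(7).normal(size=200)
est = rbpf(obs)
assert abs(est[-1] - 5.0) < 1.0
```

Rao-Blackwellisation matters here because only the two noise parameters need sampling; the reward rate is integrated out exactly, which keeps the particle count small.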

Simulation details
In simulations related to Figures 1 and 3, timeseries were generated according to the Markov random walk with constant volatility and unpredictability. For Figure 2, the outcome variance for generating data was assumed to be 0.01 and model parameters were λ_v = λ_u = 0.2 and v_0 = u_0 = 0.1. For Figure 3, we followed the design of the prediction task of Nassar et al. (2010). The timeseries were generated according to a hidden reward rate plus a noise term, in which the variance of the noise was either 1 (small) or 9 (large). The reward rate was subject to random jumps in the range of 0 to 10. Change points occurred after at least five trials plus a random draw from an exponential distribution with a rate of 0.05 (i.e. mean 20). Model parameters were similar to those of Figure 2, except for the initial unpredictability, u_0, which was assumed to be 5 (the average over the small and large true unpredictability). For Figure 4, we assumed λ_v = 0.1 and λ_u = 0.1; v_0 = 1 (the average over the small and large true volatility) and u_0 = 4 (the average over the small and large true unpredictability). For the simulations presented in Figure 5, the weak and strong shock were 0.3 and 1, respectively, plus a very small noise with variance of 10^-6. Model parameters were similar to those for Figure 4. For the simulations presented in Figure 6, model parameters were the same as those used for Figure 4 and the outcome variance was 0.01. For Figure 7, model update parameters were the same as for Figure 4, and initial volatility and unpredictability were assumed to be equal to the average of their corresponding true values. For simulating the experiment in Figure 8, rewards were generated with a very small outcome variance, 10^-6. Here, the model was trained to predict both the tone (given the light) and the reward (given the light) on every trial. Initial volatility and unpredictability were assumed to be 0.1, and λ_v = λ_u = 0.2. Volatility was assumed for the lesioned model to be small and fixed (0.01 for the reward; 10^-7 for the tone). Figure 8b shows the average learning rate on the last trial of phase 2 (i.e. the first trial of the test) across tone and outcome and all simulations. Figure 8c-f shows volatility and unpredictability signals for the reward. For the simulation of the reversal task in Figure 9, a similarly small outcome variance, 10^-6, was used for generating outcomes. Initial volatility and unpredictability were assumed to be 0.5, and λ_v = λ_u = 0.1. Volatility was assumed for the lesioned model to be small and fixed at 0.01. We used a softmax with a decision noise parameter as the choice model. We assumed that the decision noise parameter was 3 and 1 for the control and lesioned animals, respectively. These parameters were used to reproduce the general reduction of performance in lesioned animals, which is independent of the difference between the deterministic and stochastic tasks in the two groups explained by our learning model. For all simulations, we assumed the initial mean and variance of the Kalman filter to be 0 and 1, respectively. All simulations were repeated 100 times with 100 particles per simulation.
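As a concrete reading of the Figure 3 generation procedure described above, the changepoint process can be sketched as follows (an exponential with rate 0.05 corresponds to scale, i.e. mean, 20; the function and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def changepoint_task(n_trials, noise_var):
    """Hidden reward rate jumps to a uniform value in [0, 10]; each run
    lasts at least five trials plus an Exponential(rate 0.05) draw,
    and outcomes add Gaussian noise with the given variance."""
    outcomes = np.empty(n_trials)
    rate = rng.uniform(0, 10)
    next_change = 5 + rng.exponential(scale=1 / 0.05)
    for t in range(n_trials):
        if t >= next_change:
            rate = rng.uniform(0, 10)
            next_change = t + 5 + rng.exponential(scale=1 / 0.05)
        outcomes[t] = rate + rng.normal(0, np.sqrt(noise_var))
    return outcomes

small_noise = changepoint_task(2000, noise_var=1.0)
large_noise = changepoint_task(2000, noise_var=9.0)
# larger outcome noise dominates trial-to-trial variability
assert np.std(np.diff(large_noise)) > np.std(np.diff(small_noise))
```

Runs of this generator reproduce the two noise levels (variance 1 vs. 9) used in the Figure 3 simulations.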

Figure 1. Statistical difference between volatility and unpredictability. a-b) Examples of generated time-series based on a small and large constant volatility parameter given a small (a) or a large (b) constant unpredictability parameter. c) Given a surprising observation (e.g. a negative outcome), one should compute how likely the outcome is due to unpredictability (left balloon) or due to volatility (right balloon). Dissociating these two terms is important for learning, because they have opposite influences on the learning rate. d) Structure of the (generative) model: outcomes were stochastically generated based on a probabilistic model depending on reward rate, unpredictability and volatility. Only outcomes are observable (the gray circle), and the values of all other parameters should be inferred from outcomes. The observed outcome is given by the true reward rate, x_t, plus some noise whose variance is given by the true unpredictability, u_t. The reward rate itself depends on its value on the previous trial plus some noise whose variance is given by the true volatility, v_t. Both volatility and unpredictability are dynamic and probabilistic Markovian variables generated by their value on the previous trial multiplied by some independent noise. The multiplicative noise ensures that these variables are positive on every trial. See Methods for the details of the model. e-f) It is possible to infer both volatility and unpredictability based on observed outcomes, because these parameters have dissociable statistical signatures. Although both of them increase variance (e), they have opposite effects on autocorrelation (f): whereas volatility increases autocorrelation, unpredictability tends to reduce it. Here, 1-step autocorrelation (i.e. correlation between trial t and trial t-1) was computed for 100 time-series generated with the parameters defined in (a) and (b). Error bars are standard error of the mean. Small and large parameters for volatility were 0.1 and 0.4, and for unpredictability were 1 and 4, respectively.

Figure 2. The model elevates its learning rate in volatile environments. a) Simulations of our model in the volatility learning paradigm, in which subjects undergo stable (bluish) and volatile (orangish) blocks of learning. Dashed and solid lines show the true reward and the reward estimated by the model, respectively. b) The learning rate is larger in the volatile block than in the stable one, similar to what has been reported in humans. This is because the volatility term (c) increases more than the unpredictability term (d) in the volatile condition. Error bars reflect standard error of the mean over 100 simulations and are, for some parameters, too small to be visible.

Figure 3. The model reduces the learning rate in unpredictable environments. a-b) The prediction task of Nassar and colleagues, in which the participant makes a new prediction of the outcome on every trial. Outcomes are generated from a true reward rate, which undergoes occasional jumps, plus a small or large amount of noise. c-d) Behavior of the model in this task. An increase in the noise level is analogous to an increase in unpredictability, which decreases the learning rate (c). As a result, the model's predictions change more slowly under the larger noise level (d).
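The opposing effects of the two noise sources on the learning rate can be illustrated with a standard one-dimensional Kalman filter, in which the gain plays the role of the learning rate. This is only a sketch of the underlying principle, not the paper's full model (which additionally infers volatility and unpredictability rather than taking them as known); the parameter values are illustrative.

```python
import numpy as np

def kalman_learning_rates(outcomes, volatility, unpredictability):
    # One-dimensional Kalman filter tracking a drifting reward rate.
    # volatility = process-noise variance, unpredictability = observation-
    # noise variance; the Kalman gain k acts as the learning rate.
    m, w = 0.0, 1.0  # posterior mean and variance over the reward rate
    rates = []
    for y in outcomes:
        k = (w + volatility) / (w + volatility + unpredictability)
        m += k * (y - m)                   # error-driven update
        w = (1.0 - k) * (w + volatility)   # posterior-variance update
        rates.append(k)
    return np.array(rates)

rng = np.random.default_rng(1)
outcomes = rng.normal(0.0, 1.0, 50)

# Same volatility, different unpredictability: compare asymptotic gains.
low_noise_rate = kalman_learning_rates(outcomes, 0.1, 1.0)[-1]
high_noise_rate = kalman_learning_rates(outcomes, 0.1, 4.0)[-1]
```

Because unpredictability enters only the denominator of the gain, larger observation noise yields a smaller asymptotic learning rate, matching the slower updating seen in panels c-d; volatility, entering the numerator as well, has the opposite effect.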

Figure 4. Performance of the model in a task with constant but unknown volatility and unpredictability parameters. a) The learning rate in the model varies with changes in both the true volatility and the true unpredictability. Furthermore, these parameters have opposite effects on the learning rate: in contrast to volatility, higher unpredictability reduces the learning rate. b) Estimated volatility captures variations in true volatility (small: 0.5; large: 1.5). c) Estimated unpredictability captures variations in true unpredictability (small: 2; large: 6). In (a-c), the average learning rate, estimated volatility and estimated unpredictability in the last 20 trials were plotted over all simulations (100 simulations). d-f) Learning rate, volatility and unpredictability estimates by the model for the small true unpredictability. g-i) The same three signals for the larger true unpredictability. The volatility and unpredictability estimated by the model capture their corresponding true values. Error bars reflect standard error of the mean over 100 simulations.

Figure 5. The model explains Pearce and Hall's conditioned suppression experiment. a) The design of the experiment (Hall and Pearce, 1982), in which the omission group was found to show a higher speed of learning than the control group. b) Median learning rate on the first trial of retraining. The learning rate is larger for the omission group due to increases in volatility (c), while unpredictability is similar for both groups. Error bars reflect standard error of the median over 100 simulations.

Figure 6. The model explains partial reinforcement extinction effects. a) The experiment consists of a partial condition, in which a light cue is followed by reward on 50% of trials, and a full condition, in which the cue is always followed by the reward. b) Learning rate on the first trial of retraining. Similar to the empirical data, the model predicts that the learning rate is larger in the full condition, because partial reinforcement has relatively small effects on volatility (c) but considerably increases unpredictability. Error bars reflect standard error of the mean over 100 simulations and are, for some parameters, too small to be visible.

Figure 7. Behavior of the lesioned model. a-c) The unpredictability and volatility modules inside the model compete to explain observation noise. Two characteristic lesioned models produce seemingly contradictory behaviors, because if the volatility module is lesioned, noise due to volatility is misattributed to unpredictability (b), and vice versa (c). d-f) Mean learning rate is plotted for the 2x2 design of Figure 4 for the healthy and lesioned models. For both lesioned models, the lesion does not merely abolish the corresponding effect on the learning rate, but reverses it. Thus, the volatility-lesioned model shows a reduced learning rate with increases in volatility (e), and the unpredictability-lesioned model shows an elevated learning rate with increases in unpredictability (f). This is due to misattribution of the noise caused by the lesioned factor to the intact module. g) The volatility-lesioned model makes erroneous inferences about unpredictability, increasing its unpredictability estimate in more volatile environments. h) The unpredictability-lesioned model makes erroneous inferences about volatility, increasing its volatility estimate in more unpredictable environments. In effect, neither lesioned model is able to distinguish between volatility and unpredictability, and therefore both show a similar pattern for the remaining module. Small and large true volatility are 0.5 and 1.5, respectively. Small and large true unpredictability are 1 and 3, respectively.

Figure 8. The model displays the behavior of amygdala-lesioned rats in associative learning. a) The task used by Holland and Gallagher for studying the role of the amygdala in learning. Rats in the 'consistent' condition received extensive exposure to a consistent light-tone pairing on a partial reinforcement schedule (i.e. only half of trials led to reward). In the 'shift' condition, however, rats were trained on the same light-tone partial reinforcement schedule in the first phase, but the schedule shifted to a different one in the shorter second phase, in which rats received light-tone-reward on half of trials and light-nothing on the other half. b) Empirical data showed that while the contingency shift facilitates learning in control rats, it disrupts performance in lesioned rats. c) The learning rate on the last trial of the second phase shows the same pattern. This is because the shift increases volatility for the control rats (d) but not for the lesioned rats (e). In contrast, the contingency shift increases unpredictability substantially more for the lesioned rats than for the control rats, which results in a reduced learning rate for the lesioned animals (f-g). The gray line shows the starting trial of the second phase. Data in (b) were originally reported in (Holland and Gallagher, 1993) and are reproduced here from (Holland and Schiffino, 2016). Error bars reflect standard error of the mean over 100 simulations.

Figure 9. The model displays the behavior of amygdala-lesioned monkeys in probabilistic reversal learning. a) The probabilistic reversal learning task by Costa et al. (2016). The task consists of 80 trials, in which animals chose one of two stimuli by making a saccade to it and fixating on the chosen cue. A probabilistic reward was given following a correct choice. The stimulus-reward contingency was reversed in the middle of the task (on a random trial between trials 30-50). The task consists of different schedules, but we focus here on