Rats adopt the optimal timescale for evidence integration in a dynamic environment

Decision making in dynamic environments requires discounting old evidence that may no longer inform the current state of the world. Previous work found that humans discount old evidence in a dynamic environment, but do not discount at the optimal rate. Here we investigated whether rats can optimally discount evidence in a dynamic environment by adapting the timescale over which they accumulate evidence. Using discrete evidence pulses, we compute the optimal inference process exactly. We show that the optimal timescale for evidence discounting depends on both the stimulus statistics and noise in sensory processing. When both of these components are taken into account, rats accumulate and discount evidence with the optimal timescale. Finally, by changing the volatility of the environment, we demonstrate experimental control over the rats’ accumulation timescale. The mechanisms supporting integration are a subject of extensive study, and experimental control over these timescales may open new avenues of investigation.


Supplementary Note 1 - Training procedure
The training process for the Dynamic Clicks task begins with "classical" training, during which the rats learn to associate ports with rewards. The "classical" pipeline is the same as in Ref. [1]. After completing classical training, the rats learn the standard Poisson clicks task as described in Ref. [2]. Finally, environmental state switching is introduced. The following tables outline key changes at each stage in the procedure. Rats typically spend a few days on classical training, and about 6 weeks moving through the clicks training. In this section we outline our rat training process, and then provide several model-free analyses of behavior for each rat individually. First, we include the psychometric curve with respect to the total click difference on each trial. Second, we include the psychometric curve with respect to the optimal inference process (assuming no sensory noise). Third, the chronometric plot shows rat accuracy with respect to time since the last hidden state change. Fourth, we include the chronometric plot with respect to total trial duration. Fifth, we include the reverse correlation curves with the best-fit exponential for each rat.
Supplementary Figure 1: Psychometric graph for all rats. Each trial performed by the rat was binned by the total click difference in the trial. The rat's average accuracy in each bin is shown (dots). A four-parameter logistic function is fit to the data, with 95% confidence intervals (line).
Supplementary Figure 2: Psychometric graph for all rats. Each trial performed by the rat was binned by the total click difference in the trial. The rat's average accuracy in each bin is shown (dots). A four-parameter logistic function is fit to the data, with 95% confidence intervals (line).
Supplementary Figure 3: Psychometric graph for all rats against the ideal observer. Each trial performed by the rat was binned by the accumulation value (log-odds) of the ideal observer (i.e., no sensory noise). The rat's average accuracy in each bin is shown (dots). A four-parameter logistic function is fit to the data, with 95% confidence intervals (line).
Supplementary Figure 4: Psychometric graph for all rats against the ideal observer. Each trial performed by the rat was binned by the accumulation value (log-odds) of the ideal observer (i.e., no sensory noise). The rat's average accuracy in each bin is shown (dots). A four-parameter logistic function is fit to the data, with 95% confidence intervals (line).
Supplementary Figure 5: Chronometric graph for all rats with respect to final state duration. Each trial was binned by the time since the last change in the hidden environmental state. The average accuracy in each bin is shown.
Supplementary Figure 6: Chronometric graph for all rats with respect to final state duration. Each trial was binned by the time since the last change in the hidden environmental state. The average accuracy in each bin is shown.
Supplementary Figure 7: Chronometric graph for all rats with respect to total trial duration. Each trial was binned by the total trial duration. The average accuracy in each bin is shown. Most trial durations were drawn from the range 0.5 to 2 s; however, some rats experienced a small number of shorter trials, leading to greater uncertainty for those durations.
Supplementary Figure 8: Chronometric graph for all rats with respect to total trial duration. Each trial was binned by the total trial duration. The average accuracy in each bin is shown. Most trial durations were drawn from the range 0.5 to 2 s; however, some rats experienced a small number of shorter trials, leading to greater uncertainty for those durations.

Supplementary Figure 9: Reverse Correlation for all rats. Reverse correlation curves for each rat (black), with the best-fit exponential discounting function (blue).

Supplementary Note 3 - Derivation of optimal inference
Here we provide more detail on the derivation from main text equation 2 to equation 3. This derivation was developed in Ref. [3] (see equations 3.2 and 3.3). However, rather than approximating the evidence term by its first two moments, we evaluate the evidence term directly. For this reason we reproduce the same derivation, but stop at an intermediate step not shown in Ref. [3].
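Beginning with the evidence ratio (equation 2 in the present study and equation 3.2 in Ref. [3]):

$$R_t = \frac{f_+(\xi_t)}{f_-(\xi_t)}\,\frac{(1-h\Delta t)\,R_{t-1} + h\Delta t}{h\Delta t\,R_{t-1} + (1-h\Delta t)}$$

where $R_t$ is the ratio of the posterior probabilities of the two environmental states, $h$ is the hazard rate, and $f_\pm(\xi_t)$ are the likelihoods, under each state, of the clicks $\xi_t$ observed in the time bin of width $\Delta t$ ending at $t$.
Dividing each side by $R_{t-1}$:

$$\frac{R_t}{R_{t-1}} = \frac{f_+(\xi_t)}{f_-(\xi_t)}\,\frac{1 + h\Delta t\,(R_{t-1}^{-1} - 1)}{1 + h\Delta t\,(R_{t-1} - 1)}$$

Now, define $\hat{a}_t = \log(R_t)$, and take the logarithm of both sides:

$$\hat{a}_t - \hat{a}_{t-1} = \log\frac{f_+(\xi_t)}{f_-(\xi_t)} + \log\left[1 + h\Delta t\,(e^{-\hat{a}_{t-1}} - 1)\right] - \log\left[1 + h\Delta t\,(e^{\hat{a}_{t-1}} - 1)\right]$$

Using the approximation $\log(1 + a) \approx a$, which is valid when $|a| \ll 1$ (here, $h\Delta t \ll 1$):

$$\hat{a}_t - \hat{a}_{t-1} \approx \log\frac{f_+(\xi_t)}{f_-(\xi_t)} + h\Delta t\left(e^{-\hat{a}_{t-1}} - e^{\hat{a}_{t-1}}\right)$$

Using $\sinh(x) = \frac{1}{2}(e^x - e^{-x})$:

$$\hat{a}_t - \hat{a}_{t-1} \approx \log\frac{f_+(\xi_t)}{f_-(\xi_t)} - 2h\Delta t\,\sinh(\hat{a}_{t-1})$$

Plugging in the evaluation of the log-evidence term from the main text, which to first order in $\Delta t$ equals $+\kappa$ if a right click occurs in the bin, $-\kappa$ if a left click occurs, and $0$ otherwise, and dividing by $\Delta t$:

$$\frac{\hat{a}_t - \hat{a}_{t-1}}{\Delta t} = \frac{\kappa\left[\mathbb{1}_R(t) - \mathbb{1}_L(t)\right]}{\Delta t} - 2h\,\sinh(\hat{a}_t)$$

where $\mathbb{1}_R(t)$ and $\mathbb{1}_L(t)$ indicate a right or left click in the bin. Here, we again use $\Delta t \ll 1$ to justify replacing $\hat{a}_{t-1}$ with $\hat{a}_t$ on the right-hand side. Rescaling by $\kappa$, with $a_t = \hat{a}_t/\kappa$:

$$\frac{a_t - a_{t-1}}{\Delta t} = \frac{\mathbb{1}_R(t) - \mathbb{1}_L(t)}{\Delta t} - \frac{2h}{\kappa}\sinh(\kappa a_t)$$

Taking the limit $\Delta t \to 0$:

$$\frac{da}{dt} = \delta_R(t) - \delta_L(t) - \frac{2h}{\kappa}\sinh(\kappa a)$$

where $\delta_R(t)$ and $\delta_L(t)$ are sums of Dirac delta functions at the right and left click times. Here we assume that the auditory clicks act instantaneously with respect to the accumulation equation. Care needs to be taken when interpreting the $\delta$ terms in Supple-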
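As a numerical sanity check of the steps above, the following Python sketch compares the exact log-space recursion for the evidence ratio (in the form of equation 3.2 of Ref. [3]) with its first-order sinh approximation on a synthetic click train. It is illustrative only: the hazard rate, bin width, click rates ($r_1 = 38$ Hz, $r_2 = 2$ Hz), and click train are assumptions rather than task parameters, and clicks are treated as noise-free, so each contributes a log-evidence of $\pm\kappa$.

```python
import math


def exact_step(a_prev, log_evidence, h, dt):
    """Exact recursion in log space (a = log R), with e = h*dt:
    R_t = (f+/f-) * ((1 - e) R_{t-1} + e) / (e R_{t-1} + (1 - e))."""
    e = h * dt
    R = math.exp(a_prev)
    return log_evidence + math.log(((1 - e) * R + e) / (e * R + (1 - e)))


def sinh_step(a_prev, log_evidence, h, dt):
    """First-order approximation: log(1 + x) ~ x, then sinh(x) = (e^x - e^-x) / 2."""
    return a_prev + log_evidence - 2 * h * dt * math.sinh(a_prev)


def simulate(step, clicks, h, dt, kappa):
    """clicks[i] is +1 (right click), -1 (left click) or 0 (no click) in bin i.
    Noise-free clicks contribute a log-evidence of +/- kappa."""
    a, trace = 0.0, []
    for c in clicks:
        a = step(a, kappa * c, h, dt)
        trace.append(a)
    return trace


# Illustrative values (assumptions, not the task's parameters).
h, dt = 0.5, 1e-5                   # hazard rate (Hz), bin width (s)
kappa = math.log(38 / 2)            # click reliability for r1 = 38 Hz, r2 = 2 Hz
clicks = ([1] + [0] * 1999) * 10    # ten right clicks, 20 ms apart

exact = simulate(exact_step, clicks, h, dt, kappa)
approx = simulate(sinh_step, clicks, h, dt, kappa)
max_gap = max(abs(x - y) for x, y in zip(exact, approx))
print(f"largest |exact - sinh approximation| over the train: {max_gap:.2e}")
```

Increasing `dt` makes the gap grow: the step $\log(1 + a) \approx a$ really requires $h\Delta t\,e^{|\hat{a}|} \ll 1$, which is stricter than $h\Delta t \ll 1$ when $|\hat{a}|$ is large.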

Supplementary Note 4 - Decreasing click reliability lengthens integration timescales
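To see more clearly how less reliable clicks (smaller $\kappa$) result in a longer integration timescale, we can expand the discounting function of the optimal accumulation equation, $\frac{2h}{\kappa}\sinh(\kappa x)$, in a Taylor series around the origin:

$$\frac{2h}{\kappa}\sinh(\kappa x) = \frac{h}{\kappa}\sum_{k=0}^{\infty}\frac{(\kappa x)^k - (-\kappa x)^k}{k!}$$

The even terms drop out, and we collect the odd terms:

$$\frac{2h}{\kappa}\sinh(\kappa x) = 2h\sum_{j=0}^{\infty}\frac{\kappa^{2j}\,x^{2j+1}}{(2j+1)!} = 2h\left(x + \frac{\kappa^2 x^3}{3!} + \frac{\kappa^4 x^5}{5!} + \cdots\right)$$

We find that $\kappa$ appears only with even exponents, multiplying odd powers of $x$. The linear term does not depend on $\kappa$, and every higher-order term grows in magnitude with $\kappa$, so increasing $\kappa$ increases the strength of the discounting function. Increasing the strength of the discounting function leads to shorter integration timescales. In short, increasing $\kappa$ shortens the integration timescale, and decreasing $\kappa$ lengthens it.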
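The expansion can be checked numerically. This Python sketch, with an illustrative hazard rate and evaluation point (assumptions), compares the discounting function $(2h/\kappa)\sinh(\kappa x)$ with its truncated odd-power series, and shows that at fixed $x$ the discounting term grows with $\kappa$ while its linear part, $2hx$, does not depend on $\kappa$.

```python
import math


def discount(x, h, kappa):
    """Nonlinear discounting term of the optimal accumulator: (2h/kappa) sinh(kappa x)."""
    return 2 * h / kappa * math.sinh(kappa * x)


def discount_series(x, h, kappa, n_terms=6):
    """Truncated odd-power series: 2h * sum_j kappa^(2j) x^(2j+1) / (2j+1)!."""
    return 2 * h * sum(
        kappa ** (2 * j) * x ** (2 * j + 1) / math.factorial(2 * j + 1)
        for j in range(n_terms)
    )


h, x = 0.5, 1.0   # illustrative hazard rate (Hz) and accumulator value
for kappa in (0.5, 1.0, 2.0, 3.0):
    print(f"kappa={kappa}: exact={discount(x, h, kappa):.4f}, "
          f"series={discount_series(x, h, kappa):.4f}")
```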

Supplementary Note 5 - Click mislocalization from model parameters
Our analysis of sensory noise uses an estimate of the rate of click mislocalization. Both our estimate of mislocalization for our rats and that for rats from Ref. [2] are derived from parameters of the trial-by-trial model. Here we provide details on how we use those model parameters to estimate the click mislocalization, $n$.
The three model parameters that influence $n$ are the sensory noise $\sigma_s^2$, the adaptation strength $\phi$, and the adaptation time constant $\tau_\phi$. Plugging the adapted mean and variance of the average click into the Gaussian cumulative distribution function, evaluated at $x = 0$, gives $n$, the probability that the average click is perceived on the wrong side. To estimate the uncertainty on $n$ due to uncertainty on the underlying model parameters, we performed a bootstrapping analysis. The parameters were sampled 100 times according to the Hessian-derived covariance matrix, and each parameter sample was used to generate a sample of $n$ as described above. Figure 5C shows the maximum likelihood estimate of $n$ and its standard error.
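The bootstrapping step can be sketched as follows. This is a minimal Python/NumPy illustration under stated assumptions: the Hessian is taken to be that of the negative log-likelihood at the maximum-likelihood parameters, so its inverse serves as the covariance; parameter draws are jointly Gaussian; and `toy_n` is a hypothetical stand-in for the mapping from $(\sigma_s^2, \phi, \tau_\phi)$ to $n$, not the trial-by-trial model's actual adaptation dynamics. All numbers are illustrative.

```python
import math

import numpy as np


def bootstrap_n(theta_hat, hessian, n_of_theta, n_samples=100, seed=0):
    """Propagate parameter uncertainty to the mislocalization rate n.

    Returns the point estimate n(theta_hat) and the standard deviation of n over
    n_samples parameter vectors drawn from N(theta_hat, inverse(hessian)).
    """
    rng = np.random.default_rng(seed)
    cov = np.linalg.inv(hessian)  # Hessian-derived (Laplace) covariance
    draws = rng.multivariate_normal(theta_hat, cov, size=n_samples)
    n_draws = np.array([n_of_theta(theta) for theta in draws])
    return n_of_theta(theta_hat), n_draws.std(ddof=1)


def toy_n(theta):
    """HYPOTHETICAL mapping: chance that a click with mean amplitude (1 - phi) and
    variance sigma2_s is perceived below x = 0. tau_phi is ignored here."""
    sigma2_s, phi, _tau_phi = theta
    sd = math.sqrt(max(sigma2_s, 1e-12))  # guard against non-positive draws
    return 0.5 * math.erfc((1.0 - phi) / (sd * math.sqrt(2.0)))


theta_hat = np.array([0.5, 0.3, 0.1])       # illustrative (sigma2_s, phi, tau_phi)
hessian = np.diag([400.0, 900.0, 2500.0])    # illustrative curvature
n_hat, n_se = bootstrap_n(theta_hat, hessian, toy_n)
print(f"n = {n_hat:.3f} +/- {n_se:.3f}")
```

Fixing the random seed makes the reported standard error reproducible; in practice one would check that it is stable as `n_samples` grows.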

Supplementary Note 6 - Click mislocalization is reliable across a wide range of click rates
To evaluate the reliability of the click mislocalization probability, we calculated it separately for subsets of trials from Ref. [2] grouped by click rate.
For each rat in Ref. [2], we fit all parameters of the trial-by-trial model presented in the main text using all trials performed by the rat. In this dataset, each rat performed a randomly interleaved mixture of trials with different click rates, and different rats performed different sets of click rates, titrated to their performance. We then fit the model separately to each subset of trials with the same click rate. On each of these click-rate-specific subsets, we fit only the three parameters that affect click mislocalization: sensory noise $\sigma_s^2$, adaptation strength $\phi$, and the adaptation time constant $\tau_\phi$. For each model fit, we then computed the click mislocalization as defined in the previous section.
Supplementary Figure 11 shows the results. The vertical pink line shows the click rates for the dynamic task, and the horizontal line shows the value of n used in our theoretical analysis. Click mislocalization is reliable across a wide range of click rates.
Supplementary Figure 11: Click mislocalization across click rates is reliable. Click mislocalization was calculated separately for trials with different click rates in Ref. [2]. Click mislocalization is constant across a wide range of click rates. (Legend: average rat sensory noise in Brunton, 2013; click rates in dynamic task.)

Supplementary Note 7 - Alternative sensory noise models
In this section we provide details on sensory noise and analyze different ways to parameterize it. The main analysis in the text derives optimal inference given discrete sensory noise: each click is localized to one side or the other. It is easy to imagine many other forms of sensory noise, including Gaussian fluctuations in the click amplitude, or simply missed clicks. Here, by evaluating the log-evidence term, we demonstrate that decreases in click reliability are primarily driven by click mislocalization, not by fluctuations in the perceived amplitude of the clicks or by missed clicks.
Finally, we provide a general argument for why only click mislocalization matters.

Click reliability with Gaussian sensory noise
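Consider Gaussian noise in which clicks played from the right (left) are perceived with amplitude drawn from $\mathcal{N}(+\mu, \sigma^2)$ ($\mathcal{N}(-\mu, \sigma^2)$). Here we interpret clicks with positive amplitude as right clicks, and clicks with negative amplitude as left clicks. Note that if $\sigma^2$ is sufficiently large relative to $\mu^2$, many clicks will be mislocalized.
To compute the reliability of an individual click with a specific amplitude fluctuation $\xi$, we need the probability that a click generated on the right is observed with amplitude $\xi$, $P_r(\xi)$, as well as the probability that a click generated on the left is observed with amplitude $\xi$, $P_l(\xi)$. Formally, we integrate the Gaussian probability density function over a small window of width $\epsilon$ centered at $\xi$:

$$P_r(\xi) = \int_{\xi-\epsilon/2}^{\xi+\epsilon/2} \mathcal{N}(x; \mu, \sigma^2)\,dx, \qquad P_l(\xi) = \int_{\xi-\epsilon/2}^{\xi+\epsilon/2} \mathcal{N}(x; -\mu, \sigma^2)\,dx$$

In the state in which right clicks occur at rate $r_1$ and left clicks at rate $r_2$, a click with perceived amplitude $\xi$ occurs in a bin of width $\Delta t$ with probability $(r_1 P_r(\xi) + r_2 P_l(\xi))\Delta t$; in the opposite state the two rates are swapped. The reliability of the click is the log of the ratio of these probabilities:

$$\kappa(r_1, r_2, P_r, P_l) = \log\frac{r_1 P_r(\xi) + r_2 P_l(\xi)}{r_2 P_r(\xi) + r_1 P_l(\xi)}$$

This expression for $\kappa$ seems hard to interpret, but notice what happens if $P_l = 0$:

$$\kappa(r_1, r_2, P_r, 0) = \log\frac{r_1 P_r}{r_2 P_r} = \log\frac{r_1}{r_2} = \kappa(r_1, r_2)$$

In this case, $P_r$ drops out entirely, and we recover the value of $\kappa$ for the no-noise case. This demonstrates that click mislocalization is necessary for a decrease in click reliability.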
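A minimal Python sketch of the Gaussian click reliability $\kappa(r_1, r_2, P_r, P_l) = \log\frac{r_1 P_r + r_2 P_l}{r_2 P_r + r_1 P_l}$, assuming the small-window limit in which $P_r(\xi)$ and $P_l(\xi)$ are proportional to the Gaussian densities at $\xi$, so the window width cancels. The values of $\mu$, $\sigma$, $r_1$, and $r_2$ are illustrative assumptions. The sketch works with $q = P_l/P_r = e^{-2\mu\xi/\sigma^2}$ so that neither density underflows.

```python
import math


def reliability(r1, r2, p_r, p_l):
    """kappa(r1, r2, P_r, P_l) = log[(r1 P_r + r2 P_l) / (r2 P_r + r1 P_l)]."""
    return math.log((r1 * p_r + r2 * p_l) / (r2 * p_r + r1 * p_l))


def gaussian_reliability(xi, r1, r2, mu=1.0, sigma=0.5):
    """Reliability of a click perceived with amplitude xi under N(+/-mu, sigma^2) noise.

    Uses q = P_l / P_r = exp(-2 mu xi / sigma^2), rearranged so exp() never overflows.
    """
    log_q = -2.0 * mu * xi / sigma ** 2
    if log_q <= 0:
        q = math.exp(log_q)
        return math.log((r1 + r2 * q) / (r2 + r1 * q))
    q_inv = math.exp(-log_q)
    return math.log((r1 * q_inv + r2) / (r2 * q_inv + r1))


r1, r2 = 38.0, 2.0   # illustrative click rates (Hz)
for xi in (-1.5, -0.5, 0.0, 0.2, 0.5, 1.5):
    print(f"xi={xi:+.1f}: kappa={gaussian_reliability(xi, r1, r2):+.3f}")
```

The printout shows the quenching discussed below: large amplitudes saturate at $\pm\log(r_1/r_2)$, and amplitudes near zero carry almost no evidence.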
Next, we compare how the Gaussian click reliability scales with the rate of mislocalization. We generated a dataset of trials in which each click had an amplitude drawn from a Gaussian distribution. We asked two questions: what is the accuracy of the nonlinear inference using the Gaussian click reliability derived above, and what is the discounting rate of the best linear discounting agent? We refer to this as "quenched Gaussian noise"; the meaning of "quenched" is explained below. We then considered a second dataset in which the Gaussian amplitudes were thresholded to ±1 according to whether the amplitude was above or below 0. We refer to this as "discrete noise." We compute the click mislocalization probability corresponding to each Gaussian variance $\sigma^2$ as the probability that a right click is perceived with negative amplitude:

$$n = \int_{-\infty}^{0}\mathcal{N}(x; \mu, \sigma^2)\,dx = \frac{1}{2}\operatorname{erfc}\left(\frac{\mu}{\sqrt{2}\,\sigma}\right)$$

Supplementary Figure 12 shows the results of the comparison. Discrete noise yields slightly lower accuracy and a slightly smaller discounting rate. The difference is due to mislocalized clicks with small amplitudes: discrete noise does not distinguish between small- and large-amplitude clicks, whereas quenched Gaussian noise does. Importantly, in the noise regime we expect for the rats, there is no difference between these interpretations of sensory noise.
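To make the correspondence between the two noise models concrete, this Python sketch maps a Gaussian noise level $\sigma$ to the mislocalization probability $n = \frac{1}{2}\operatorname{erfc}(\mu/(\sqrt{2}\sigma))$ of the thresholded ("discrete") noise, and evaluates the reliability of a thresholded click as $\log\frac{(1-n)r_1 + n r_2}{(1-n)r_2 + n r_1}$. Setting $P_r = 1-n$ and $P_l = n$ is our reading of how thresholding enters the reliability function; $\mu = 1$ and the click rates are illustrative assumptions.

```python
import math


def mislocalization(mu, sigma):
    """n = P(perceived amplitude < 0 | right click) for N(+mu, sigma^2) noise."""
    return 0.5 * math.erfc(mu / (sigma * math.sqrt(2.0)))


def discrete_reliability(n, r1, r2):
    """Reliability of a thresholded click: kappa(r1, r2, P_r = 1 - n, P_l = n)."""
    return math.log(((1 - n) * r1 + n * r2) / ((1 - n) * r2 + n * r1))


r1, r2 = 38.0, 2.0   # illustrative click rates (Hz)
for sigma in (0.25, 0.5, 1.0, 2.0):
    n = mislocalization(1.0, sigma)
    print(f"sigma={sigma}: n={n:.4f}, kappa={discrete_reliability(n, r1, r2):.3f}")
```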

Unquenched Gaussian noise in the trial-by-trial model
Gaussian noise subjects the clicks to large fluctuations in their perceived amplitude. Our trial-by-trial model handles these fluctuations slightly differently from the normative theory outlined in the section above. First, observe that in the optimal inference theory, the evidence reliability term quenches large amplitude fluctuations. Following the derivation in the section above, $\kappa(r_1, r_2, P_r, P_l)$ is bounded between $\pm\kappa(r_1, r_2)$, so the evidence added to the accumulation variable after each click is bounded ("quenched") and not subject to large amplitude fluctuations.
Second, we asked whether large fluctuations in click amplitude, if they are not quenched, would cause a linear approximation to favor stronger evidence discounting in order to dampen the fluctuations. Specifically, we asked whether an evidence-discounting agent that adds the raw (unquenched) Gaussian click amplitudes to its accumulator would maximize accuracy with a larger $\lambda$ than the same click mislocalization strength implemented as quenched noise in the normative theory, in which each click instead contributes its bounded reliability $\kappa(r_1, r_2, P_r, P_l)$. Supplementary Figure 12 shows a comparison between quenched and unquenched Gaussian noise. We find no difference between these interpretations. In panel B, the accuracy for unquenched Gaussian noise is that of the best linear discounting agent, because we do not have a normative theory for unquenched noise (the simulation was designed precisely to make this comparison).

Click reliability with missed clicks
An alternative form of sensory noise might parameterize the probability that a subject simply fails to hear a click. Using this framework, we show that missed clicks do not change the click reliability function. Assume that a generated click goes undetected with probability $m$. Then, the reliability of a click heard on the right can be computed as:

$$\kappa_m = \log\frac{r_1\Delta t(1-m)\,r_2\Delta t\,m + r_1\Delta t(1-m)(1 - r_2\Delta t)}{r_2\Delta t(1-m)\,r_1\Delta t\,m + r_2\Delta t(1-m)(1 - r_1\Delta t)}$$

We can interpret the numerator (and, with the rates swapped, the denominator) as the probability that a click is generated on one side and not missed while a click generated on the other side is missed, plus the probability that a click is generated on one side and not missed while no click is generated on the other side. Given that $\Delta t \ll 1$, we can remove second-order terms in $\Delta t$:

$$\kappa_m \approx \log\frac{r_1\Delta t(1-m)}{r_2\Delta t(1-m)} = \log\frac{r_1}{r_2} = \kappa(r_1, r_2)$$

We find that the click reliability is independent of the probability of missing a click, $m$.
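The cancellation of $m$ can be checked numerically. This Python sketch evaluates, before dropping second-order terms, the log-ratio of the two probabilities described above (a right click generated and heard, and on the left either a click generated but missed or no click generated) under each state. The click rates and bin widths are illustrative assumptions; the dependence on $m$ vanishes as $\Delta t \to 0$.

```python
import math


def missed_click_reliability(r1, r2, m, dt):
    """Reliability of a click heard on the right when clicks are missed w.p. m (< 1).

    Numerator: state with right rate r1, left rate r2; denominator: rates swapped.
    Each term: a right click is generated and heard, and on the left either a click
    is generated but missed, or no click is generated.
    """
    def p_heard_right_only(r_right, r_left):
        heard = r_right * dt * (1 - m)
        return heard * (r_left * dt * m) + heard * (1 - r_left * dt)

    return math.log(p_heard_right_only(r1, r2) / p_heard_right_only(r2, r1))


r1, r2 = 38.0, 2.0   # illustrative click rates (Hz)
for dt in (1e-2, 1e-4, 1e-6):
    values = [missed_click_reliability(r1, r2, m, dt) for m in (0.0, 0.2, 0.5)]
    print(f"dt={dt:g}:", ", ".join(f"{v:.5f}" for v in values),
          f"(log(r1/r2) = {math.log(r1 / r2):.5f})")
```

At coarse bin widths $m$ still matters slightly through the second-order terms; it is the continuous-time limit that makes the reliability independent of $m$.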

A general argument for click mislocalization
In the previous sections we demonstrated that, for both missed clicks and Gaussian clicks, mislocalization is necessary for decreasing click reliability. Here we provide a general argument for why that is true under any form of sensory noise. The auditory evidence takes on two possible values, $S \in \{+1, -1\}$.
Let $y$ be the value of each auditory stimulus after being noisily encoded by the sensory transduction process ($y = f(S)$). If $f()$ maps left and right clicks separately into non-overlapping distributions of click amplitudes, then an ideal observer can bin $y$ into the groups $y < 0$ and $y > 0$ and perfectly recover the original signal $S$. If $f()$ maps left and right clicks into overlapping distributions, then no binning of $y$ can perfectly recover the original signal. If the observer uses the same binary binning scheme as before, then the error rate in the recovered signal equals the mislocalization rate. An observer with perfect knowledge of the distribution of $f()$ can do slightly better by using a different binning scheme: if the observer recognizes that clicks in the region where the left and right distributions overlap are less trustworthy, it can use multiple bins to discount specifically those clicks near 0. The Gaussian reliability function above, $\kappa(r_1, r_2, P_r, P_l)$, can be viewed as an observer with an infinite number of bins. As seen in Supplementary Figure 12, this strategy slightly improves accuracy over the two-bin strategy. We thus conclude that click mislocalization is the source of decreasing click reliability from sensory noise.

Supplementary Note 8 - Linear approximation details
We performed three analyses in order to demonstrate that the linear approximation is difficult to distinguish from the full nonlinear inference process on a trial-by-trial basis. First, we compared the cross-consistency between models to the self-consistency of each model across repeated simulations of the same trials with different sensory noise realizations. The nonlinear inference process and the linear approximation were simulated on a dataset of 30,000 trials, 20 times each, with a different noise realization on each simulation. We calculated the average self-consistency across noise realizations of the linear (71.98% ± 0.48) and nonlinear (72.87% ± 0.52) models. In addition, we calculated the cross-consistency between the linear and nonlinear models when the noise realization was different (72.40% ± 0.50) or the same (96.98% ± 0.18). Each reported value is the average percentage agreement between all possible pairs in the group, with 95% confidence intervals. Because the cross-consistency between models matches the self-consistency of each model, we conclude that the two models are hard to distinguish and have similar predictive ability on new trials. Importantly, the linear model achieves similar behavior with one fewer parameter.
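The consistency measures above can be sketched in Python as follows. The sketch assumes each simulation is summarized as a list of binary choices over the same trials, with run `i` of each model sharing noise realization `i`. The 95% confidence interval uses a normal approximation to the mean over pairs, which is our assumption; pairs that share a simulation are not independent, so the interval is only approximate. The toy choice lists are made up for illustration.

```python
import math
from itertools import combinations, product


def percent_agreement(x, y):
    """Percentage of trials on which two choice sequences agree."""
    return 100.0 * sum(a == b for a, b in zip(x, y)) / len(x)


def summarize(values):
    """Mean and approximate 95% CI half-width (normal approximation)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, 1.96 * math.sqrt(var / len(values))


def self_consistency(runs):
    """Agreement over all pairs of runs of one model (different noise realizations)."""
    return summarize([percent_agreement(x, y) for x, y in combinations(runs, 2)])


def cross_consistency(runs_a, runs_b, same_noise):
    """Agreement between models; runs_a[i] and runs_b[i] share noise realization i."""
    pairs = [
        (x, y)
        for (i, x), (j, y) in product(enumerate(runs_a), enumerate(runs_b))
        if (i == j) == same_noise
    ]
    return summarize([percent_agreement(x, y) for x, y in pairs])


# Toy choices (1 = right, 0 = left) on four trials, three noise realizations each.
linear = [[1, 1, 0, 0], [1, 1, 0, 1], [1, 0, 0, 0]]
nonlinear = [[1, 1, 0, 0], [1, 1, 1, 1], [1, 0, 0, 1]]
print("self (linear):", self_consistency(linear))
print("cross, same noise:", cross_consistency(linear, nonlinear, same_noise=True))
print("cross, different noise:", cross_consistency(linear, nonlinear, same_noise=False))
```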
Our second analysis examined the small set (3%) of trials on which the nonlinear and linear models disagreed when given the same noise realization. Trial difficulty on our task is determined by the time since the last hidden state switch. We found that the two models disagreed slightly more often immediately after a state transition, but overall, disagreements were spread across all trials (Supplementary Figure 13C). We then computed chronometric curves, which show model accuracy as a function of time since the last hidden state switch. The two models have overlapping confidence intervals at all time points (Supplementary Figure 13B). We conclude that it would be difficult to distinguish these two models on this dataset: they disagree slightly more often immediately following state changes, but otherwise disagree at a consistent rate across time.
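A chronometric curve of this kind can be computed with a few lines of Python. This sketch assumes per-trial lists of the time since the last hidden-state switch and a 0/1 outcome (correct choice, or a flag for model disagreement); the bin edges and toy data are arbitrary illustrative choices, and confidence intervals are omitted.

```python
from bisect import bisect_right


def chronometric(time_since_switch, outcome, bin_edges):
    """Mean of a 0/1 outcome (e.g. correct, or models disagree) per time bin.

    Trials outside [bin_edges[0], bin_edges[-1]) are ignored. Returns a list of
    (left_edge, right_edge, n_trials, mean_outcome or None) tuples.
    """
    n_bins = len(bin_edges) - 1
    counts = [0] * n_bins
    totals = [0] * n_bins
    for t, y in zip(time_since_switch, outcome):
        k = bisect_right(bin_edges, t) - 1
        if 0 <= k < n_bins:
            counts[k] += 1
            totals[k] += y
    return [
        (bin_edges[k], bin_edges[k + 1], counts[k],
         totals[k] / counts[k] if counts[k] else None)
        for k in range(n_bins)
    ]


times = [0.05, 0.15, 0.25, 0.12, 0.45]   # s since last switch (toy data)
correct = [0, 1, 1, 1, 1]
for row in chronometric(times, correct, [0.0, 0.1, 0.2, 0.3]):
    print(row)
```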

Supplementary Note 9 - Psychophysical reverse correlation details
Here we present four control analyses of our reverse correlation method. First, we show our reverse-correlation-derived time constants with confidence intervals, which the main-text figures omit for readability.
Second, we show that our method is not biased by the presence of a lapse rate, unlike logistic regression.
Third, we rule out degenerate strategies like deciding based on only the last click. Fourth, we rule out an alternative hypothesis to explain the observed discounting rates based on poor estimation of the environmental volatility.
Supplementary Figure 15: Reverse Correlation Timescales with uncertainty. Same analysis as Figure 4B, but with 95% confidence intervals on the exponential time constant fits.

Supplementary Note 10 - Ruling out last click strategies
One possible concern is that the rats might rely on degenerate strategies, such as choosing based on the last click they heard, or that the rats' integration timescale is so short that their behavior should not really be considered integration. Supplementary Figure 17A shows a quasi-fixed point analysis of the optimal accumulation equation for a given noise level. Assuming the environment stays in one state for a long time, we replace the evidence term with the expected rate of clicks and solve for the steady-state accumulation value. For all noise levels, the fixed point lies above 1 click, so the optimal behavior necessarily involves integrating clicks. For the average rat noise level, the fixed point corresponds to integrating about 5 clicks. We conclude that the ideal strategy involves integrating many clicks.
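The quasi-fixed point has a closed form. Under the assumption (ours, for illustration) that with mislocalization rate $n$ each perceived click contributes $\pm 1$ to the rescaled accumulator and the expected net rate of perceived clicks is $(1-2n)(r_1 - r_2)$, setting $(1-2n)(r_1 - r_2) = (2h/\kappa)\sinh(\kappa a^*)$ gives $a^* = \sinh^{-1}\left[\kappa(1-2n)(r_1 - r_2)/(2h)\right]/\kappa$, with $\kappa = \log\frac{(1-n)r_1 + n r_2}{(1-n)r_2 + n r_1}$. The rates $r_1 = 38$ Hz, $r_2 = 2$ Hz and hazard rate $h = 0.5$ Hz below are illustrative assumptions, so the numbers are not those of Supplementary Figure 17A.

```python
import math


def reliability(n, r1, r2):
    """Click reliability with mislocalization probability n."""
    return math.log(((1 - n) * r1 + n * r2) / ((1 - n) * r2 + n * r1))


def quasi_fixed_point(n, r1, r2, h):
    """Steady state a* of da/dt = (1 - 2n)(r1 - r2) - (2h/kappa) sinh(kappa a)."""
    kappa = reliability(n, r1, r2)
    drive = (1 - 2 * n) * (r1 - r2)  # expected net rate of perceived clicks (1/s)
    return math.asinh(kappa * drive / (2 * h)) / kappa


r1, r2, h = 38.0, 2.0, 0.5   # illustrative values (assumptions)
for n in (0.0, 0.1, 0.2, 0.3):
    print(f"n={n:.1f}: a* = {quasi_fixed_point(n, r1, r2, h):.2f} clicks")
```

Over the range of $n$ shown, more mislocalization (smaller $\kappa$) weakens the nonlinear discounting and pushes the fixed point outward, in line with Supplementary Note 4.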
Supplementary Figure 17B shows the discounting rate recovered by the reverse correlation method for simulated discounting agents, similar to Figure 3. Here, we include much stronger discounting agents and find that the recovered discounting rate asymptotes just under 36 Hz, the expected net click rate ($r_1 - r_2 \approx 36$ Hz). A last-click strategy can be considered a discounting agent with an infinite discounting rate, and would be recovered in our analysis as a discounting rate of about 36 Hz. Our rats are well away from this limit, so we rule out a last-click strategy.

Supplementary Note 11 - Ruling out poor estimates of environmental volatility
Ref. [4] examined human discounting behavior on a comparable task, finding that human subjects failed to discount at the optimal rate due to poor estimates of the environmental volatility. Our linear approximation to the full inference process does not directly parameterize environmental volatility.
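The control experiment described next relies on the fact that mislocalization keeps click reliability finite even when one click stream is silent. A minimal Python sketch, assuming the discrete-noise form $\kappa = \log\frac{(1-n)r_1 + n r_2}{(1-n)r_2 + n r_1}$ with mislocalization probability $n$: with $r_1 = 40$ Hz and $r_2 = 0$ Hz it reduces to $\log\frac{1-n}{n}$, which is infinite only when $n = 0$.

```python
import math


def reliability(n, r1, r2):
    """Click reliability with mislocalization probability n; infinite if n = 0 and r2 = 0."""
    num = (1 - n) * r1 + n * r2
    den = (1 - n) * r2 + n * r1
    return math.inf if den == 0 else math.log(num / den)


for n in (0.0, 0.05, 0.1, 0.2):
    print(f"n={n:.2f}: kappa = {reliability(n, 40.0, 0.0)}")
```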
To evaluate whether our rats' discounting rates could be explained by a poor estimate of environmental volatility rather than by sensory noise, we trained two rats on trials with $r_1 = 40$ Hz and $r_2 = 0$ Hz. If the rats had no sensory noise, the reliability of each click for informing the current state would become infinite: $\kappa(r_1, r_2) = \log(r_1/r_2) = \infty$. An infinite click reliability means the rats could achieve perfect accuracy by choosing the side based on the last click played. Crucially, with an infinite click reliability, the estimate of the environmental volatility becomes irrelevant. If the discounting rates observed in the main text were driven by poor estimates of the environmental volatility and not by sensory noise, then in this high-reliability environment the rats would make their choices based on the last click. However, if sensory noise is present, the theory presented in the main text predicts that the rats should discount evidence at a rate comparable to the main results. Supplementary Figure 18A shows the reverse correlation discounting rates, comparable to Figure 4B, along with the discounting parameter of the model fit to these rats. The rats do not adopt a last-click strategy, consistent with sensory noise being the primary driver of their discounting rates. Supplementary Figure 18B shows the session-by-session accuracy for each rat, with no trend of increasing accuracy or adaptation to the high click reliability environment.

Supplementary Note 12 - Trial-by-trial model details
In this section we provide additional analysis of our trial-by-trial model. First, we provide parameter estimates and parameter uncertainty values for each rat, and compare them to rats from Ref. [2] (Supplementary Figure 19, Supplementary Table 1).
The discounting parameter for each rat in blocks of 7500 trials over changing environmental statistics.
Each rat was moved from a 0.5 Hz hazard-rate environment to a static environment, and back to a 0.5 Hz environment. Rat H034 was removed from training during the 0 Hz block and thus did not perform the return to 0.5 Hz trials. Rats H045 and H046 were first moved from a 1 Hz environment to a 0.5 Hz environment before moving to a static environment.