Trait anxiety is associated with hidden state inference during aversive reversal learning

Updating beliefs in changing environments can be driven by gradually adapting expectations or by relying on inferred hidden states (i.e. contexts) and changes therein. Previous work suggests that increased reliance on context could underlie fear relapse phenomena that hinder clinical treatment of anxiety disorders. We test whether trait anxiety variations in a healthy population influence how much individuals rely on hidden-state inference. In a Pavlovian learning task, participants observed cues that predicted an upcoming electrical shock with repeatedly changing probability, and were asked to provide expectancy ratings on every trial. We show that trait anxiety is associated with steeper expectation switches after contingency reversals and reduced oddball learning. Furthermore, trait anxiety is related to better fit of a state inference model, compared to a gradual learning model, when contingency changes are large. Our findings support previous work suggesting hidden-state inference as a mechanism behind anxiety-related fear relapse phenomena.


Differences between studies Supplementary
Trait anxiety distributions across the three studies.

Experiment 1
Inclusion criteria
- Participant is willing and able to give informed consent for participation in the study
- Healthy adults
- Male or Female
- Aged 18 to 40 years
- Not currently taking any medications (except the contraceptive pill)
- Right handed

Exclusion criteria (if any apply)
- History of neurological or psychiatric illness (including chronic or remittent pain)
- Contraindications to MRI scanning (including but not limited to a history of claustrophobia, certain metallic implants and metallic injury to eye)
- Pregnancy or are likely to become pregnant during the study
- Recent or current use of psychoactive substances (in the past 3 months)

Experiment 2
Inclusion criteria
- Willing and able to provide informed consent
- Male or female, aged 18-40
- Body mass index (BMI) of 18-30 kg/m2
- Fluent English skills
- Non- or light-smoker (< 5 cigarettes a day)

Exclusion criteria
- Female participant who is pregnant or breast-feeding
- CNS-active medication during the last 6 weeks
- Current blood pressure or other heart medication (especially aliskiren or beta blockers)
- Intravascular fluid depletion
- Past or present DSM-IV axis-I diagnosis
- Alcohol or substance abuse
- First-degree family member with a history of a severe psychiatric disease
- Impaired liver or kidney function
- Lifetime history of epilepsy or other neurological disease, systemic infection, or clinically significant hepatic, cardiac, obstructive respiratory, renal, cerebrovascular, metabolic, endocrine or pulmonary disease or disorder which, in the opinion of the investigator, may either put the participants at risk because of participation in the study, or may influence the result of the study, or the participant's ability to participate in the study
- Insufficient English skills
- Participated in another study involving certain medication during the last 6 weeks

Experiment 3
Inclusion criteria
- Participant is willing and able to give informed consent for participation in the study
- Healthy adults
- Male or Female
- Aged 18 to 40 years
- Not currently taking any medications (except the contraceptive pill)
- Right handed

Exclusion criteria
- History of neurological or psychiatric illness (including chronic or remittent pain)
- Pregnancy or are likely to become pregnant during the study
- Recent or current use of psychoactive substances (in the past 3 months)

Pain Catastrophizing Scale (PCS) 5
Centre for Epidemiological Studies Depression Symptoms Index (CES-D) 6

Instructions
While the exact wording of the instructions differs slightly between the three studies, the following key points are maintained:
1. Each of the images is associated with a probability to receive an electrical stimulation.
2. This probability can change for any image at any point, so you need to keep paying attention.
3. The breaks in the task don't signal a change in probability or restart the task; it is one learning experience all through. (This is slightly different in Study III, where we explicitly say that each of the three sessions starts from scratch.)

Supplementary Note 1: Ratings
Average ratings in stable cues Supplementary

Switch point and steepness
To isolate subject-specific learning metrics, we used the trial-by-trial expectancy ratings to estimate the switch point and switch steepness for each individual change in probability. Only trials from the reversal cue were included in this analysis. For each switch, we extracted 5 trials before and 10 trials after the reversal and demeaned the time series. Using this 'chunk', we determined the point of steepest change ('switch point') by calculating the cumulative sum and identifying the point of highest (high-to-low switch) or lowest (low-to-high switch) value in the series. Next, a smaller chunk of 10 trials (5 preceding and 5 following) around the identified switch point was extracted. A sigmoidal curve (Eq. 1) with a free parameter for steepness was fitted to this smaller chunk:

$$ y = \frac{1}{1 + e^{-a (X - b)}} \tag{1} $$

Here X corresponds to the time series of 10 trials, b is the inflection point, which was fixed to b=5.5 (midpoint), and a represents the steepness. The fitted value for a was recorded and log transformed before being used as an estimate of switch steepness in the main analysis.
As reported in the main text, the reversal time of the switching cue was semi-random and not cued, so participants had to infer reversals from observations. This was likely to cause delays before updating could begin, and these delays were likely to vary between participants and even between reversals. Because averaging over variable time courses can lead to incorrect conclusions (e.g., 12), we estimated the time point of the steepest change post-reversal separately for each participant and reversal. This also allowed us to estimate the steepness at that point using a sigmoidal fit. The analysis therefore provided an estimate of how many trials after the reversal the ratings began to change, as well as an estimate of how fast the ratings changed once the change began (steepness). Splitting the data by estimated switch point and steepness (Supp Figs 2a and 2b) showed that our approach indeed captured variability in these two aspects. The steepness of the estimated switches increased across sessions (-2.30, -2.10 and -1.4 on log scale), as reflected in a main effect of session, F(2, 205)=20.80, p<.001, η²p=.16 [.09, .25]. This was driven by significantly higher steepness in the 90/10 condition compared to 60/40, t(226)=-6.14, p<.001, η²p=.14 [.06, .22], and to 75/25, t(181)=-4.61, p<.001, η²p=.10 [.04, .19]. The steepness was also positively associated with trait anxiety (TA), F(1,94)=7.39, p=.008, η²p=.07 [.01, .19], indicating that more anxious participants adjusted their shock probability ratings faster than less anxious participants. Turning to the estimated switch point, the average switch occurred 2.91, 2.90 and 3.28 trials after the true reversal (60/40, 75/25 and 90/10, respectively). The estimated switch point was not related to session or TA. This indicates that while high trait anxious participants performed more abrupt switches, these did not occur earlier or later compared to individuals lower in trait anxiety, and this was true for all sessions.
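The two-step procedure described above (cumulative-sum switch point, then a sigmoid fit with a fixed midpoint) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, ratings are assumed to lie in [0, 1], the 10-trial window is min-max rescaled and flipped for high-to-low switches so that one rising sigmoid can be fitted, and the steepness is found by a coarse grid search over log(a) rather than by whatever optimizer was actually used.

```python
import math

def switch_point(chunk, high_to_low):
    """Return the 0-based index where the cumulative sum of the demeaned chunk
    is highest (high-to-low switch) or lowest (low-to-high switch)."""
    mean = sum(chunk) / len(chunk)
    running, cumsum = 0.0, []
    for rating in chunk:
        running += rating - mean
        cumsum.append(running)
    extreme = max(cumsum) if high_to_low else min(cumsum)
    return cumsum.index(extreme)

def sigmoid(x, a, b=5.5):
    """Logistic curve with steepness a and inflection point b."""
    return 1.0 / (1.0 + math.exp(-a * (x - b)))

def fit_log_steepness(window, high_to_low, b=5.5):
    """Grid-search log(a) that minimizes squared error on a 10-trial window.
    Assumption: the window is rescaled to [0, 1] and flipped if it falls."""
    lo, hi = min(window), max(window)
    span = (hi - lo) or 1.0
    y = [(r - lo) / span for r in window]
    if high_to_low:
        y = [1.0 - v for v in y]
    best_log_a, best_sse = None, float("inf")
    for step in range(-80, 81):            # log(a) from -4 to 4 in steps of 0.05
        log_a = step * 0.05
        a = math.exp(log_a)
        sse = sum((sigmoid(x, a, b) - v) ** 2 for x, v in enumerate(y, start=1))
        if sse < best_sse:
            best_log_a, best_sse = log_a, sse
    return best_log_a

# Toy chunk: 5 pre-reversal trials plus 10 post-reversal trials,
# with ratings dropping three trials after the reversal.
chunk = [0.8] * 8 + [0.2] * 7
sp = switch_point(chunk, high_to_low=True)
window = chunk[sp - 4 : sp + 6]            # 5 trials up to the switch, 5 after it
print(sp, fit_log_steepness(window, high_to_low=True))
```

In this toy chunk the ratings fall abruptly, so the fitted log-steepness sits at the upper end of the grid; a gradual decline over the same window yields a much smaller value.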

Relationship between stable cues and the reversal cue
To check whether the anxiety effects in stable cues and the two phases of the reversal cue were related, we correlated the difference between stable-high and stable-low cues ("stables difference") with the difference between the low and high phases ("phase difference"). There was a positive association in all three sessions (r

Meaningful versus oddball learning
To assess the impact of the selection window more systematically, we used alternative decision criteria (i.e., cut-offs at trials 5, 7, 10 and 13) to classify trials as meaningful or oddball (the learning rates calculated in the four analyses are plotted separately for each condition below). Notably, the different cut-offs produced very similar results, and all clearly capture the main effect difference between meaningful and oddball learning. Indeed, in a formal LMM there was no significant main effect or interaction involving the cut-off, all ps > 0.9.
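How the cut-off changes the selection window can be illustrated with a small sketch. It rests on assumptions not stated in this section: trials are indexed from 0, a reversal is coded as the first trial under the new contingency, and a trial counts as falling in the "meaningful" window when fewer than `cutoff` trials have passed since the most recent reversal. All other trials are labeled as belonging to the window in which surprising outcomes would be treated as oddballs. The function name and labels are hypothetical.

```python
def label_trials(n_trials, reversal_trials, cutoff):
    """Label each trial by its position relative to the most recent reversal."""
    labels = []
    for t in range(n_trials):
        # Trials elapsed since each reversal that has already happened
        since = [t - r for r in reversal_trials if r <= t]
        if since and min(since) < cutoff:
            labels.append("meaningful")
        else:
            labels.append("oddball_window")
    return labels

# Toy session: 30 trials with reversals at trials 10 and 20
for cutoff in (5, 7, 10, 13):
    labels = label_trials(30, [10, 20], cutoff)
    print(cutoff, labels.count("meaningful"), labels.count("oddball_window"))
```

Larger cut-offs move trials from the oddball window into the meaningful window, which is why comparing results across cut-offs is a useful robustness check.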

Supplementary Figure 3 Selection window for oddball categorization.
Model-free learning rates categorized by oddball (surprising events during relatively stable periods that don't signal change in environment) and meaningful trials (trials following a true reversal). The different colors represent different cutoff points used to classify meaningful and oddball trials (5-7-10-13). The three conditions contain N=36, N=88 and N=37 participants. Boxplots indicate the median and interquartile range (IQR, i.e., distance between the first and third quartiles). The lower and upper hinges correspond to the first and third quartiles (the 25th and 75th percentiles). The upper whisker extends from the hinge to the largest value no further than 1.5* IQR from the hinge. The lower whisker extends from the hinge to the smallest value at most 1.5* IQR of the hinge.

Internal parameters of the winning model
To better understand the fitted n-state model, we explored its behavior and fitted parameters in more detail. First, we analyzed whether the number of states created by the fitted model was related to TA, but this was not the case (mean number of states: 1.82 in high TA vs. 1.81 in low TA). Second, we investigated the time point at which the model tended to switch states. A corresponding LMM found no effect of session or anxiety, consistent with our behavioral results reported above (see "estimated switch point"; model-inferred switch points were 2.93, 3.20 and 3.19 trials after the true reversal for the three sessions). Next, we analyzed the fitted step sizes for positive and negative outcomes, τ+ and τ−. An LMM with parameter type (τ+ / τ−), TA and session as fixed effects found a significant main effect of outcome type, F(1,228)=37.03, p<.001, η²p=.14 [.07, .22], which reflected that shocks elicited larger updates than no-shocks (τ+ = 1.13 vs. τ− = 0.73). There was no main effect of TA, F(1,83)=.21, p>.05, η²p=.00 [.00, .06], or interaction of outcome type and TA, F(1, 228)=.16, p>.05, η²p=.00 [.00, .02]. Note that the same two parameters of the 1-state model showed a similar difference (τ+ = 1.17 vs. τ− = 0.86), suggesting that differential learning from shock and no-shock events alone was unable to explain our behavioral effects of TA. Lastly, no effects of session, TA or their interaction were found when analyzing the remaining parameters η (switch threshold), α0 and β0.

Experienced uncertainty
One possibility is that a drive to reduce uncertainty in high TA motivates increased state inference. We used the standard deviation of the current state (calculated based on trial-specific alpha and beta), summed across trials, as an overall estimate of uncertainty. Using an LMM, we then explored the relationship between uncertainty and anxiety. The model found no significant association between uncertainty and anxiety, and no interaction with session. Although not significant, there was a general trend towards a negative association between TA and experienced uncertainty in the 90/10 condition, as one would expect from an agent that successfully infers state structure (see plot below). We also found a weak positive association between TA and uncertainty in the 60/40 condition.
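The uncertainty measure described above can be sketched with the standard formula for the standard deviation of a beta distribution, sd = sqrt(αβ / ((α+β)² (α+β+1))). The function names and the toy parameter trajectory are illustrative assumptions; the trial-specific alpha and beta values would in practice come from the fitted n-state model.

```python
import math

def beta_sd(alpha, beta):
    """Standard deviation of a Beta(alpha, beta) distribution."""
    n = alpha + beta
    return math.sqrt(alpha * beta / (n * n * (n + 1.0)))

def summed_uncertainty(alphas, betas):
    """Sum the per-trial standard deviations of the current state."""
    return sum(beta_sd(a, b) for a, b in zip(alphas, betas))

# Hypothetical trajectory: evidence for 'shock' accumulates over four trials
alphas = [4.0, 4.5, 5.0, 5.5]
betas = [4.0, 4.0, 4.0, 4.0]
print(summed_uncertainty(alphas, betas))
```

Because the standard deviation shrinks as alpha and beta grow, an agent that stays in one well-learned state accumulates less summed uncertainty than one whose state estimates remain diffuse.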

Supplementary Figure 4 Experienced uncertainty.
Relationship between experienced uncertainty (estimated by the n-state model) and trait anxiety separately for the three sessions. Trait anxiety did not relate to experienced uncertainty in any of the sessions. The three panels include N=36, N=88 and N=37 data points. Shown correlation coefficients reflect Pearson's correlation (two-tailed) and uncorrected p-values.

Differences between studies
To compare differences between studies, we ran a number of analyses using the 75/25 condition. We did not find any significant differences between studies. However, the trait anxiety scores were generally lowest in the drug study. Therefore, there are only 6 high-TA participants in Study 2.
The uncertainty in the high-TA, low-phase condition is too high to conclude whether we see the same effect as in the other two studies. Apart from the behavioral data, we also looked at differences between studies in the slope after reversal. As reported in the main text, we see a positive main effect of trait anxiety. There is no interaction between study and TA.
Supplementary Figure 5 Ratings across studies in the 75/25 session. Mean ratings for trials 10+ after reversal split by phase (high/low), session (60/40, 75/25, 90/10) and median-split trait anxiety. Statistical tests were performed using an LMM and follow-up ANOVAs. The three conditions contain N=36, N=88 and N=37 participants. Boxplots indicate the median and interquartile range (IQR, i.e., distance between the first and third quartiles). The lower and upper hinges correspond to the first and third quartiles (the 25th and 75th percentiles). The upper whisker extends from the hinge to the largest value no further than 1.5* IQR from the hinge. The lower whisker extends from the hinge to the smallest value at most 1.5* IQR of the hinge.
Supplementary Figure 6 Proportions of participants best fitted by each model. Percentage of participants best fitted by the 1-state (purple) and the n-state (green) models separately for each session.

Best model fit by condition
To assess how model fit developed within individuals over the three sessions, we binned participants according to the best-fitting model in each condition. In the plot below, "1" stands for gradual (1-state) learning while "n" stands for structure learning (n-state). For example, "1-1-n" represents a case where the 1-state model fitted best in the 60/40 and 75/25 conditions and the n-state model fitted best in the 90/10 condition. Summarizing the data in this way allows us to assess strategy switches between sessions. We plot the results below.
Interestingly, there appears to be a high degree of internal consistency: bins with the same strategy across the three sessions (1-1-1, n-n-n) seem to stand out (38%; chance level 11.1%). This is followed by a group of participants who relied on gradual learning in the noisier conditions but employed structure learning in the 90/10 condition (1-n-n and 1-1-n; together 32%).
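The binning described above amounts to turning each participant's three best-fitting models into a pattern string and counting the patterns. The sketch below uses made-up participants and a hypothetical helper name; it only illustrates the bookkeeping, not the reported proportions.

```python
from collections import Counter

def pattern_shares(best_models):
    """Map each participant's per-session winners (60/40, 75/25, 90/10)
    to a pattern such as '1-1-n' and return the share of each pattern."""
    counts = Counter("-".join(sessions) for sessions in best_models)
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}

# Hypothetical winners for four participants
best_models = [
    ("1", "1", "1"),
    ("n", "n", "n"),
    ("1", "1", "n"),
    ("1", "n", "n"),
]
shares = pattern_shares(best_models)
consistent = shares.get("1-1-1", 0.0) + shares.get("n-n-n", 0.0)
print(shares, consistent)
```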

Out-of-sample model fit and behavioral markers
In the main text we report the relationship between two key markers of state inference (slope and differential learning from oddballs compared to trials after reversal). To check the generalizability of model fit in relation to these behavioral markers, we fitted the models to the first half of trials (phases 1-3) and correlated the relative model fit with behavioral measures derived from the second half of each session (phases 4+). In both cases, the behavioral marker from the second half correlated with the relative model fit from the first half. Specifically, the Spearman's rho correlation coefficients were r=.332, p=.002 for slopes and r=.21, p=.046 for meaningful/oddball.

Order effects
We investigated whether the order of sessions affected the relative fit of the gradual and state inference models. Specifically, focusing on order effects in the relationship between model fit and trait anxiety, we found no significant main effect or interaction of order. The figure below presents the sessions in the order in which they occurred. Additionally, we investigated whether the tendency to use state inference in the 60/40 condition depended on whether this condition occurred first or was preceded by 90/10. The mean reliance on state inference was numerically higher when 90/10 preceded 60/40 (deltaBIC = 16.4) compared to when 60/40 occurred first (deltaBIC = 6.86), but this effect was not statistically significant, t(42)=1.61, p=.11, η²p=.06 [.00, .23] (Supp Fig 10a).

Raw data for most gradual and SI participants
To verify that the two models capture gradual (1-state model) and state inference (n-state model) behavior, we calculated the BIC score difference between the 1-state and n-state models for each participant. As reported in the main text (Fig. 6a), participants better fitted by the n-state model showed steeper learning. Here, we extend this report by presenting the raw data of the "most gradual" and "most switchy" participants (see Supp. Fig 11). Visual inspection of the data helps us further validate the model fits: participants with a positive BIC difference (green) show large jumps in ratings following reversal, while participants with a negative BIC difference show shallower learning curves.
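The BIC difference used here can be written out with the standard definition BIC = k ln(n) − 2 ln(L), where k is the number of free parameters, n the number of observations and L the maximized likelihood. In the sketch below the sign is chosen so that positive values favor the n-state model, which matches the description above; the negative log-likelihoods, parameter counts and trial count in the example are invented for illustration only.

```python
import math

def bic(neg_log_lik, n_params, n_obs):
    """Bayesian information criterion from a minimized negative log-likelihood."""
    return n_params * math.log(n_obs) + 2.0 * neg_log_lik

def delta_bic(nll_1state, k_1state, nll_nstate, k_nstate, n_obs):
    """BIC(1-state) minus BIC(n-state): positive values favor the n-state model."""
    return bic(nll_1state, k_1state, n_obs) - bic(nll_nstate, k_nstate, n_obs)

# Hypothetical fits for one participant and session (all numbers are made up)
example = delta_bic(nll_1state=120.0, k_1state=4,
                    nll_nstate=100.0, k_nstate=5, n_obs=100)
print(example)
```

The extra parameter of the larger model is penalized by ln(n), so the n-state model only wins when its likelihood gain exceeds that penalty.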

1-state model, Pearce-Hall and Rescorla-Wagner
The 1-state model behaves very similarly to commonly used gradual learning models such as Pearce-Hall 13,14. We opted to use the beta-based model for gradual learning as this allowed us to minimize technical differences between the 1-state and n-state models. To check that the 1-state model performs similarly to Pearce-Hall (PH) and Rescorla-Wagner (RW; hereafter 'associative learning models'), we fitted both models to the data. Because the 1-state model and the associative learning models use different likelihood functions (the beta-based models use the beta distribution likelihood while the associative learning models use the root mean squared error), we used a likelihood-independent measure to assess model fit: the correlation between data and model predictions. Using this measure we found no difference between models, F(2,379)=2.04, n.s. (Supp. Fig. 13a). Next, we used the fitted parameters to generate posterior predictive checks for all three models (Supp. Fig. 13b). These show that all three models in fact fit the data in a very similar manner.
An interesting aspect of the 1-state model is that it can closely approximate both the RW and PH models. The main difference between the associative learning models is the stationarity of the learning rate. In RW, the learning rate is the same for all trials, while in PH it changes depending on the previous prediction error (i.e., 'associability'). Under PH, learning rates increase after reversal and decay as prediction errors decrease over time. In the case of our beta-based 1-state model, a constant amount τ is added either to alpha or beta on each trial. While intuition might suggest that this results in constant learning rates, learning rates in fact behave very similarly to those of PH: larger updates occur after reversal. For a demonstration of this, see Supplementary Figure 12.
Supplementary Figure 12 Learning rates and probability estimates simulated using the 1-state model. Data generated using the 1-state beta model (blue) with parameters α0 = 4, β0 = 4, λ = 0.8, and τ+/− = 0.5. Learning rates calculated for each trial show an initial bump after contingency reversal followed by a decay (red). Averaged over 500 simulations.
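The contrast between RW and PH learning rates drawn above can be made concrete with a small simulation. The sketch uses one common hybrid formulation of Pearce-Hall, in which associability is an exponentially weighted average of recent absolute prediction errors; this choice, the parameter names (`lr`, `kappa`, `gamma`) and their values are assumptions made for illustration and are not taken from the fitted models. It does not simulate the beta-based 1-state model.

```python
def rescorla_wagner(outcomes, lr=0.2, v0=0.5):
    """RW: the same learning rate is applied on every trial."""
    v, values, rates = v0, [], []
    for o in outcomes:
        values.append(v)
        rates.append(lr)
        v += lr * (o - v)
    return values, rates

def pearce_hall(outcomes, kappa=0.5, gamma=0.3, assoc0=0.5, v0=0.5):
    """Hybrid PH: the effective learning rate is kappa * associability,
    and associability tracks recent absolute prediction errors."""
    v, assoc = v0, assoc0
    values, rates = [], []
    for o in outcomes:
        delta = o - v                      # prediction error on this trial
        values.append(v)
        rates.append(kappa * assoc)        # rate applied on this trial
        v += kappa * assoc * delta
        assoc = gamma * abs(delta) + (1.0 - gamma) * assoc
    return values, rates

# Deterministic toy schedule: 20 shock trials, then a reversal to 20 no-shock trials
outcomes = [1] * 20 + [0] * 20
_, rw_rates = rescorla_wagner(outcomes)
_, ph_rates = pearce_hall(outcomes)
print(rw_rates[19], rw_rates[21])
print(ph_rates[19], ph_rates[21])
```

Under these assumptions the RW rate is identical before and after the reversal, whereas the PH rate decays during the stable phase and rises again once the reversal produces large prediction errors.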