Towards an emotional ‘stress test’: a reliable, non-subjective cognitive measure of anxious responding

Response to stress or external threats is a key factor in mood and anxiety disorder aetiology. Current measures of anxious responding to threats are limited because they largely rely on retrospective self-report. Objectively quantifying individual differences in threat response would be a valuable step towards improving our understanding of anxiety disorder vulnerability. Our goal is therefore to develop a reliable, objective, within-subject ‘stress-test’ of anxious responding. To this end, we examined threat-potentiated performance on an inhibitory control task from baseline to 2–4 weeks (n = 50) and again after 5–9 months (n = 22). We also describe single-session data for a larger sample (n = 157) to provide better population-level estimates of task performance variance. Replicating previous findings, threat of shock improved distractor accuracy and slowed target reaction time on our task. Critically, both within-subject self-report measures of anxiety (ICC = 0.66) and threat-potentiated task performance (ICC = 0.58) showed clinically useful test-retest reliability. Threat-potentiated task performance may therefore hold promise as a non-subjective measure of individual anxious responding.

to distractor stimuli, in line with previous findings. Critically, we predicted that this would be reliable across testing sessions and hence constitute a non-subjective, behavioural measure of individual anxious responding: an emotional equivalent of the cardiac 'stress test'. To provide a more accurate estimate of the population-level mean and variance, which is critical prior to any clinical application, we also analysed data from a larger, heterogeneous sample (N = 157).

Results
Test-retest reliability analyses were run on the individual measures across two and three testing sessions (see Tables 2, 3 and 4).
For those participants who completed two sessions only, the reliability for trait anxiety was significant, with an "excellent" ICC of 0.96 across 2 sessions (F (
Population variance. Threat-potentiated accuracy to no-go stimuli (threat minus safe accuracy at correctly withholding a response to distractor stimuli) in 157 subjects revealed a population mean of 0.072, a median of 0.05 and a standard deviation of 0.15 (see Fig. 1,i).

Discussion
We show that threat of shock can reliably shift within-subject cognitive and self-report measures of anxious responding across three sessions and three quarters of a year, improving accuracy to distractor stimuli and slowing responses to target stimuli. This improvement in accuracy replicates previous studies 5,8, and the increase in reaction time under threat of shock replicates a line of previous research (see preprint and data *). Importantly, threat-potentiated performance on this task also shows good within-subject reliability over 2-4 weeks in the full sample and again over 5-9 months. For reference, this means that the emotional manipulation (i.e. threat of shock) on this task is considerably more reliable than the emotional manipulation (i.e. emotional stimuli) on the emotional Stroop and dot probe tasks 12,13 (see Table 1).
We note that there is imprecision surrounding the term 'stress', which is used to describe the prolonged HPA axis response or psychological experiences (between which there is a perceived mechanistic link 14,15). Considering the time course of the glucocorticoid response and our manipulation, it is unlikely this link would be observed using the present design. We regard our findings as indicative of anxious responding and use the term 'stress test' because our goals align with those of the cardiac 'stress test'. 'Stress tests' are of great value in cardiac medicine as they are able to identify patients who may be more vulnerable and need closer monitoring around surgery, which in turn leads to improved outcomes 16. Prior work has shown that participants with clinical anxiety showed impaired "go" accuracy overall compared to controls on this task, suggesting that there is an overactive adaptive defensive mechanism in clinical anxiety 17. Consequently, we suggest that the effect of threat on reaction time observed here could be a reflection of this behavioural inhibition. The impact of induced anxiety on response inhibition could therefore be a behavioural marker for clinical anxiety and, given the reliability of this task, we believe it has clinical potential.

Table 3. Reliability of measures across participants who completed two testing sessions only (N = 28). *p < 0.05.
Individual measures of interest | ICC value | Repeated measures ANOVA
Accuracy to "no-go" stimuli across safe conditions | 0.80* | F (21,42)
According to the diathesis-stress model of anxiety disorders, when stressful life events are coupled with an underlying vulnerability and a threshold is reached, a disorder is triggered. Understanding individual responses to stress is therefore key to understanding the differences in vulnerability to anxiety disorders. Pathological feelings of anxiety contribute to the most common psychiatric disorders, and it is suggested that over the next 20 years these rates will continue to rise 18. Identifying vulnerability prior to disorder onset with a non-subjective cognitive task could consequently lower costs and reduce time in treatment. Additionally, cognitive paradigms which show good reliability are important for research, impacting replicability and the accurate interpretation of existing findings 19.
It should be noted that self-report trait anxiety also has a high ICC in this study. However, interpretation of this is limited due to anchoring effects 20 and demand characteristics 21. Our task is not obviously subject to these effects and also benefits from being a concurrent (i.e. not retrospective) measure. The poorer test-retest reliability on our accuracy delta variable for the two-to-four-week follow-up may be due to a reduction in power resulting from the smaller number of no-go responses, and suggests that go reaction time differences may prove the more reliable target. It is also worth noting that we see poorer test-retest reliability in the sub-sample who only completed two sessions (Table 3). The reasons for this are unclear, but may be driven by self-selection bias in individuals unwilling or unable to return for a third testing session. Of course, this sample is also underpowered per our power calculation, so inference should be approached with caution.
In summary, we argue that the impact of threat of shock on cognition might hold promise as a putative probe of threat sensitivity, and a phenotype of anxious responding.

Method
Fifty healthy participants (25 female, mean age = 26.5, SD = 8.47) completed the Sustained Attention to Response Task (SART) in two testing sessions, separated by a period of between two and four weeks. Twenty-two participants (11 female, mean age = 28.5, SD = 11.00) completed the task for a third time in a follow-up session between five and nine months later. A screening procedure prior to participation verified that participants had no history of neurological, psychiatric, or cardiovascular conditions. Exclusion criteria also included alcohol dependence and any recreational drug use in the last 4 weeks.
The methods were identical in each session. Participants provided written informed consent to take part in the study. All methods were carried out in accordance with relevant guidelines and regulations and all protocols were approved by the UCL ethics committee (reference 1764/001).
An a priori power analysis was run in G*Power 22. The power analysis was based on previous results of the SART 5 that gave an effect size of 0.56 for the effect of threat of shock on response accuracy to "no-go" distractor stimuli. We wanted 95% power (with alpha = 0.05, two-tailed) to detect an effect size of 0.56. A power calculation determined that we needed 46 participants. We recruited an extra 4 to allow for ~8% participant drop-off. This sample size also has 99% power to detect a reliability of at least 0.5 (a minimum value we consider acceptable for clinical relevance) at alpha = 0.05 (one-tailed). For the final 5-9 month follow-up we saw considerable (56%) drop-off. A post hoc matched t-test power analysis showed that with 22 participants (with alpha = 0.05, two-tailed) we had only 70.96% power to detect an effect of this magnitude. Notably, however, this still has 83% power to detect a reliability of at least 0.5 at alpha = 0.05 (one-tailed). As such, this three-session analysis is powered for reliability analysis only. Given that the three-session and full two-session samples overlap, we also include, for completeness, a separate two-session analysis of those who attended only twice.
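As a rough illustration of the arithmetic behind such a power analysis, the following sketch approximates the power of a two-tailed matched t-test with a normal approximation. It is not the G*Power procedure: G*Power uses the noncentral t distribution, so the output of this sketch should not be expected to match the reported values (46 participants, 70.96% power) exactly. The function names are hypothetical, and treating the reported effect size of 0.56 as a standardised paired difference is an assumption.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def paired_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Approximate two-tailed power of a matched t-test (normal approximation)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)  # expected test statistic under the alternative
    # Probability of exceeding the critical value in either tail.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def required_n(effect_size: float, power: float = 0.95, alpha: float = 0.05) -> int:
    """Smallest n reaching the target power under the same approximation."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(((z_crit + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    print(required_n(0.56))                  # approximate n for 95% power
    print(round(paired_power(0.56, 22), 3))  # approximate power with 22 participants
```

Because the normal approximation ignores the heavier tails of the t distribution, it yields a slightly smaller required sample and slightly higher power than an exact calculation would.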
Anxiety manipulation. Two electrodes were attached to the back of the participants' non-dominant wrist.
A Digitimer DS5 Constant Current Stimulator (Digitimer Ltd., Welwyn Garden City, UK) delivered the shocks. A short shock-level work-up increased the level of the shock until the subject rated it as "unpleasant, but not painful" 23. As in previous versions of this task 5, during a threat block, in which the background was red, the participants were told they were at risk of an unpredictable shock (which was independent of their behavioural response). When in a safe block, the background was blue (and participants were told that no shocks would be delivered). Colours were not counterbalanced as prior work has shown this effect to be independent of background colour 24,25. After completing the task, participants provided retrospective subjective anxiety ratings for the threat and safe conditions. This manipulation check is consistent with current clinical diagnoses (i.e. self-report of symptoms) and is used by many other researchers in the field (for a review see ref. 4).
Task structure. Participants completed a previously used task 5 recoded using the Cogent (Wellcome Trust Centre for Neuroimaging and Institute of Cognitive Neuroscience, UCL, London, UK) toolbox for Matlab (2014b, The MathWorks, Inc., Natick, MA, United States). Participants were instructed to respond to "go" target stimuli ("=") by pressing the space bar as quickly as possible, and to withhold a response to "no-go" distractor stimuli ("O"). They were instructed to make their response using their dominant hand.
Scientific Reports | 7:40094 | DOI: 10.1038/srep40094
47 "go" stimuli and 5 "no-go" stimuli were presented in each block. The stimuli were presented for 250 ms, followed by an interstimulus interval of 1750 ms, before presentation of the next stimulus. There were 8 blocks in total, alternating between threat and safe blocks (order counterbalanced). Each block lasted 104 seconds (see Fig. 2). For 3 seconds at the beginning of each block, "YOU ARE NOW SAFE FROM SHOCK!" or "YOU ARE NOW AT RISK OF SHOCK!" appeared on the screen. Participants received a shock in the first threat block (after trial 45), the second threat block (after trial 8), and the fourth threat block (after trial 17). Total task duration was approximately 14 minutes and 30 seconds (task script available online: https://figshare.com/articles/SART_script/3443093).
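The timing figures in the task description can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the published task script: the constant and function names are hypothetical, and it assumes that the 3-second block-start message is presented in addition to the 104 seconds of trials, which the text does not state explicitly.

```python
# Values taken from the task description; the names are illustrative only.
GO_PER_BLOCK = 47
NOGO_PER_BLOCK = 5
STIMULUS_MS = 250
ISI_MS = 1750
N_BLOCKS = 8   # alternating threat and safe blocks, so 4 of each
CUE_S = 3      # block-start message; ASSUMED to be added on top of the trials


def block_duration_s() -> float:
    """Duration of the trials in one block, in seconds."""
    trials = GO_PER_BLOCK + NOGO_PER_BLOCK
    return trials * (STIMULUS_MS + ISI_MS) / 1000


def task_duration_s(include_cues: bool = True) -> float:
    """Duration of all blocks, optionally including the block-start messages."""
    total = N_BLOCKS * block_duration_s()
    if include_cues:
        total += N_BLOCKS * CUE_S
    return total


def nogo_trials_per_condition() -> int:
    """Number of no-go trials in each condition (threat or safe) per session."""
    return (N_BLOCKS // 2) * NOGO_PER_BLOCK


if __name__ == "__main__":
    print(block_duration_s())           # 104.0
    print(task_duration_s())            # 856.0
    print(nogo_trials_per_condition())  # 20
```

Under these assumptions the trials and messages account for 856 seconds (about 14 minutes 16 seconds), close to the approximate total reported. With 20 no-go trials per condition, each no-go trial contributes 0.05 to the accuracy score for that condition.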
Wider sample. Threat-potentiated task performance data from a larger (n = 157) heterogeneous sample collected across UCL, UK and NIH, USA are also presented to explore population level statistics.
Statistical Analyses. Reaction time and accuracy data (data available online: https://dx.doi.org/10.6084/m9.figshare.3398764.v1) were analysed using repeated-measures general linear models in SPSS version 22 (IBM Corp, Armonk, NY). For all analyses, p < 0.05 was considered significant. Performance accuracy for each condition (threat/safe) and trial type ("go"/"no-go") was calculated by dividing the number of correct trials by the total number of trials. As "go" accuracy was 97.5% across two sessions, only "no-go" trials were included in the accuracy analysis. Reaction time analysis was performed on "go" stimuli only as, by definition, "no-go" reaction times are limited and restricted to error trials.
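The accuracy calculation and the threat-minus-safe difference used for the "no-go" trials can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the trial record layout (keys "condition", "trial_type" and "correct"), the function names and the example data are all hypothetical.

```python
from collections import defaultdict


def accuracy_by_cell(trials):
    """Proportion of correct trials for each (condition, trial_type) pair."""
    counts = defaultdict(lambda: [0, 0])  # [number correct, number of trials]
    for trial in trials:
        cell = counts[(trial["condition"], trial["trial_type"])]
        cell[0] += int(trial["correct"])
        cell[1] += 1
    return {key: correct / total for key, (correct, total) in counts.items()}


def threat_potentiated_nogo_accuracy(trials):
    """Threat minus safe accuracy on no-go trials (the accuracy delta)."""
    acc = accuracy_by_cell(trials)
    return acc[("threat", "no-go")] - acc[("safe", "no-go")]


if __name__ == "__main__":
    # Hypothetical data: 4 of 5 no-go trials correct under threat, 3 of 5 when safe.
    example = (
        [{"condition": "threat", "trial_type": "no-go", "correct": c}
         for c in (True, True, True, True, False)]
        + [{"condition": "safe", "trial_type": "no-go", "correct": c}
           for c in (True, True, True, False, False)]
    )
    print(round(threat_potentiated_nogo_accuracy(example), 2))  # 0.2
```

A positive value indicates better withholding of responses under threat than in the safe condition.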
For the first two fully powered sessions, repeated-measures ANOVAs were run to investigate reaction time and accuracy differences across conditions. Due to the lack of power resulting from attrition (see above), these were not run for the third session.
Task reliability over two and three sessions was tested using two-way mixed model ICCs run in Matlab (2014b) using an "Intraclass Correlation Coefficient" script (http://uk.mathworks.com/matlabcentral/fileexchange/22099-intraclass-correlation-coefficient-icc-). This determined whether the influence of threat of shock on various performance measures remained consistent in individuals between testing sessions. In accordance with ref. 11, an ICC was considered 'fair to good' if between 0.4 and 0.75, and 'excellent' if above 0.75. In our power calculation we deemed 0.5 the minimum reliability required for clinical relevance. Reliability analyses were completed on the critical delta variables (the difference between threat and safe condition for that variable) to look at the reliability of the threat-potentiated effect for reaction time, accuracy and shock rating. Analyses were also run for shock level and trait anxiety scores (for which there is only one measurement per session, so no deltas). Estimates for each condition separately, demonstrating the reliability of the individual measures themselves, are presented in Tables 2, 3 and 4.
Figure 2. Participants were instructed to press the space bar as quickly as possible for "go" stimuli and withhold responses to infrequent "no-go" stimuli. (A) Participants received an unpredictable electric shock (independent of behavioural response) during the threat condition. (B) Participants were not at risk of shock during the safe condition.
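To make the reliability analysis described under Statistical Analyses concrete, the sketch below computes one common two-way mixed ICC from a subjects-by-sessions table and applies the interpretation bands quoted from ref. 11. It is not the Matlab script used in the study. The text does not state which ICC form was used, so the consistency, single-measure form, ICC = (MSR - MSE) / (MSR + (k - 1) MSE), is an assumption. The function names, the example scores and the "poor" label for values below 0.4 are also assumptions.

```python
def icc_consistency_single(scores):
    """Two-way mixed, consistency, single-measure ICC.

    scores: one row per subject, one column per session (e.g. delta values).
    """
    n = len(scores)       # subjects
    k = len(scores[0])    # sessions
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between sessions
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def interpret_icc(icc):
    """Bands quoted in the text; 'poor' below 0.4 is an assumed label."""
    if icc > 0.75:
        return "excellent"
    if icc >= 0.4:
        return "fair to good"
    return "poor"


if __name__ == "__main__":
    # Hypothetical scores: session 2 is session 1 plus a constant shift,
    # so the rank order and spacing of subjects are perfectly preserved.
    example = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
    print(round(icc_consistency_single(example), 3))  # 1.0
    print(interpret_icc(0.58))                        # fair to good
```

The consistency form ignores a uniform shift between sessions, which suits a question about whether individuals keep their relative position across sessions. An absolute-agreement form would penalise such a shift.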