Real-world stress resilience is associated with the responsivity of the locus coeruleus

Individuals may show different responses to stressful events. Here, we investigate the neurobiological basis of stress resilience, by showing that neural responsitivity of the noradrenergic locus coeruleus (LC-NE) and associated pupil responses are related to the subsequent change in measures of anxiety and depression in response to prolonged real-life stress. We acquired fMRI and pupillometry data during an emotional-conflict task in medical residents before they underwent stressful emergency-room internships known to be a risk factor for anxiety and depression. The LC-NE conflict response and its functional coupling with the amygdala was associated with stress-related symptom changes in response to the internship. A similar relationship was found for pupil-dilation, a potential marker of LC-NE firing. Our results provide insights into the noradrenergic basis of conflict generation, adaptation and stress resilience.

(13 items, adapted for the present study from Weiss et al., 2010). The experienced severity of these events was individually quantified on a scale ranging from 1 (not stressful) to 4 (extremely stressful). In order to derive an individual final score for adverse events, we multiplied their number and severity for each participant and correlated this measure with observed individual symptom changes.
This revealed no significant relationship between stressful event exposure during the internship and mean depression symptom changes (Pearson-correlation p=0.346; R=-0.139). However, we did find that mean anxiety symptoms increased significantly with experienced adverse events (Pearson-correlation p=0.006; R=-0.393, see supplemental methods above for details).
We tested whether this explained any additional variance above and beyond our original prospective predictors, by including adverse events in our original full model. This showed that adverse events did not explain any substantial variance (R 2 ) or adjusted variance (adj R 2 ), while both LC and LC-Amygdala connectivity remained strong predictors of anxiety symptom changes (See supplemental table 4). These findings also did not depend on the choice of LCmask (1SD or 2SD).

Stimulus presentation
All stimuli were displayed on a grey projection screen (using the Cogent2000-toolbox, http://www.vislab.ucl.ac.uk/cogent_2000.php, implemented in Matlab, The MathWorks, Inc., Natick, Massachusetts, United States) that participants viewed by means of a mirror system mounted atop the MR head coil. The task comprised 50 congruent, 50 incongruent, and 20 neutral (grey display without a face) trials, presented in pseudorandom order and counterbalanced for equal numbers of congruent-congruent (CC), congruent-incongruent (CI), incongruent-congruent (IC), and incongruent-incongruent (II) stimulus pairings. To avoid any priming effects, there were neither direct repetitions of the same face with varying word distracters nor direct repetitions of exact face-word-distracter combinations 6,7 . Genders, identities, and affective expressions on the faces were randomized throughout the task and stimulus occurrences were counterbalanced across trial types and response buttons. Subjects were instructed to respond as fast as possible to the stimulus by pressing one of two buttons (left: happy, right: fear or vice-versa) on an MR-compatible response box, while trying to maintain high accuracy.

Data analysis Behavioral analyses.
We adopted identical behavioral analysis procedures as the previous work investigating behavioral and neural conflict processing [8][9][10] . Behavioral data consisted of both reaction times (excluding error and post-error trials) and accuracy rates. A response was considered correct when the emotional valence of the face expression was correctly identified. Trials with response times above 2 standard deviations from the mean (across all trials) were excluded from analysis (and regarded as trials of no interest in the fMRI-models [8][9][10] , see below). 45 out of 48 Participants viewed 200 trials (2 runs, 100 trials per run). Some subjects missed a few trials and 3 subjects did only 1 run (100 trials). Since we removed trials with an RT larger than 2 standard deviations above the mean, the median trial number was 190 trials (minimum: 94, maximum: 196 trials). For the congruency-sequence-effect analysis, we mean-centered the RTdata for each individual to focus on within-subject variability of response times.

fMRI image acquisition.
Subjects performed two fMRI sessions of the emotional stroop task, each lasting 9.75 minutes.
During each session, we acquired 225 T2*-weighted whole-brain echo planar images using a Philips Achieva 3 T whole-body scanner (Philips Medical Systems, Best, The Netherlands) equipped with an 8-channel Philips sensitivity-encoded (SENSE) head coil. Imaging parameters were: 2600 ms repetition time (TR); 37 slices (transversal, ascending acquisition); 2.6 mm slice thickness; 2.5 mm x 2.5 mm in-plane resolution; 0.65 mm gap; 90° flip angle. To measure at fully equilibrated magnetic field, five dummy-image excitations were performed and discarded before functional image acquisition started. Additionally, we acquired a highresolution T1-weighted 3D fast-field echo structural scan used for image registration during post-processing (sequence parameters: 181 sagittal slices; matrix size: 256 x 256; voxel size: 1 x 1 x 1 mm; TR/TE/TI: 8.3/2.26/181 ms). fMRI image pre-processing.
Image preprocessing and analysis were conducted using SPM8 (Wellcome Trust Centre for Neuroimaging). Functional images were slice-time corrected (to the middle slice acquisition time) and realigned (accounting for subjects' head motion). Each subjects' T1-weighted structural image was co-registered to the mean functional image and normalized to the standard T1-MNI template using the "Unified Segment" procedure provided by SPM8 11 . The procedure incorporates spatial normalization and tissue class segmentation within the same model so that an optimal solution is found for both within the same framework. The procedure uses 6 tissue probability classes for MW, GM, CSF, skull, soft tissue and other (i.e.: eyes). These standardized probability maps for different tissue classes were constructed from a large number of brains that are registered into a common space. In the 'unified segment' Bayesian framework, these maps represent the prior probability of any voxel belonging to a particular tissue class (priors). The procedure warps the standard tissue probability maps to match the current subjects' maps by maximizing their mutual information. The inverse transform can then be used to normalize the functional images to standard MNI space. A recent report stated that when taking prior tissue class information into account, the 'unified segment' approach as implemented in SPM outperforms several other methods in both precision of registration as well as tissue classification 12 . The functional images were normalized to the standard MNI template using this transformation, spatially resampled to 2.5 mm isotropic voxels, and smoothed using a Gaussian kernel (FWHM, 6mm).

Eye measures.
During scanning, eye movements were sampled at 250Hz using an MR-compatible infrared EyeLink II CL v4.51 eye-tracker system (SR Research Ltd.). In order to account for the effects of eye movements and blinks on BOLD responses, we added them as regressors of no interest to the general linear model (see below). Saccades were defined as eye movements larger than 0.5 degrees visual angle 13 . Blinks were defined as periods of signal loss lasting longer than 80 ms and shorter than 2000 ms 14 ; these epochs were removed from the pupil data and filled in by linear interpolation.

Pupil dilation cluster-correction.
To identify time windows during which the pupil dilation significantly differs between relevant trial types (I>C, CI>II, IC>CC, Figure 6), while avoiding false positive clusters, we applied Bonferroni-correction 15,16 via a cluster-based permutation test following 17 . We used a clusterforming threshold of T = 2.02 corresponding to a two-sided p-value of 0.05 given 47 degrees of freedom (N=48). The procedure first calculates the one-sample t-statistic across all participants' average difference between relevant contrasts for each 4-millisecond bins in the pupil dilation time series. Next, the size of continuous temporal clusters, defined as the number of adjacent time bins exceeding the cluster-forming threshold were identified and tested against cluster sizes observed by chance. To this end, a null distribution of cluster sizes was generated by permuting the labels for each trial and time bin within-participant by flipping the sign of each time bin randomly 1000 times and recomputing the t-statistic across all time bins for each iteration. On each iteration the largest permuted temporal cluster was identified and stored in the null distribution. A cluster corrected p-value is computed by dividing the number of clusters in the null distribution exceeding the number of clusters in the data by the number of iterations.
We estimated a general linear model (GLM) to identify regions associated with arousal upregulation, defined as the response difference between CI and II trials. This contrast was used to test whether LC-NE arousal system responsivity predict individual stress resilience.
The GLM contained four indicator functions placed at the onset of each of the possible trial types, based on current and previous trial congruency (CI, II, IC, & CC). For instance, CI is an incongruent trial (I) preceded by a congruent trial (C) while II represents an incongruent trial (I) preceded by an incongruent trial (I), and so forth. An additional indicator function modelled the onsets for trials of no interest, which included: the first trial of each session that cannot be classified with respect to preceding trial type, trials with reaction times 2 standard deviations above the participant's overall mean response time, as well as error and post-error trials that may be associated with error-related cognitive processing 9,10 .
Six motion parameters (obtained during the realignment procedure) were also included as regressors of no interest to account for participants' head motion. Furthermore, we included additional regressors that accounted for variance induced by eye-related variables (blinks and saccades), to ensure that neural conflict responses and stress-resilience predictions are not confounded by these variables 18  performed with a random-effects General Linear Model and cluster-level inference based on non-parametric permutation tests and pseudo t-statistics for independent observations within the SnPM framework (http://warwick.ac.uk/snpm). The whole-brain FWE-corrected statistical threshold was set to P < 0.05 with an initial cluster-defining voxel-level threshold of T = 3.275 (equivalent to uncorrected P < 0.001) 17,19 . For hypothesis-guided ROI analysis of the LC-NE arousal system, we applied the identical non-parametric statistical procedure as above restricted to a 2SD-locus coeruleus volume mask 20 . To test for regions previously associated with the CI>II contrast in earlier seminal studies 9,10 (Supplemental

Locus Coeruleus responsivity is a robust and reliable bio-marker for stress resilience
In a final analysis we attempt to quantify the usefulness of the identified bio-markers in predicting stress resilience by asking two questions: First, how much more variance can be explained by our identified biological markers compared to the current gold standard (selfreport surveys assessing either past potentially traumatic experiences or the current symptom level status quo)? To this end we used a multiple GLM-approach by comparing the predictive power of a base model (containing only the scores from the traditionally used clinical resources, i.e.: respective symptom severity survey at T0 and a survey assessing prior experience of adverse and potentially traumatic events) to increasingly more complex models, by adding our behavioral and physiological variables of interest as defined above (classic RT congruency sequence effect (CSE), LC upregulation response (LC), Pupil dilation distance (pupil), and LCamygdala functional coupling during upregulation (LC-connectivity)). Secondly, we ask, which are the most parsimonious parameter combinations to predict individual anxiety or depression symptom change? To this end we use a stepwise-regression approach (Methods).
For a comprehensive list of parameter test-statistics, goodness-of-fit measures and model comparisons please see Supplemental Tables S7 and S8 using the weighted average LC-1SD mask from the physio-corrected, unsmoothed fMRI data as well as tables S9 and S10 for data without these corrections regarding anxiety and depression symptom changes respectively.
Lastly, we assess the out-of-sample prediction accuracy between the base model, full model (containing all parameters) and most parsimonious model using LTSO (Methods). Here we report the statistics for the data without optimizing brainstem imaging corrections (tables S9 and S10), while the main text reports these results using the weighted average LC-1SD mask from the physio-corrected, unsmoothed fMRI data.
We found that the base model, representing the current gold standard for assessing acute anxiety symptomology, was not a reliable predictor for anxiety symptom severity changes.
Neither of the two self-report scores were significant (p<0.05) nor did this model predict prospective resilience above chance; it only explained 4% of the adjusted variance. In contrast, our identified bio-marker substantially improved predictions of anxiety symptom changes. The adjusted explained variance was increased by 400% and 300% respectively when adding either LC (p=0.017) or pupil (p=0.039) separately. The classic behavioral CSE score was neither significant on his own (p>0.1, model 2) nor in models containing either LC (model 3) or pupil (model 4). Having both LC and pupil regressors compete in explaining anxiety changes (model 5) further increases adjusted variance explained (effectively 20%) and establishes LC as reliable predictor (p=0.041) while, potentially due to shared variance, pupil becomes less reliable (p=0.09). Importantly, adding the individual connectivity strength between LC and amygdala during the upregulation response (model 6) leads to a massive increase in adjusted explained variance (effectively 50%, approximately 12 times the base model) and significantly above chance out-of-sample predictions (p<0.001, 61.8%). These results establish noradrenergic responsivity in the LC (p=0.034) and LC-amygdala-connectivity (p<0.001) during upregulation as important biological markers for anxiety symptom changes and thus stress resilience. Both these variables were also identified as major contributors in the most parsimonious model, which contained LC-connectivity (p<0.001), LC (p=0.007), and CES (p<0.01). This model delivered the highest adjusted explained variance of 51% and predicted symptom severity change out-of-sample (p<0.001, 62.6%) despite only containing 3 parameters.
The LC conflict response is also the most reliable predictor for depression symptom severity changes, even though the base model already explained 23.3% adjusted variance, primarily driven by the PHQ-depression score at T0 (p=0.0002, model 1). However, it has to be noted that PHQ at T0 was inversely related to symptom changes, suggesting a ceiling effect or regression to the mean. Participants with increased depression scores prior to the medical internship were unlikely to increase their depression symptoms further due to real-world stress.
Nevertheless, irrespective of model complexity, the individual LC upregulation response was the only biological marker that reliably related to depression symptom changes (at least p < 0.04), despite controlling for behavioral CSE (p>0.74), pupil distance (p>0.14), or LCconnectivity (p=0.57). The LC upregulation regressor added 7% of adjusted variance (30.3%, model 3) to that achieved by the base model, which was similar to that achieved by the full model including all parameters (29.7%, model 6, with 65.5% out-of-sample accuracy). LC was also the only biological marker (p=0.01) identified in the most parsimonious model along with the PHQ score at T0 (p=0.004). This model explained 33.3% adjusted variance and significantly predicted mean symptom severity changes out-of-sample (p<0.001, 68% accuracy). Taken together, the comprehensive regression analyses for anxiety and depression symptom changes establish that the noradrenergic LC responsivity is a much better prospective predictor of individual stress resilience than the currently used clinical measures. Adverse events and mean Anxiety symptom changes are correlated (Pearson-correlation p=0.0058; R=-0.3926). However, the adverse events regressor (orange shading) does not add substantial variance (R 2 ) or adjusted variance (adj R 2 ), while both LC and LC-Amygdala connectivity remain strong predictors of anxiety symptom changes. Please note: Adverse events and mean depression symptom changes are not correlated (Pearson-correlation p=0.35; R=-0.14). The models: (1) original LC (2) LC_1SD + weighted average (3) LC_1SD + weighted average + physio-corrected (4) LC_1SD + weighted average + physio-corrected + un-smoothed data (5) LC_2SD + weighted average (6) LC_2SD + weighted average + physio-corrected (7) LC_2SD + weighted average + physio-corrected + un-smoothed data Supplemental  PDD). PPI: LC→Amydala (CI>II) refers to the individual strength of functional coupling between the LC and amygdala during the upregulation process. Regressors were z-scored across participants before submission to the multiple regression. Please see the methods section for a detailed description of the quantification of these variables. Each numbered column represents a different linear model regressing the average anxiety symptom severity change on increasingly more complex combinations of predictors from (1)-(6), while column (7) represents an optimal model with respect to a goodness-of-fit vs. modelcomplexity trade-off determined using a step-wise regression procedure (Methods). Blue shading indicates information regarding Goodness-of-fit. R 2 = variance explained, adjR 2 = variance explained adjusted for number of predictors. Bold numbers in these rows indicate the model with the highest R 2 and adjR 2 . AIC = Akaike information criterion, BIC = Bayesian information criterion. Bold numbers in these rows indicate the model with the lowest AIC and BIC. Grey shading indicates information regarding improvements of goodness-of-fit with respect to the base-model (1), which does not contain any behavioral and/or physiological conflict adaptation predictors. Δ R 2 from (1) indicates the differential increase in variance explained. Δ adj.R 2 from (1) indicates the differential increase in adjusted variance explained. The table presents the results of seven multiple regression models (different columns) regressing mean depression symptom severity change (average symptom changes between 3 and 6 months of real-world stress exposure) on different sets of predictors. Please note, the LC data were extracted from LC-1SD-mask voxels using weighted averaging in the physiocorrected, unsmoothed data to optimize brainstem signals. For each predictor the first row indicates the standardized beta-estimate, the second row indicates the standard error (SE) and the third row indicates the p-value (p). Bold p-values indicate p<0.05. The predictors and goodness-of-fit indices are identical with those used and explained in Supplemental Table S7. The table presents the results of seven multiple regression models (different columns) regressing mean anxiety symptom severity change (average symptom changes between 3 and 6 months of real-world stress exposure) on different sets of predictors. Please note, the LC data were extracted without optimizing brainstem signal procedures. For each predictor, the first row indicates the standardized beta-estimate, the second row indicates the standard error (SE), and the third row indicates the p-value (p). Bold p-values indicate p<0.05.

Supplemental
Red shading indicates information pertaining to variables currently considered gold-standard in predicting anxiety symptom change prior to real-world stress. STAI at T0 is the standard anxiety survey obtained before real-world stress and PreTrauma refers to adverse events experienced before the internship. Purple shading indicates information pertaining to behavioral and physiological variables related to the emotional-stroop task. Behav-CSE refers to the classic behavioral reaction time congruency sequence effect. LC (CI>II) refers to the LC upregulation response, i.e.: the neural activity difference comparing CI>II trials measured in the locus coeruleus. Pupil-Dist quantifies the impact of the previous trial CI>II difference in pupil dilation on the current trial CI>II difference in pupil dilation (See methods PDD). PPI: LC→Amydala (CI>II) refers to the individual strength of functional coupling between the LC and amygdala during the upregulation process. Regressors were z-scored across participants before submission to the multiple regression. Please see the methods section for a detailed description of the quantification of these variables.
Each numbered column represents a different linear model regressing the average anxiety symptom severity change on increasingly more complex combinations of predictors from (1)-(6), while column (7) represents an optimal model with respect to a goodness-of-fit vs. modelcomplexity trade-off determined using a step-wise regression procedure (Methods). Blue shading indicates information regarding Goodness-of-fit. R 2 = variance explained, adjR 2 = variance explained adjusted for number of predictors. Bold numbers in these rows indicate the model with the highest R 2 and adjR 2 . AIC = Akaike information criterion, BIC = Bayesian information criterion. Bold numbers in these rows indicate the model with the lowest AIC and BIC. Grey shading indicates information regarding improvements of goodness-of-fit with respect to the base-model (1), which does not contain any behavioral and/or physiological conflict adaptation predictors. Δ R 2 from (1) indicates the differential increase in variance explained. Δ adj.R 2 from (1) indicates the differential increase in adjusted variance explained.

Supplemental Figures
Supplemental Figure S1: Response conflict (incongruent trials > congruent trials) in dorsomedial prefrontal cortex and replication of classic behavioral conflict results and trial sequence effects.
The imaging analyses focus on incongruent trials because during these trials, the incompatibilities between stimulus dimensions (emotion vs. word) induce response conflict. In order to further confirm the generality of the response conflict induced by the paradigm, we contrasted incongruent with congruent trials irrespective of trial sequence. Not surprisingly, this contrast revealed the dorsomedial prefrontal cortex (DMPFC, cluster extent = 197, degrees of freedom (df) = 47, nonparametric P(FWE) = 0.032, X/Y/Z: -7/13/55), a region strongly associated with response conflict in several prior studies 10,23,24 . Supplemental Figure S2:

Confirmation of regions previously associated with (CI > II).
In the present data-set we replicate previous upregulation regions identified using the CI>II contrast. Please not color-scale limits range 0-80 in order to make tSNR contrasts within the brainstem more easily apparent. See next figure for coronal LC-mask comparison and overlay.

LC-mask and tSNR
Mean temporal signal-to-noise ratio (N= 48) in 3 coronal views slicing through the LC. For easy comparison each different slice-view (indicated by the Y-coordinate above) displays a zoomed view of the same slice containing the probabilistic LC-2SD map on the left (in shades of green) and the identical slice and tSNR overlay on the right.