Introduction

Negative emotionality refers to the general tendency to show various forms of negative affect including (exaggerated) anxiety, guilt, moodiness, anger, insecurity and dissatisfaction1. Furthermore, individuals characterized by high negative emotionality report enhanced distress in response to novelty, threat, or stress in real life and in the laboratory1,2,3,4 as well as in the absence of real danger5. In the literature, negative emotionality is used synonymously with other terms such as Neuroticism, Negative Affectivity and Dispositional Negativity. Here, we use the term ‘negative emotionality’ to refer to the broad umbrella construct.

Negative emotionality is a well-established and relatively stable risk factor for the development of affective disorders, in particular anxiety and depression (for a review see1,6,7,8,9,10,11,12). Even within patient samples, those individuals with higher levels of negative emotionality typically report more severe symptoms, and their clinical prognosis is less optimistic13,14,15. To allow for the development of targeted prevention and intervention approaches, there is an urgent need to deepen our understanding of the basic neuro-cognitive mechanisms1,16 underlying the elevated risk associated with negative emotionality.

Differential vulnerability might hinge on individual differences in associative learning processes17,18,19. Associative learning processes represent a core mechanism in the development as well as the maintenance of pathological fear and anxiety. These processes can be captured experimentally in fear conditioning paradigms, which serve as translational models in fear and anxiety research20,21,22. During fear acquisition training, an initially neutral stimulus (the to-be-conditioned stimulus, CS+) is paired with an aversive event (the unconditioned stimulus, US) and thereby becomes a predictor of the US, while a second stimulus (CS−) is never paired with the US. Subsequently, the CS+ elicits (anticipatory) defensive responses that can be assessed at different response levels, all capturing slightly different time-windows and sub-processes (for a review see23). These include self-report (e.g., ratings of fear or US expectancy), physiological responding [e.g., skin conductance responses (SCRs) and fear-potentiated startle responses (FPS)] and neuro-functional activation (e.g., BOLD fMRI). Skin conductance responses are the most commonly used measures of conditioned responding and are assessed as phasic arousal-related changes in sweat gland activity24,25. Fear-potentiated startle, which follows a valence gradient in responding26,27, measures the increase in the startle reflex elicited by a sudden event (such as a burst of white noise) in the presence of threat as compared to the absence of threat.

Focusing on individual differences in negative emotionality1, e.g.28, and combining it with fear conditioning research29 holds promise to provide critical insights into the mechanisms underlying individual risk and resilience for the development of anxiety and/or stress-related disorders19,29. A recent review identified three scales linked to the broader construct of negative emotionality that have been consistently associated with individual differences in fear conditioning performance29 and vulnerability to pathological fear and anxiety: the trait anxiety scale of the Spielberger’s State-Trait Anxiety Inventory (STAI-T30), the Big Five neuroticism scale of the NEO-FFI (NEO-FFI-N31) and the intolerance of uncertainty scale (IUS32).

Trait anxiety reflects the general tendency to react anxiously and to show cognitive as well as affective styles related to pathological anxiety to a wide range of events and contexts. There has been a long-standing debate on whether the STAI-T is a “good” measure of anxiety. Based on confirmatory factor analytical approaches in large samples of healthy individuals33,34 and patients34, it was suggested that the STAI-T measures “general negative affect” rather than “measuring anxiety or depression in a strict sense”. The latter two hypothetical sub-factors had been proposed previously35 but lacked sufficient discriminant validity in newer work using larger samples33,34.

In turn, neuroticism, one of the “Big-Five” constructs derived factor-analytically, reflects the tendency to show negative affect such as anger, envy, guilt, and depressed mood and to be emotionally highly reactive and vulnerable to stress36. Neuroticism has also been described as “sensitivity of defensive distress systems that become active in the face of threat, punishment or uncertainty”37 and is considered an established risk factor for psychopathology38. Recently, it was reported that neuroticism may be associated with experiencing more intense negative emotions, but not with the variability in experiencing negative emotions39.

Finally, intolerance of uncertainty is defined as the dispositional cognitive bias to perceive and interpret ambiguous situations as threatening32,40, which has been suggested to be a possible trans-diagnostic factor contributing to maintaining affective disorders including anxiety disorders and depression41,42. Relatedly, patients suffering from affective disorders are characterized by heightened scores on the IUS43. Of note, several different scales assessing intolerance of uncertainty co-exist32,44.

All three constructs (trait anxiety, neuroticism and intolerance of uncertainty), as assessed through the above-mentioned scales, are related to, or can be subsumed under, the broader umbrella construct of negative emotionality. All three have been associated with individual differences in fear conditioning performance29. Yet results in the literature are heterogeneous and partly inconclusive at the behavioral and neuro-functional level. The fear conditioning field, similar to the field of personality neuroscience in general45, suffers from a number of well-described problems. These problems include: (1) generally underpowered samples (typically below N = 30 per group29), and (2) sub-optimal statistical approaches such as dichotomizing continuous variables, which gives rise to interpretation problems, a massive loss of power and increased Type II error rates (i.e., false negatives)46,47,48,49. Furthermore, the majority of results in the fear conditioning field originate from univariate analyses focusing on (3) single constructs related to negative emotionality (for a discussion see29, for a few exceptions see50,51,52,53,54) and (4) singular outcome measures (such as ratings, SCRs, FPS or BOLD fMRI), each tapping into slightly different underlying processes23. Attempts at multivariate integration are thus far rare. As a consequence, separate lines of research and isolated findings have emerged that are notoriously difficult to integrate and interpret into one bigger picture. Hence, we echo recent calls for a paradigm shift embracing more complex multivariate approaches, the use of larger data sets and the dimensionality of the data16,55. The overarching aim of this work is to enhance our understanding of the mechanisms through which negative emotionality may convey risk for affective psychopathology by integrating separate lines of research (using different scales and outcome measures) that have emerged in parallel and are difficult to integrate.

To achieve this aim, we start by integrating dimensional measures as derived from three commonly employed scales in the field (i.e., STAI-T, NEO-FFI-N and IUS) with the three most commonly used measures of conditioned responding (ratings, SCRs, FPS)—as identified and summarized by a recent review by our group29. These measures are obtained in a large sample (Study 1, N = 356) and combined into one statistical model that is set up to investigate whether any of these scales is linked to specific fear conditioning performance over-and-beyond the other scales. Subsequently we investigate whether it is the shared variance across the scales and across outcome measures that explains these potential associations and thus supports a prominent role for the general construct negative emotionality, or whether the scales remain specifically associated with specific measures of conditioned responding. Specific and directed hypotheses on the outcome of this model are difficult to derive from the existing literature because results in the field are extremely heterogeneous.

Additionally, we aim to replicate the main findings from Study 1 in a re-analysis of a second pre-existing sample (Study 2, N = 113), which also allows us to extend our inferences to the neuro-functional level (a brief introduction to Study 2 is provided below).

Study 1: Methods

Participants

Three-hundred-fifty-six healthy individuals participated in Study 1. The study was originally designed to investigate individual differences in fear acquisition and to investigate post-acquisition manipulations on return of fear responding (see also “Experimental design”). Participants were recruited primarily through online advertisement on a local student-job website.

Prior to inclusion, individuals underwent a pre-experimental telephone screening and were selected only when reporting no previous or current diagnosis of psychiatric or neurological disorders, or hormonal disturbances such as thyroid dysfunction. The sample includes 255 females and 100 males between 18 and 40 years old, with an average age of 25 ± 4 (SD) years. Note that gender information is missing for one participant. Written informed consent in accordance with the Declaration of Helsinki was obtained from each participant, and the Ethical Review Board of the German Psychological Association (DGPS) approved the study. Participants received 10 Euros/h for their participation. Please note that this sample and the association between differential SCRs during fear acquisition training and scores on the STAI-T scale (see below) have been included as a case example in our recent publication focusing on the impact of performance-based exclusion of participants (i.e., exclusion based on differential SCR cut-offs) to illustrate a potential sample bias with respect to individual differences in anxiety-related traits56.

Questionnaires

Participants filled in a batch of questionnaires prior to the experiment. This batch included (1) questions to obtain demographic information, (2) the State-Trait Anxiety Inventory30, (3) the NEO-FFI31,57, (4) the Intolerance of Uncertainty Scale32 and (5) the locus of control IPC Scale (Internal control, Powerful others external control, Chance control)58. Upon completion of the experiment (i.e., after extinction and reinstatement), participants filled in a post-experimental awareness questionnaire23, the answers to which were orally confirmed with the experimenter. The questionnaire included estimations of the total number of received electrotactile stimuli and the total number of experimental stimuli presented during the experiment. It also contained questions about perceived CS-US contingencies during the experiment (first as a free recall, then as a forced choice). Based on this, participants were classified as either aware (N = 236, able to correctly report CS-US contingencies in free recall and/or forced choice) or unaware (N = 87, unable to report correct CS-US contingencies across questions). Twenty-one participants who reported a tendency towards the correct contingencies but also some uncertainty were counted as aware. Data on CS-US awareness were missing for twelve participants.

The trait scale of the STAI (STAI-T) consists of 20 items, evaluated on a four-point Likert scale, allowing individuals to score between minimally 20 and maximally 80 points. Despite its potentially misleading name (i.e., trait anxiety inventory), the STAI-T more likely assesses how a respondent generally feels, and is thought to target relatively stable aspects significant for “anxiety proneness”, including calmness, confidence and security59. Congruently, the STAI-T has been criticized for being a psychometrically inhomogeneous scale33,34 that represents facets of both anxiety and depression. Based on confirmatory factor analytical approaches in large samples of healthy individuals33,34 and patients34, it was suggested that the STAI-T measures “general negative affect” rather than “measuring anxiety or depression in a strict sense”. The latter two hypothetical sub-factors had been proposed previously35 but lacked sufficient discriminant validity in newer work using larger samples33,34.

The neuroticism scale of the NEO-FFI (NEO-FFI-N) consists of 12 out of 60 items, which were derived factor analytically and should represent one of the five higher order Big-Five personality traits31, i.e., neuroticism. Scores on this NEO-FFI-N scale can range between 0 and 48. Neuroticism refers to the tendency to express negative emotionality and has been suggested to be associated with defensive responding to uncertainty, threat, and punishment60.

The IUS consists of 27 items that aim to assess an individual’s tendency to react to the uncertainties of life, or more precisely their intolerance towards these uncertainties32,40. Each item is evaluated on a five-point Likert scale, allowing respondents to score between 27 and 135. Intolerance of uncertainty is defined as a cognitive bias that affects how uncertain situations are perceived, interpreted, and dealt with (cf.61,62). Several factor solutions have been suggested, including a four-factor solution40 which includes: (1) uncertainty is stressful and upsetting, (2) uncertainty causes inability to act, (3) uncertain events are negative and should be avoided, and (4) being uncertain is unfair. No official German version of this questionnaire exists; however, Gerlach et al.32 created a German translation and investigated the underlying factor structure, in which this four-factor structure could not be replicated. In Study 1, this German translation of the full 27 items of the IUS is used.
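The score ranges reported for the three scales follow directly from the number of items and the per-item response range. The short Python sketch below reproduces that arithmetic. The per-item codings are assumptions inferred from the reported ranges (1 to 4 for the STAI-T, 0 to 4 for the NEO-FFI-N, 1 to 5 for the IUS), and the helper name score_range is hypothetical.

```python
def score_range(n_items, min_per_item, max_per_item):
    """Return the (minimum, maximum) sum score of a Likert-type scale."""
    return n_items * min_per_item, n_items * max_per_item


# Per-item codings below are assumptions inferred from the reported ranges.
stai_t_range = score_range(20, 1, 4)   # 20 items, four-point scale
neo_n_range = score_range(12, 0, 4)    # 12 items, assumed coded 0 to 4
ius_range = score_range(27, 1, 5)      # 27 items, five-point scale
```

Under these assumed codings the three calls return (20, 80), (0, 48) and (27, 135), matching the ranges stated in the text.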

For those individuals having one or more, but not all, missing items on any of the questionnaires, missing values were imputed using a single imputation with the predictive mean matching method in the MICE package in R. This imputation method draws observed values from other subjects with a similar response pattern on other variables. In total, data were imputed for 8 subjects on the STAI-T, for 5 subjects on the NEO-FFI-N, and for 26 subjects on the IUS. Forty-six participants have missing data for the full STAI-T and IUS, one is missing the full STAI-T only, and one participant is missing the full NEO-FFI-N. Because these data cannot be considered as missing at random, they were not imputed but retained as missing data.

Overall reliabilities of the questionnaires and subscales of interest were high in the final sample, as indicated by Cronbach’s α: 0.91 for STAI-T; 0.86 for NEO-FFI-N; 0.94 for IUS. In addition, a wide range of scores was covered in the acquired sample: 21–76 for STAI-T with 38 ± 9 (mean ± SD); 1–40 for NEO-FFI-N with 20 ± 8 (mean ± SD); and 27–135 for IUS with 62 ± 18 (mean ± SD).
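For readers less familiar with the reliability index reported above, the following Python sketch computes Cronbach’s α from an item-level response matrix using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of sum scores). It is an illustration only: the analyses in this study were run in R, and the toy responses shown here are invented for the example.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete data.

    rows: list of respondents, each a list of k item scores.
    Uses sample variances (n - 1 in the denominator).
    """
    k = len(rows[0])
    item_variances = [variance([r[i] for r in rows]) for i in range(k)]
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical toy data: 4 respondents, 3 perfectly consistent items.
toy_alpha = cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]])
```

When all items rank respondents identically, as in the toy data, α equals 1; less consistent items yield lower values.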

Experimental design

All participants underwent a fear conditioning, extinction and return of fear paradigm. Data of interest for the current research question concern fear acquisition training only. Data acquired during the same experimental session but involving experimental manipulations after this fear acquisition training phase or involving a methodological validation in sub-samples of this sample are published elsewhere63,64. Therefore, experimental details will only be provided for the fear acquisition training phase and the preceding habituation phases of the experiment.

Instructions

Participants were not instructed with respect to the CS-US contingencies or the learning element of the study.

Visual material—conditioned stimuli

Black geometrical shapes (i.e., a rectangle and an ellipse) served as conditioned stimuli (CS). One of these shapes (CS+) co-terminated with the unconditioned stimulus (US) during all fear acquisition training trials, whereas the other shape did not (CS−). In other words, a 100% reinforcement ratio was used during this experimental phase. Each CS type was presented consecutively for maximally two times, and nine times in total during fear acquisition training (9 CS+ and 9 CS− trials). Allocation of the shapes to CS+ or CS− was counterbalanced across participants, as well as the order in which the CS+/CS− appeared. The CSs were presented for 6 s on a colored computer screen (blue, purple, green, or yellow). The background color served as contextual stimulus, which has no relevance to the fear acquisition training phase, but is of value in the context of post-acquisition experimental manipulations—not included here. The background color remained constant within experimental phases and was counterbalanced across participants. CS presentations were interleaved with inter trial intervals (ITI) consisting of a white fixation cross on a black computer screen, with variable durations (11.5 ± 1.5 s). Prior to fear acquisition training, subjects underwent an explicitly US-free CS habituation phase in which both stimulus types (i.e., the CS+ and CS−) were presented two times each.

Electro-tactile material—unconditioned stimulus

A train of three electro-tactile square wave pulses, 2 ms each, with 50 ms intervals, served as US. The US was produced by a DS7A electrical stimulator (Digitimer, Welwyn Garden City, UK) and delivered through a surface electrode with a platinum pin (Specialty Developments, Bexley, UK) to the dorsal part of the right hand. The intensity of the electro-tactile US was individually adjusted using a stair-case procedure to reach an unpleasant but tolerable level (range US intensities 0.3–70 mA, mean ± SD = 4.7 ± 5.2, median 3.5). The intensity of the US was gradually increased after conferring with the participant. The participant could then herself/himself elicit the US by pressing the space bar. After delivery of each US, the participant rated the averseness of the US on a scale from one to ten, with one not being aversive at all to ten being unbearable. The experimenter aimed to reach a final averseness rating of seven, which was not explicitly communicated to the participant.

Acoustic material—startle probes

A burst of 95 dB(A) white noise was used to elicit a startle response. Startle probes were presented binaurally via headphones (Sennheiser, Wedemark, Germany) four or five seconds after CS onset in half of all CS habituation trials (one out of two trials) and in two thirds of all CS fear acquisition training trials (six out of nine trials). Additionally, startle probes were presented in one third of all ITIs, either five or seven seconds after ITI onset. To obtain a stable baseline for startle reactivity, five consecutive startle probes, separated by six seconds, were presented during a white fixation cross on a black computer screen.

Procedure

Experimental instructions were provided in written form and importantly did not contain instruction with respect to CS-US contingencies. Participants were instructed to attend to the visual stimuli presented on the screen, and ignore the acoustic startle probes. It was made explicit that startle probes had the sole purpose of enabling physiological data acquisition.

Participants started by filling out the questionnaires. Afterwards, they proceeded with the US intensity calibration phase. In a step-wise procedure, the US intensity was increased to a level described by the participant as very annoying but not painful, corresponding to a rating of at least 7 on a ten-point scale (with ten being the maximally aversive sensation that could be induced by the electrode). Next, the actual experiment started with the startle habituation phase and continued with an explicitly US-free CS habituation phase. Subsequently, the uninstructed fear acquisition training phase of interest started. Presentation of all stimuli was controlled using Presentation Software (NeuroBehavioral Systems, Albany, California, USA). After completing the full experiment, i.e., after the post-acquisition training phases that included extinction training, reinstatement administration, and return of fear test phases, participants completed the post-experimental awareness questionnaire. Twenty-eight participants had to be excluded from the fear conditioning experiment due to either voluntary discontinuation or technical failure during data acquisition.

Subjective data recording—fear ratings

Participants indicated their level of fear, anxiety, and distress towards both CS types within intermittent rating blocks. The following text was presented on screen: “How much stress, fear or anxiety did you experience the last time you saw symbol X?”, with the “X” referring to one of the CS types at a time. Participants were given seven seconds to provide their response on the computerized visual analogue scale (VAS) ranging from 0 (none) to 100 (maximum), which had to be confirmed within the given time window by pressing the enter key. One rating block was presented at the end of the habituation phase, and three rating blocks were presented during fear acquisition training. Rating blocks were always presented after minimally one and maximally four CS+ and CS− presentation(s) (see65 for a graphical overview of the design including ratings). The last rating in the fear acquisition training phase occurred after either the seventh or the eighth acquisition trial. Nineteen participants failed to confirm their selected VAS values within seven seconds and therefore have missing rating data. Post processing was conducted in R version 3.6.0 (2019-04-26).

Physiological data recording and processing—skin conductance and startle responding

Methods and procedures to acquire physiological data have been previously described in Sjouwerman et al.65. Physiological data were recorded using a BIOPAC MP100 amplifier (BIOPAC Systems Inc., Goleta, CA, USA) and AcqKnowledge 3.9.2 software. Data preprocessing was conducted in MATLAB (version 2014b), response quantification was conducted manually in a custom-made program, and post processing was conducted in R version 3.6.0 (2019-04-26). For physiological measurements, additional data are missing for some participants (SCR = 8, startle = 30) due to technical failures including, for example, failure to save the physiological data file, data extraction problems, erroneous adjustment of the gain during the experiment causing SCR amplitudes to be uninterpretable, or electrode misplacement during data acquisition.

Skin conductance

For skin conductance recording, participants first cleaned their hands with warm water. Afterwards, two hydrogel Ag/AgCl sensor recording electrodes (Ø 55 mm) were attached to the distal and proximal hypothenar eminence of the left hand. Skin conductance data were recorded continuously at 1,000 Hz with a gain of 5mΩ. If a participant’s skin conductance moved beyond the scaling window, the gain (i.e., resistance) was increased or decreased prior to the start of the experiment to reduce or increase the sensitivity of the recording. Offline, data were downsampled to 10 Hz. According to published guidelines24, data were scored manually as foot-to-peak responses with response onsets starting between 0.9 and 4.0 s after CS or US onset. Increases smaller than 0.02 µS were scored as zero responses. Responses confounded by recording artifacts, such as electrode detachment, responses moving beyond the sampling window, or excessive baseline activity were discarded and scored as missing values. Raw skin conductance response (SCR) amplitudes were normalized by log transformation and range corrected by division by an individual’s maximum response amplitude (either CS or US).
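The SCR scoring and normalization steps described above can be sketched in Python as follows. This is a minimal illustration, not the scoring pipeline used in the study (which relied on manual scoring and R). Two details are assumptions, because the text does not specify them: the log transformation is taken as log(1 + amplitude), and the range correction divides by the participant’s maximum log-transformed response. The function name normalize_scr is hypothetical, and None stands for a trial scored as missing.

```python
import math


def normalize_scr(raw_amplitudes, min_amplitude=0.02):
    """Zero-score, log-transform and range-correct one participant's SCRs.

    raw_amplitudes: foot-to-peak amplitudes in microsiemens for all CS and
    US trials of one participant; None marks a missing (discarded) trial.
    """
    # Increases smaller than 0.02 microsiemens are scored as zero responses.
    scored = [None if a is None else (a if a >= min_amplitude else 0.0)
              for a in raw_amplitudes]
    # Assumption: log(1 + x), so that zero responses remain zero.
    logged = [None if a is None else math.log(1.0 + a) for a in scored]
    valid = [a for a in logged if a is not None]
    max_response = max(valid) if valid else 0.0
    if max_response == 0.0:
        return logged  # nothing to range-correct against
    # Range correction: divide by the individual's maximum response.
    return [None if a is None else a / max_response for a in logged]


example = normalize_scr([0.01, 0.5, None, 1.0])
```

After this step every valid response of a participant lies between 0 and 1, which makes amplitudes comparable across individuals with different overall responsiveness.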

Participants not showing valid SCRs to the US in over two thirds of all fear acquisition training trials (i.e., six out of nine trials56) were classified as physiological non-responders (n = 16), and all their SCR trials were set to missing values.

Startle responding

Startle responding was measured using Ag/AgCl electromyogram (EMG) electrodes. Two electrodes were placed below the right eye over the orbicularis oculi muscle and one electrode was placed on the participant’s forehead to obtain a reference signal. Startle data were filtered online (band-pass: 28–500 Hz), rectified, and integrated (averaged over 20 samples). According to published guidelines66, data were scored manually as foot-to-peak with response onsets within 20–120 ms post startle probe onset. Responses confounded by a blink occurring up to 50 ms before startle probe onset were scored as missing values. Similarly, trials confounded by recording artifacts or excessive baseline activity within the same time window were scored as missing values. Raw data were T-transformed across the experimental phases up to the fear acquisition training phase.
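A T-transformation rescales each participant’s raw startle magnitudes to a mean of 50 and a standard deviation of 10 (T = 50 + 10 · z). The Python sketch below illustrates this under the assumption that z-scores are computed per participant across all valid trials using the sample standard deviation; the text does not state which standard deviation was used, and the function name t_transform is hypothetical.

```python
from statistics import mean, stdev


def t_transform(values):
    """T-transform one participant's startle magnitudes (T = 50 + 10 * z).

    values: raw magnitudes across all included phases; None marks a
    missing trial. Requires at least two distinct valid values.
    """
    valid = [v for v in values if v is not None]
    m = mean(valid)
    s = stdev(valid)  # assumption: sample SD (n - 1 in the denominator)
    return [None if v is None else 50.0 + 10.0 * (v - m) / s for v in values]


t_scores = t_transform([1.0, 2.0, 3.0])
```

For the toy input the mean is 2 and the standard deviation is 1, so the three values map to 40, 50 and 60.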

Participants not showing valid startle responses in over one third of all trials from the habituation phases and the fear acquisition training (i.e., more than 9 out of 28) were classified as physiological non-responders (n = 16), and startle responses for these participants were set to missing values. Note that these startle non-responders do not overlap with SCR non-responders.

Fear acquisition is operationalized as CS+/CS− discrimination during the fear acquisition training phase (i.e., average CS+ minus average CS− responding). This includes responding towards 9 CS+ and 9 CS− trials for SCRs, and 6 CS+ and 6 CS− trials for startle responses (not all trials were ‘startled’, see above), and all 3 intermittent CS+ and 3 intermittent CS− ratings.
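The operationalization above amounts to a simple difference of means per participant and outcome measure. A minimal Python sketch follows; the function name cs_discrimination is hypothetical, None marks a missing trial, and ignoring missing trials when averaging is an assumption about how missing values were handled.

```python
from statistics import mean


def cs_discrimination(cs_plus, cs_minus):
    """Average CS+ responding minus average CS- responding.

    Missing trials (None) are ignored; returns None if either
    stimulus type has no valid trial.
    """
    plus = [v for v in cs_plus if v is not None]
    minus = [v for v in cs_minus if v is not None]
    if not plus or not minus:
        return None
    return mean(plus) - mean(minus)


# Hypothetical example values for one participant and one outcome measure.
example_discrimination = cs_discrimination([0.6, 0.8, None], [0.2, 0.4])
```

Positive values indicate stronger responding to the CS+ than to the CS−; the example yields a value of approximately 0.4.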

Study 1: Statistical analyses

Study 1 consists of three analysis steps. First, univariate zero-order correlational analyses are conducted between the three questionnaires, or their respective subscale, and the three outcome measures of conditioned responding recorded in our study. For each outcome measure, a CS-discrimination value is calculated by subtracting CS− responding from CS+ responding. All variables are treated as dimensional. For all univariate analyses, multiple testing will be corrected by using the Benjamini Hochberg method (pBH). Correlation coefficients were compared using freely available online computer software67. Exploratory correlational analyses with awareness and US-intensity are reported in the Supplementary Information.
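The Benjamini Hochberg adjustment mentioned above can be sketched as follows. The Python function mirrors the step-up procedure that R’s p.adjust applies with method = "BH": each p value is multiplied by n/rank and monotonicity is enforced from the largest p value downwards. The function name and the example p values are hypothetical and are not results from this study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    n = len(p_values)
    # Indices sorted from the largest to the smallest p value.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, index in enumerate(order):
        rank = n - position  # ascending rank: 1 = smallest p value
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical p values for illustration only.
p_bh = benjamini_hochberg([0.01, 0.04, 0.03])
```

For the example input the adjusted values are approximately 0.03, 0.04 and 0.04, so a raw p value of 0.03 is no smaller than 0.04 after correction.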

In a second step which serves the aim to integrate potential effects of these independent and dependent measures in a single model, a path model is constructed in which relationships between the three independent variables and the three dependent variables are estimated simultaneously. In this path model, correlations among the independent measures, i.e., questionnaires or scales, are allowed.

In a third step we take into account that the three independent measures are likely to highly correlate with each other which makes it possible that these questionnaires, or subscales of questionnaires, all tap into a same larger construct, i.e., “negative emotionality”. Similarly, the three measures of conditioned responding highly correlate and might be part of the larger construct “fear learning”. This hypothesis, i.e., whether it is the shared variance across the questionnaires or scales or unique variance of individual questionnaires or scales that explains differences in specific outcome measures will be examined by employing a structural equation model. In this model, two latent variables will be defined, one for the three questionnaires/scales and one for the three outcome measures. The regression weight between the latent variable negative emotionality and the first questionnaire/scale, as well as the regression weight between the latent variable fear learning and the first outcome measure will be fixed to 1. Subsequently, a complementary structural equation model will be constructed that in addition to the paths defined in the model described above (under step 3) includes the paths that showed significant associations in the established path model in step 2.

For both path models and structural equation models, two-sided model fit will be evaluated based on root mean square error of approximation (RMSEA) values, indicating excellent fit when < 0.01, good fit when < 0.05, fair fit when < 0.08, mediocre fit when < 0.10, and poor fit at > 0.10 RMSEA values68,69. To improve model fit, backward selection of significant and trend-significant paths will be executed. Trends (p < 0.1) will be included in interim models, but not in final models. Full models, interim models, as well as final models will be reported.
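The RMSEA cut-offs listed above can be expressed as a small lookup, sketched here in Python. The point-estimate formula, RMSEA = sqrt(max(χ² − df, 0) / (df · (N − 1))), is the commonly used single-group definition and is included as an assumption; the models in this study were fitted in AMOS, whose output was used directly. Treating a value of exactly 0.10 as poor fit is also an assumption, since the text leaves that boundary undefined. Both function names are hypothetical.

```python
import math


def rmsea(chi_square, df, n):
    """Common single-group RMSEA point estimate (assumed formula)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def rmsea_fit_label(value):
    """Map an RMSEA value onto the fit categories used in the text."""
    if value < 0.01:
        return "excellent"
    if value < 0.05:
        return "good"
    if value < 0.08:
        return "fair"
    if value < 0.10:
        return "mediocre"
    return "poor"  # assumption: exactly 0.10 is also treated as poor
```

For instance, a hypothetical model whose χ² does not exceed its degrees of freedom has an RMSEA of 0 and would be labelled as an excellent fit.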

Univariate statistical analyses and data visualization were performed with R version 3.6.0 (2019-04-26) using the packages corrplot, dplyr, ggplot2, tidyr, mice, psych and cowplot. Multivariate analyses (path and structural equation models) were performed in AMOS 26 for Windows (Armonk, NY).

Study 1: Results

Study 1 aims to enhance our understanding of the mechanisms through which negative emotionality may convey risk for affective psychopathology by integrating separate lines of research (using different scales and outcome measures) that have emerged in parallel and are difficult to integrate. In a first step, we present univariate analyses illustrating associations between the three commonly employed scales in the field (i.e., STAI-T, NEO-FFI-N and IUS) with the three most commonly used measures of conditioned responding (ratings, SCRs, startle responding). We then move to multivariate analyses integrating these variables into a single model (Step 2) and exploring the role of potentially latent higher-order factors (Step 3).

Step 1: Univariate analyses

Univariate analyses revealed a significant albeit small negative correlation between STAI-T and CS discrimination in SCRs (r = − 0.19, pBH = 0.007), whereas correlations between CS discrimination in SCRs and either NEO-FFI-N or IUS were not significant when correcting for multiple testing but were trend wise significant and significant only when not correcting for multiple comparisons (see Fig. 1). This effect seems descriptively driven by the combination of weakly and non-significantly increasing CS− responding, and weakly and non-significantly decreasing CS+ responding with increasing scores on these questionnaires/scales. The correlation coefficients for CS discrimination in SCRs and the three questionnaires do however not differ significantly from each other (SCR discrimination for STAI-T and NEO-FFI-N: z = − 1.691, p = 0.091; STAI-T and IUS: z = − 0.977, p = 0.329; NEO-FFI-N and IUS: z = 0.498, p = 0.619).

Figure 1

Scatterplots showing associations between the three independent variables, i.e., questionnaires/scales STAI-T (A,D,G), NEO-FFI-N (B,E,H), and IUS (C,F,I) and the three dependent variables, i.e., outcome measures SCR (skin conductance responses; A–C), startle responses (D–F), and ratings (G–I) for CS discrimination in grey, CS+ responding in red, and CS− responding in blue. Corresponding correlation coefficients (r) are displayed in corresponding colors, #p < 0.1, *p < 0.05, **p < 0.01 for raw and Benjamini Hochberg (BH) adjusted p values separated by a semicolon. NS or blank spaces indicate p > 0.1. Density distributions of scores on the questionnaires are displayed on top of the figure; density distributions per stimulus type (CS+, CS−) and for CS discrimination are displayed on the right side of the figure for each dependent variable (SCR, startle, ratings). Note that CS discrimination values for startle responding are shifted by + 50 for illustrative purposes to ease comparison with CS+ and CS− startle responding.

Furthermore, a small negative correlation between IUS and CS discrimination in startle responding (r = − 0.15, pBH = 0.039) during fear acquisition training was observed. The associations of STAI-T and NEO-FFI-N with CS discrimination in startle responding were not significant after BH correction and only trend-wise significant without correction for multiple comparisons. The negative association between IUS and startle responding is driven by significantly increasing CS− responding with increasing scores on the IUS, while the association between the CS+ and scores on the IUS is not statistically significant. Even though the associations of the NEO-FFI-N and STAI-T with CS discrimination were only trend-significant without correction, a similar pattern (i.e., significant positive association; r = 0.157–0.183, pBH < 0.05) between these questionnaire scores and CS− responding was observed. The correlation coefficients for the CS discrimination and each of the three questionnaires/scales were not significantly different (all z < 0.69, all p > 0.492).

CS discrimination in ratings was not significantly associated with any of the three questionnaires, or scales of questionnaires. But note that CS− responding in ratings, similar to the pattern observed in startle responding, was positively—albeit only trend wise—associated with all three questionnaires/scales. Exploratory analyses for associations between STAI-T scores and US intensities as well as with awareness are reported in the Supplementary Information.

Step 2: Multivariate analyses

Path analysis on the sum scores of the three questionnaires/scales and the three outcome measures of fear learning revealed the expected significant positive associations among the scores on the three questionnaires (all p’s < 0.001) as well as positive correlations among the three outcome measures (all p’s < 0.001). In addition, a significant path between STAI-T and CS discrimination in SCRs was observed in this full model, in which all paths between all variables were included (Fig. 2, grey font and grey paths). The final model generated through backward selection (Fig. 2, blue font and blue path) also yielded a significant path between IUS and CS discrimination in startle, which was only trend-significant in the full model.

Figure 2

Path model reflecting the associations between the three measured questionnaires or scales of questionnaires (STAI-T, NEO-FFI-N, and IUS) with three outcome measures of fear acquisition (CS+, CS− discrimination in SCR, STARTLE, and RATINGS). Paths are labeled with standardized regression coefficients (βst) and p values. Full model paths and coefficients are shown in grey; paths and corresponding coefficients selected in the final model are overlaid in blue. Note that solid grey lines represent non-significant paths (p > 0.1) in the full model. Blue dashed lines represent paths that were trend significant (p < 0.1) in the full model, and significant in the final model (p < 0.05), solid blue lines represent paths significant (p < 0.05) in the full model and significant in the final model. Correlation paths between independent and between dependent measures are shown in black, all p < 0.001 unless otherwise specified. Coefficients appear in grey for the full model and in blue for the final model. The full model shows poor fit (RMSEA > 0.10) while the final model shows excellent fit (RMSEA < 0.01).

Step 3: Multivariate analyses testing for shared vs. unique variance

To test whether the associations observed in Step 2 are specific to individual questionnaire scores or driven by the shared variance of the three questionnaires or scales, we set up three structural equation models: (1) an initial model with two latent variables (“negative emotionality” and “fear learning”, Fig. 3, grey lines and font), (2) an interim model (Fig. 3, black lines and font) in which we added the unique paths identified in Step 2 to the initial model (i.e., STAI-T and SCR discrimination as well as IUS and startle discrimination), and (3) a final reduced model generated through backward selection (Fig. 3, blue lines and blue font).

Figure 3

Structural equation model reflecting the relation between two latent factors: “negative emotionality” comprised of the three measured questionnaires/scales STAI-T, NEO-FFI-N, and IUS, and “fear learning” comprised of the three outcome measures of fear learning (CS+, CS− discrimination in SCR, STARTLE, and RATINGS). Paths are labeled with standardized regression coefficients (βst) and p values. Initial model coefficients are shown in grey; paths and corresponding coefficients selected in the final model are overlaid in blue. Paths and coefficients of the interim model are shown in black. Note that the dotted black line reflects a path that was significant in the initial model but non-significant (p > 0.1) after adding the specific questionnaire-to-outcome paths in the interim model, and is thus excluded from the final model. Factor loadings between latent and respective independent and dependent measures appear in grey for the initial model, in black for the interim model and in blue for the final model. All p < 0.001 unless otherwise specified. The initial model shows good fit (RMSEA < 0.05), interim and final models show excellent fit (RMSEA < 0.01).

The initial model shows that the three questionnaires or scales, STAI-T, NEO-FFI-N, and IUS are indeed closely related to the latent variable “negative emotionality” with all factor loadings > 0.69. Similarly, the three measures of fear acquisition, SCR, startle responding and ratings are also closely related to the latent variable “fear learning” with slightly lower factor loadings (yet, all > 0.46). This pattern is maintained in the interim and final model. Importantly, in this initial model (Fig. 3, grey lines and font) in which the path between STAI-T and SCR, and IUS and startle responding were not yet included, a significant negative relation (βst = − 0.246) between the two latent factors “negative emotionality” and “fear learning” was observed, suggesting that there is a general predictive effect of higher negative emotionality being linked to reduced differential fear learning. This model shows good fit (RMSEA = 0.019).

In the interim model, the two unique significant paths identified in Step 2 (i.e., STAI-T and SCR discrimination as well as IUS and startle discrimination) were added to the initial model. This resulted in the relation between the two latent factors disappearing (black dotted line in Fig. 3), whereas the two unique paths turned out significant in this interim model. This path structure is entered in the final model and the paths between STAI-T and SCR and between IUS and startle remain significant. The interim and the final model both show excellent fit (RMSEA = 0). This suggests that it may be the unique variance in STAI-T that predicts fear learning in SCR responding (βst = − 0.167), and the unique variance in IUS that predicts fear learning in startle responding (βst = − 0.149), rather than the shared variance across the questionnaires/scales, and across outcome measures. Note that these standardized coefficients of the final model are identical to the coefficients estimated with the final path model in Step 2.

Study 1: Interim discussion

As expected, we observed moderate to strong links between the scores derived from the three questionnaires/scales linked to negative emotionality (STAI-T, NEO-FFI-N, IUS) in a large sample, whereas links between the three outcome measures of conditioned responding (SCRs, startle, ratings) were weak to moderate. These weak to moderate correlations among outcome measures are consistent with the idea that they tap into slightly different processes and capture different timings with respect to CS processing discussed in29. Uncorrected univariate analyses, which would mirror the approach employed in studies focusing on these questionnaires/scales and outcome measures in isolation, suggest significant associations between STAI-T and SCR, IUS and SCR, and IUS and startle responding. These significant correlations were small and each explained between 10 and 20% of the variance. Even when ignoring the strong correlations among the measured questionnaires/scales, this leaves space for other individual traits beyond negative emotionality to affect physiological responding. Additionally these univariate analyses suggest trends for the NEO-FFI-N and SCR and startle, as well as for STAI-T and startle responding. Importantly, the correlation coefficients between one outcome measure and the different questionnaire scores do not differ significantly from each other.

The results from the multivariate path model on the other hand may suggest a slightly different conclusion. When all measures are integrated in one statistical model, only associations between STAI-T and CS discrimination in SCR, and between IUS and CS discrimination in startle responding remain significant. This may suggest a certain level of specificity indicating that more general measures (such as the STAI-T) of negative emotionality might be specifically associated with outcome measures that also reflect very general physiological arousal or general affective processes (SCR). Our results suggest that these may be distinct from more specific measures (such as the IUS) in the negative emotionality domain, which seem to be associated with measures reflecting valence specific processes related to fear learning.

Remarkably, the structural equation model that includes latent variables for negative emotionality and fear responding only, reveals a strong association between both latent variables with good fit. All questionnaires/scales load strongly on the latent factor negative emotionality, with STAI-T showing the strongest factor loading. These high factor loadings indicate that including negative emotionality as latent factor may be indeed appropriate and informative. The fear learning latent variable is represented by less strong, but still medium sized factor loadings, suggesting that these readouts are related and there might be a broad underlying “fear learning” variable, but it also underlines that there is room for dissociation between them in particular because from a theoretical and neurobiological perspective70 these different outcome measures capture slightly different cognitive-affective processes and timings with respect to CS processing23,56,71. The initial effect between the latent variables is eliminated when adding the two identified specific paths.

In sum, our results speak in favor of a substantial shared variance and the existence of a latent “negative emotionality” factor across the three questionnaires and scales included, speaking in favor of convergent validity. In addition, specific paths between specific scales and outcome measures were identified—yet it cannot be excluded that these specific results may represent overfitting in this particular sample. Our results may indicate that it might rather be the shared variance across the questionnaires/scales that predicts fear learning and the specificity of the here identified paths needs to be replicated and further investigated in another well-powered sample potentially including item-level factor analytical approaches which would require a substantially larger sample than included in this study. Similarly, additional individual difference factors in the personality domain that for example target positive emotionality, could improve the amount of explained variance between individual difference factors and physiological measures. Ultimately this might contribute to a more dimensional framework in understanding the mechanisms behind individual differences in fear acquisition.

Study 2: Brief introduction

Surprisingly, the neurocognitive processes underlying an association between negative emotionality and the ability to discriminate signals of danger and safety remain largely unknown to date29. A number of studies have investigated neural associations with trait anxiety52,72,73,74,75, intolerance of uncertainty52,72,73,74,75 or neuroticism52,72,73,74,75, but studies integrating fMRI results with concurrently acquired psychophysiological measures and measures of negative emotionality in one and the same study are rare or even non-existent in the field, reviewed in29.

Therefore, we aim to address this fundamental gap in the literature and extend our findings from Study 1 by exploring the neuro-functional mechanisms potentially underlying the observed specific association between the STAI-T score with CS+/CS− discrimination in SCRs that has been observed in Study 1. To achieve this aim, we re-analyzed data from a large pre-existing sample (N = 113).

Of note, recording of startle responses in the MR scanner has been difficult due to technical challenges, which we have only very recently been able to overcome26. To date, we do not have sufficiently large samples with simultaneous recordings of startle and BOLD-fMRI to follow up on the association between the IUS and CS+/CS− discrimination in startle.

Study 2: Methods

Participants

Study 2 is based on a pre-existing dataset. One hundred and twenty-four participants were included in Study 2. Participants were recruited from a large screening sample described in76 in which they had been screened by a psychologist for neurological disorders using the M.I.N.I. interview77 as a diagnostic tool. Any current or prior psychiatric or neurological disorder or self-reported abuse of illegal drugs led to exclusion. Participants were re-invited for two separate studies based on exposure to life adversity in order to study its impact on a post-extinction manipulation (i.e., reinstatement; an experimental phase not included in the analyses of the current manuscript and published elsewhere78, first study also in (Scharfenort and Lonsdorf79), second unpublished). These two studies employed identical experimental protocols (including experimenter) for fear acquisition, and thus participants were pooled across both studies into Study 2 of this manuscript.

Out of the 124 participants, eleven participants had to be excluded due to technical issues (N = 6), pathological anatomy (N = 2) or missing items on the STAI-T (N = 3) which resulted in 113 participants included for fMRI, rating and SCR analyses of fear acquisition. The final sample included 44 females and 69 males, ranging in age between 19 and 34 years. On average participants were 25 ± 3.5 (SD) years old. All participants had normal or corrected to normal vision.

The studies were conducted in accordance with the Declaration of Helsinki and approved by the Ethical Review Board of the General Medical Council Hamburg. All participants gave written informed consent to participate. Participants received 50 Euros for their participation.

Questionnaires

Participants filled in a batch of questionnaires prior to the experiment. This batch included (1) questions to obtain demographic information, (2) the state scale of the State-Trait Anxiety Inventory30, and (3) the NEO-FFI31,57. The trait scale of the State-Trait Anxiety Inventory (STAI-T) was already acquired within the context of the screening sample, and based on the results obtained in Study 1, the STAI-T is of main interest here.

Overall reliability of the STAI-T was high in the final sample as indicated by Cronbach’s α = 0.93. Notably, a smaller range of STAI-T scores was obtained in Study 2 as compared to Study 1. STAI-T scores ranged between 20 and 59 with a mean of 35 ± 9 (SD).

After completing the experiment (i.e., after fear acquisition training in the MR-environment), participants filled in a post-experimental awareness questionnaire23 and results were orally confirmed with the experimenter. Participants were asked to estimate the total number of received electrotactile and other experimental stimuli, and were asked about perceived CS-US contingencies during the experiment (first as free recall, then forced choice). Consequently, participants were classified as either aware (N = 101, able to correctly report CS-US contingencies in free recall and/or forced choice) or unaware (N = 12, unable to report correct CS-US contingencies across questions).

Instructions

As in study 1, participants were not instructed with respect to the CS-US contingencies or the learning element of the study.

Visual material

Two different white fractals presented on a grey background (RGB [230,230,230], 340 × 320 pixel, resolution: 1,024 × 768) served as conditioned stimuli (CSs; duration: 6–8 s, mean: 7 s). A white cross in the middle of the grey background screen served as inter trial interval (ITI; duration: 10–16 s; mean: 13 s). One of the fractals (CS+) co-terminated with the unconditioned stimulus (US) during all fear acquisition training trials (100% reinforcement rate), whereas the other fractal did not (CS−). A 100% reinforcement rate was chosen to render it likely that all participants learn the association between CS+ and US within the 14 presentations. To allow for a differentiation between CS+ and US-related neural activity despite this high reinforcement rate, the CS+ duration was jittered between 6 and 8 s (mean 7 s). Allocation of the fractals to CS+/CS− was counterbalanced over all subjects.

Electrotactile US

Similar to Study 1, the US consisted of a train of three electrotactile stimuli (interval 50 ms, duration 10 ms). The US was administered through a surface electrode on the dorsal part of the right hand via a DS7A electrical stimulator (Digitimer, Welwyn Garden City, UK). Before the experiment started, intensity was calibrated individually to a maximum tolerable level with a mean US intensity ± SD of 7.18 ± 4.47 mA; see Study 1 for details on the US calibration procedure.

CS-US awareness

CS-US awareness was assessed as in Study 1 with the exception that the interview was conducted immediately after fear acquisition training and not at the end of the whole experiment. Participants were classified as aware (N = 101) or unaware (N = 12) of CS-US contingencies.

Procedure

Fear acquisition training occurred between 1 and 6 pm. As mentioned, experimental phases on the subsequent day (extinction, reinstatement and reinstatement test) are not of interest to the current manuscript. Participants were instructed outside the MR-environment. They were instructed to attend to the visual stimuli on the screen; no instruction with respect to CS-US contingency was given.

After positioning the participant within the MR-environment, two skin conductance recording electrodes were attached, as well as one stimulation electrode for US delivery. When positioned in the MR-scanner, the US calibration procedure was started. Next the actual experiment started with a CS habituation phase. Both CSs were explicitly unreinforced and presented seven times each. In the subsequent uninstructed fear acquisition training phase of interest, both CSs were presented 14 times each.

Presentation of all stimuli was controlled using Presentation Software (NeuroBehavioral Systems, Albany California, USA). After completing the experiment for that day, thus after fear acquisition training, participants completed the post experimental awareness questionnaire.

Subjective data recording—subjective ratings

Subjective ratings were acquired retrospectively, i.e., after all fear acquisition training trials. Participants indicated their level of stress/fear/tension elicited by the preceding CS+ and CS− presentations on a 25-stepped visual analog scale (VAS, anchored at 0 and 100). Ratings of participants who failed to confirm their rating for both CS types were set to missing values (N = 9). If a participant was missing either the CS+ or the CS− rating, the respective rating was replaced with the mean CS+ or CS− rating of all valid responses of the other participants on that rating (number of replaced values CS+ = 7, CS− = 4).
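
The handling of missing ratings described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' code: the function name `impute_missing_ratings`, the tuple layout, and the example values are hypothetical.

```python
def impute_missing_ratings(ratings):
    """Replace a single missing CS+ or CS- rating with the group mean.

    `ratings` is a list of (cs_plus, cs_minus) tuples; None marks a
    missing value. Participants missing both ratings stay missing.
    """
    valid_plus = [p for p, _ in ratings if p is not None]
    valid_minus = [m for _, m in ratings if m is not None]
    mean_plus = sum(valid_plus) / len(valid_plus)
    mean_minus = sum(valid_minus) / len(valid_minus)

    imputed = []
    for plus, minus in ratings:
        if plus is None and minus is None:
            # Both ratings unconfirmed: keep as missing, do not impute.
            imputed.append((None, None))
        else:
            imputed.append((
                mean_plus if plus is None else plus,
                mean_minus if minus is None else minus,
            ))
    return imputed


# Hypothetical ratings on the 0-100 scale for four participants.
example = [(80, 20), (None, 10), (60, None), (None, None)]
result = impute_missing_ratings(example)
```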

Physiological data recording and processing—skin conductance responses

Similar to what has been described previously in79, SCRs were recorded using a BIOPAC MP100 amplifier (BIOPAC Systems Inc., Goleta, California, USA). Ag/AgCl electrodes were placed on the palmar side of the left hand on the distal and proximal hypothenar. Data were processed and scored according to published guidelines24. Skin conductance data were down-sampled to 10 Hz. The phasic SCRs to the CS onsets were manually scored off-line, using custom-made software. SCR amplitudes (in µS) were scored as the first response initiating 0.9–4.0 s after CS onset24. To normalize the distribution, the SCRs were log transformed80 and range-corrected by division through an individual’s maximum response amplitude81.
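
The two normalization steps (log transformation followed by range correction) can be illustrated with a short Python sketch. The exact form of the log transform and the order of operations are not specified here, so the sketch assumes log(1 + amplitude) and division by the participant's largest log-transformed response; the function name and example amplitudes are hypothetical.

```python
import math


def transform_scrs(raw_amplitudes):
    """Log-transform SCR amplitudes, then range-correct per participant."""
    # Assumption: log(1 + amplitude), so zero responses remain defined.
    logged = [math.log(1.0 + a) for a in raw_amplitudes]
    max_response = max(logged)
    if max_response == 0:
        # No measurable response on any trial: nothing to scale by.
        return [0.0 for _ in logged]
    # Range correction: express each response relative to the maximum.
    return [value / max_response for value in logged]


# Hypothetical raw amplitudes (in microsiemens) for one participant.
example = [0.0, 0.12, 0.45, 0.30]
scaled = transform_scrs(example)
```

After this step every participant's responses lie between 0 and 1, which removes between-subject differences in overall response magnitude.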

BOLD-fMRI

MRI data acquisition and preprocessing have been previously described in Scharfenort and Lonsdorf79. MRI data were acquired on a 3-T MR-scanner (MAGNETOM Trio, Siemens, Germany) using a 32-channel head coil. Functional data were obtained using an echo planar imaging (EPI) sequence (TR = 2460 ms, TE = 26 ms). For each volume, 40 slices with a voxel size of 2 × 2 × 2 mm (1 mm gap) were acquired sequentially. Structural images were obtained by using a T1 MPRAGE sequence. Preprocessing and analyses were performed using standard procedures in SPM8 (Wellcome Trust Centre for Neuroimaging, UCL, London, UK). Preprocessing included coregistration to the individual structural image, realignment, normalization to group-specific templates (created via the DARTEL algorithm82), as well as smoothing (6 mm FWHM).

Statistical analyses

For fMRI data, four effects-of-interest regressors were built at the first level (i.e., early and late half of the acquisition trials for CS+ and CS−) as well as ten nuisance regressors (BOLD responses for CS+ and CS− during habituation, USs, ratings, and six movement parameters derived from realignment). All regressors of interest were modeled as stick functions and time locked to stimulus onset for acquisition analyses. The general linear model was used to compute regression coefficients (beta values) for the regressors in each voxel.

CS discrimination contrasts (CS+ > CS−; CS− > CS+ for the full acquisition phase) were estimated on the first level and taken into the second level analysis employing voxel-wise regression analyses with the STAI-T. The main effect of task was estimated in a full factorial model with two regressors for both CS+ and CS− (early and late) and contrasts of interest were set to CS+ > CS− and CS− > CS+ covering the full phase.

ROI analyses were based on key areas for fear conditioning [primary ROI: amygdala; secondary ROIs: hippocampus, dorsal anterior cingulate cortex/dorsomedial prefrontal cortex (dmPFC), pallidum/putamen, ventromedial prefrontal cortex (vmPFC), thalamus, insula (Fullana et al.83; Sehlmeyer et al.82)]. Masks were derived from the Harvard–Oxford cortical and subcortical structural atlases with a probability threshold of 0.7. As no vmPFC and dmPFC masks are provided by the Harvard–Oxford atlases, we used a box (20 × 16 × 16 mm) centered on coordinates from a previous (independent) study [vmPFC: x, y, z: 0, 40, − 12; dmPFC: x, y, z: 0, 43, 29]84. Due to strong a-priori predictions with respect to the amygdala and the use of additional regions as secondary ROIs, correction was performed separately for each ROI. A statistical threshold of p < 0.05 (FWE corrected within the ROI) was considered significant.

Parameter estimates from peak voxels were extracted from the first individual level. Correlation tests were performed for CS discrimination as well as CS+ and CS− specific responding in SCRs and ratings with scores on the STAI-T. For ratings and SCRs, correlational analyses were also carried out with the parameter estimates to explore the link between physiological responding and brain activation in areas linked to the STAI-T. Exploratory correlational analyses with the STAI-T and awareness as well as with US-intensity are reported in the Supplementary Information.

Correlation analyses and data visualization were performed in R version 3.6.0 (2019-04-26) using the dplyr, tidyr, corrplot, cowplot, ggplot2, and psych packages.

Study 2: Results

Main effects of task

Successful fear acquisition was evident by significantly larger SCR amplitudes for the CS+ than for the CS− during the full fear acquisition phase [t(112) = 10.14, p < 0.001, d = 0.95]. Similarly, post-acquisition fear ratings were higher for the CS+ as compared to the CS− [t(102) = 15.65, p < 0.001, d = 1.52]. On a neuro-functional level CS-discrimination (CS+ > CS−; Table 1) was reflected in areas typically activated in fear acquisition (i.e., thalamus, amygdala, dmPFC/dACC, insula/frontal operculum and putamen/pallidum). Stronger activation to the CS− than the CS+ was observed in the vmPFC (T-maps are available on neurovault https://identifiers.org/neurovault.image:305007). Note that when correcting for the number of ROIs (i.e., 7) the hippocampus would not meet the corrected significance threshold of 0.007.
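
The corrected threshold of 0.007 mentioned above is consistent with a Bonferroni division of α = 0.05 by the seven ROIs. A minimal Python sketch of this arithmetic follows; the function name is hypothetical and the Bonferroni interpretation is an assumption.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Seven ROIs: amygdala, hippocampus, dmPFC, pallidum/putamen,
# vmPFC, thalamus and insula.
threshold = bonferroni_threshold(0.05, 7)
rounded = round(threshold, 3)
```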

Table 1 Neural activation reflecting CS+/CS− discrimination during fear acquisition (main effects of task) in the defined ROIs for left (L) and right (R) regions separately.

Associations of CS+/CS− discrimination in SCRs and ratings with STAI-T scores

We explored associations between the STAI-T score and CS+/CS− discrimination in SCRs, post-experimental ratings as well as in BOLD-fMRI. In contrast to what was observed in Study 1, the STAI-T score was not significantly associated with CS+/CS− discrimination in SCRs or ratings during fear acquisition training in univariate correlation analyses (SCR: r = − 0.05, p = 0.59; rat: r = − 0.15, p = 0.13) as illustrated in Fig. 4. Despite the absence of differences in CS discrimination we explored possible associations with CS+ or CS− responding individually. STAI-T was weakly (r = 0.31, pBH < 0.01) and positively correlated with CS− responding (i.e., higher CS− ratings in individuals with higher STAI-T scores) in ratings, but not with either CS+ or CS− responding in SCR.

Figure 4

Scatterplots showing associations between (A) the STAI-T and SCR (skin conductance responses) and (B) between STAI-T and ratings for CS discrimination in grey, CS+ responding in red, and CS− responding in blue. Corresponding correlation coefficients (r) are displayed in corresponding colors. The density distribution of scores on the STAI-T is displayed on top of the figure, density distributions per stimulus type (CS+, CS−) and for CS discrimination are displayed on the right side of the figure for each dependent variable (SCR, ratings). **p < 0.01 for raw and Benjamini Hochberg (BH) adjusted p values separated by a semicolon. Blank space indicates p > 0.1.

Neuro-functional associations of CS+/CS− discrimination with STAI-T scores

On a neural level, however, higher STAI-T scores were associated with significantly stronger CS+/CS− discrimination related activation of the right amygdala, the putamen (bilaterally) and the thalamus (bilaterally) during fear acquisition training (Table 2, Fig. 5). The corresponding T-map is available on Neurovault (https://identifiers.org/neurovault.image:305007). The association of the STAI-T scores seems to be driven by a positive association of the STAI-T scores with neural activation to the CS+ (right amygdala: x, y, z: 24, − 10, − 14; k = 21, T:3.57, Z:3.46, pSVC(FWE): 0.008; left Putamen: x, y, z: − 22, 18, − 2; k = 11, T:3.59, Z:3.48, pSVC(FWE): 0.018; right Putamen: x, y, z: 22, 20, − 4; k = 35, T:4.01, Z:3.87, pSVC(FWE): 0.005; left Thalamus: x, y, z: − 4, − 22, 14; k = 178, T:4.71, Z:4.49, pSVC(FWE): 0.001; right Thalamus: x, y, z: 16, − 30, 12; k = 38, T:3.65, Z:3.4, pSVC(FWE): 0.001) but not the CS− (no significant effects in the ROIs).

Table 2 Neural activation reflecting significant ROI-based results (p < 0.05 SVCFWE) for a regression of STAI-T on CS discrimination (CS+ > CS−) during fear acquisition training.
Figure 5

Neural activation for CS+/CS− discrimination (CS+ > CS−) during fear acquisition training for areas significantly activated when regressed with STAI-T for the (A) amygdala [26, − 10, − 12], (B) putamen [28, 12, − 2; − 22, 16, − 4], and (C) thalamus [(− 8, − 8, 8; − 2); (− 20, 12; 4), (4, − 20, 12)] on the left. Coordinates are in MNI space. Scatterplots on the right presented in D–F serve visualization purposes only and represent the association between STAI-T scores and extracted peak voxel parameter estimates. Note that for the putamen and thalamus multiple peak values per ROI are displayed in different shades. Density distributions for peak value estimates are shown on the right; the density distribution of STAI-T values in Study 2 is displayed on top of the upper scatterplot. A display threshold of uncorrected (uc) p < 0.001 was used to illustrate the extent of peak activations, unless otherwise specified. Note that statistics are based on SVCFWE-corrected values.

Robustness checks revealed that a model including the covariate ‘life adversity’ (as participants in Study 2 were initially recruited based on this variable) yielded comparable results. More precisely, statistical values differed only at the last decimal place with the exception of the right thalamus, which does not meet the 0.05 threshold when including the covariate ‘life adversity’ (data not shown). Of note, these areas are also significantly implicated in CS+/CS− discrimination irrespective of STAI-T in this sample (see above).

Additional exploratory analyses revealed that, of these areas, CS discrimination in SCRs was positively associated only with peak voxel activation in the left putamen (r = 0.240, pBH = 0.028), and with the first cluster in the left thalamus (r = 0.322, pBH = 0.002). The latter might be driven by a positive association between SCRs for the CS+ and the left thalamus (r = 0.315, pBH = 0.002), whereas the CS− does not show a significant association. A graphical representation of these associations can be found in the Supplementary Material (Supplemental Figure S1). Ratings for CS discrimination, the CS+ and the CS− were not significantly associated with peak voxel activation in any of our ROIs (all pBH > 0.512).

An exploratory analysis testing associations between STAI-T and US intensity scores as well as associations with awareness are reported in the supplementary information.

Comparing the STAI-T distributions across both samples (behavioral study vs. fMRI study)

To explore whether the distributions of STAI-T values in Study 1 and Study 2 differ (see Fig. 6A), a two-sample Kolmogorov–Smirnov test was performed. This test indicates that the two samples come from different distributions, D = 0.219, p < 0.001. As can be derived from Fig. 6B, the fMRI sample includes substantially more individuals with low STAI-T values (i.e., < 50) as compared to the behavioral sample.
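
The D statistic of a two-sample Kolmogorov–Smirnov test is the largest vertical distance between the two empirical cumulative distribution functions. The following Python sketch illustrates how D is obtained; it does not compute the p value, it is not the authors' code, and the example scores are hypothetical.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D (largest ECDF gap)."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    n_a, n_b = len(a), len(b)
    i = j = 0
    d_max = 0.0
    # Walk through the observed values in ascending order.
    while i < n_a and j < n_b:
        value = min(a[i], b[j])
        # Step past every observation equal to the current value (ties).
        while i < n_a and a[i] == value:
            i += 1
        while j < n_b and b[j] == value:
            j += 1
        # i / n_a and j / n_b are the two ECDFs evaluated at `value`.
        d_max = max(d_max, abs(i / n_a - j / n_b))
    return d_max


# Hypothetical questionnaire scores for two small samples.
behavioral_scores = [30, 35, 42, 48, 55, 61, 70]
fmri_scores = [22, 28, 31, 35, 40, 44, 52]
d = ks_statistic(behavioral_scores, fmri_scores)
```

Once one sample is exhausted its cumulative distribution equals 1, so the gap can only shrink afterwards and the loop can stop.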

Figure 6

Distribution of STAI-T scores in the sample included in behavioral Study 1 (black) and fMRI Study 2 (grey) illustrated as (A) overlapping densities for both samples and (B) the empirical cumulative distribution function for both samples.

Study 2: Interim summary

Results of Study 2 show a positive association of the STAI-T score with CS+/CS− discrimination on a neuro-functional level in the right amygdala, the putamen (bilateral) and the thalamus (bilateral). These regions have all been implicated in the ability to discriminate signals of danger from signals of safety, both in the literature83,85 and in the paradigm and sample reported here. Of note, the amygdala is a core region implicated in fear learning86,87,88,89,90 and has been previously linked to individual differences in discriminating signals of danger from signals of safety73. This previous work has often not included the simultaneous acquisition of both autonomic (i.e., SCRs) and neuro-functional measures in the same experimental phase91,92 while others have recorded both measures during fear acquisition72,74 or fear expression73. Importantly, also in domains of threat processing, similar positive associations between STAI-T score and amygdala reactivity as reported here have been observed93,94,95,96. In addition, our work provides evidence for an involvement of the amygdala in individual differences underlying the strength of fear learning beyond the average (i.e., a general role in fear acquisition and expression). This is important as evidence suggesting the role of the amygdala in fear acquisition has been questioned83 and also as there is accumulating evidence that aggregated results across a group do not necessarily generalize to individuals e.g.,97,98.

Despite the observed associations between the STAI-T score and CS+/CS− discrimination on a neural level, we did not observe a significant association between CS+/CS− discrimination in SCRs and the STAI-T score as observed in Study 1. Yet, the sample in Study 2 (N = 113) is substantially smaller than the sample in Study 1 (N = 356). Calculating the sample size required to determine whether a correlation coefficient of −0.183 (as observed between STAI-T and CS+/CS− discrimination in Study 1) differs significantly from zero yields a required sample of 232 individuals (assuming an α of 0.05 and a power (1 − β) of 80%; https://www.sample-size.net/correlation-sample-size/). This is about twice the number of subjects included in Study 2 and hence the non-replication of SCR results should be treated with caution. Furthermore, we highlight that the distribution of STAI-T scores between Study 1 and Study 2 (see Fig. 6) is significantly different. The distribution in Study 2 is shifted substantially toward lower scores. More precisely, Study 1 (behavioral) contained more individuals with a high STAI-T score (> 60) than Study 2 (fMRI), as evident from the much flatter right tail of the density in Study 2. In Study 1 scores reach values up to 76, whereas in Study 2 the maximum score is 59. Moreover, in the imaging study (Study 2) there are proportionally more individuals included with STAI-T scores falling in the lower quartile and thus in a group that would be characterized as having no or low anxiety (STAI-T < 37). Hence, we call for caution when interpreting this null finding as a replication failure of findings in Study 1. Instead, sample bias (possibly originating from highly anxious individuals not signing up for fMRI studies) may, in addition to the differences in power between the studies, also contribute to the different results in both studies.
We thus replicate a recent report, based on a large set of pooled studies, of a profound sampling bias in MRI studies: participants in MRI studies had lower trait anxiety scores than participants in behavioral studies99. This implies that thorough characterization and reporting of study populations and experimental parameters is highly important, especially in individual difference research29.
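The sample size calculation reported above can be approximated with the standard Fisher z method for a two-sided test of a correlation against zero. The sketch below is illustrative only: the function names are hypothetical, and it is an assumption that the cited online calculator uses the same formula and rounding. With r = −0.183, α = 0.05 and 80% power it yields a value just above 232, in line with the number reported; the second function gives a rough estimate of the power available at the sample size of Study 2.

```python
import math
from statistics import NormalDist


def required_n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect a correlation r (two-sided test).

    Uses the Fisher z approximation:
        N = ((z_alpha + z_beta) / C)^2 + 3,  with C = atanh(|r|).
    Returns an unrounded value; the rounding convention is left to the caller.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    c = math.atanh(abs(r))                         # Fisher z-transform of |r|
    return ((z_alpha + z_beta) / c) ** 2 + 3


def approximate_power(r, n, alpha=0.05):
    """Approximate power to detect correlation r with n participants.

    Ignores the (negligible) rejection probability in the opposite tail.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(math.atanh(abs(r)) * math.sqrt(n - 3) - z_alpha)


# Values taken from the text: r = -0.183 (Study 1), N = 113 (Study 2).
n_required = required_n_for_correlation(-0.183)
power_study2 = approximate_power(-0.183, 113)
print(f"Required N (unrounded): {n_required:.1f}")
print(f"Approximate power at N = 113: {power_study2:.2f}")
```

Under these assumptions, the approximate power at N = 113 is close to 50%, which agrees with the interpretation that Study 2 was likely underpowered to detect an effect of the size observed in Study 1.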

General discussion Study 1 and 2

The overarching aim of both studies presented here was to explore the putatively specific and shared variance between three commonly used questionnaires associated with negative emotionality and their relation to conditioned responding, measured during a fear conditioning experiment in multiple units of analysis (ratings, skin conductance, startle, BOLD-fMRI). These relations were investigated in two large samples (Study 1: N = 356; Study 2: N = 113).

The three questionnaires used in this study (the trait scale of the STAI, the neuroticism scale of the NEO-FFI and the Intolerance of Uncertainty Scale) were selected because of the abundance of literature on them in the field of individual differences in fear conditioning research, for a review see29. These three questionnaires share a substantial part of their variance, here operationalized as a latent ‘negative emotionality’ variable. Our results hint at potentially specific associations of the STAI-T (Study 1) with the discrimination between cues signaling danger (CS+) or safety (CS−) in the arousal-related outcome measure of skin conductance responding, and of intolerance of uncertainty with the discrimination between danger and safety in the valence-related outcome measure of startle responses. These results should, however, be interpreted with caution, as overfitting in this particular study cannot be excluded. Importantly, we also find support for the existence of a negative emotionality latent variable that has an effect on general fear learning.

Notably, univariate correlational analyses that do not account for shared variance between measures of negative emotionality revealed comparable negative associations of all three questionnaires with CS+/CS− discrimination in SCRs, with the STAI-T showing the strongest and the only significant association. Note that the associations with NEO-FFI-N and IUS were not statistically significant, but the correlation coefficients were comparable to the one derived from the STAI-T/SCR association.

Results derived from the multivariate path model approach imply that the observed univariate associations of NEO-FFI-N and IUS with SCR CS+/CS− discrimination might be fully explained by their shared variance with STAI-T. Of note, the observed association between the STAI-T and CS+/CS− discrimination was negative in Study 1 and Study 2 (i.e., high scores are associated with less discrimination), although clearly non-significant in Study 2, while others have observed positive associations in small samples73,91. There is a plethora of potential and plausible reasons underlying these seeming discrepancies. As discussed in a recent review29, these include a number of procedural factors such as high vs. low reinforcement rate100, potency of the experimental situation101, instructions102, and additional triggered outcome measures such as startle that impact on the learning process65, as well as sample biases or exclusion of specific participants56,99, to name just a few. Hence, it is possible that neither result is necessarily ‘wrong’, as different associations may unfold depending on the specific sample, experimental context or boundary conditions. While the large sample in Study 1 (in comparison to the typical sample size in individual difference studies in the field, systematically summarized in29) should contribute to trust in our findings, systematic investigations are highly warranted. Hence, we urge authors to focus more on procedural details, demands and related processes, and potential sampling bias in future studies to explore whether this may facilitate mechanistic conclusions29.

Of note, in the substantially smaller Study 2, we do not observe an association between the STAI-T score and CS+/CS− discrimination in SCRs or ratings. Our sample size calculation revealed that Study 2 was most likely underpowered to detect an association between CS+/CS− discrimination and STAI-T scores (given the correlation coefficient observed in Study 1) and in addition represents individuals sampled from a different distribution, likely caused by the nature of the study (i.e., fMRI). Hence, we replicate a recent report suggesting that samples for fMRI and behavioral studies are drawn from different populations99.

Given that nearly all published studies in the literature (with few exceptions103) fall well below the sample size in Study 1 (N = 356), the null findings across outcome measures in these studies72,74,104,105,106,107 are difficult to interpret. More large-N studies are needed to determine whether these different results originate from fluctuations around the null (i.e., absence of a true effect) or whether there is indeed a true effect.

In Study 2, we did observe a positive association between the STAI-T and CS− responding in subjective ratings (i.e., higher CS− ratings in individuals with higher scores). This may be in line with the non-significant positive trend observed in univariate analyses for CS− ratings and STAI-T in Study 1. Note that for all three questionnaires/scales in Study 1 (i.e., STAI-T, NEO-FFI-N, IUS) the association between the respective questionnaire/scale and CS− responding in ratings reached trend level only. A link between negative emotionality and CS− responding is in line with the suggestion of deficient safety signal processing in individuals with affective disorders or those at risk108; with previous results in a similarly large sample103; and with previous reports on associations between STAI-T and deficits in safety signal (e.g., CS−) processing103,109,110,111. Results should, however, be treated with caution, as this association was only observed in the smaller Study 2 and was only a trend in Study 1.

In Study 2, however, CS+/CS− discrimination in a number of brain areas of key relevance to fear processing and expression (amygdala, bilateral putamen, bilateral thalamus) is positively associated with the STAI-T score as well as its sub-components. The construct of trait anxiety as assessed by the STAI-T has been criticized in the literature for representing a psychometrically inhomogeneous scale112, capturing facets of both anxiety and depression33,35,112,113. Factor analyses on the single items of these questionnaires in larger, well-powered studies could address this question further. Exploratory factor analyses performed on the items included in Study 1 are not reported here, as the results likely represented overfitting in this particular study sample (i.e., items derived from one scale loaded primarily on a single factor, and few of the plausibly expected cross-loadings between similar items across scales were observed). Note that the sample size of Study 1 was too small in relation to the number of items for this purpose; hence, these data are neither shown nor included. Future work in appropriately sized samples should focus on unraveling cross-questionnaire factors that may inform us about the specific mechanisms and components underlying the association between negative emotionality and the discrimination between danger and safety cues (CS+/CS− discrimination) in fear acquisition.

It is also noteworthy that the selection of questionnaires related to negative emotionality for Study 1 was exclusively motivated by evidence from the available literature in the field of human fear conditioning29, in which the three selected measures have commonly been used (albeit in isolation). Hence, future work should extend these findings by targeting additional measures not included in this report, such as specific measures of depression, to further unravel a potentially mechanistic component of negative emotionality that drives the link between dispositional negativity and fear learning on an autonomic level.

Furthermore, it is noteworthy that we observe a specific association not only between differential SCRs and the STAI-T score but also between fear potentiated startle (i.e., CS+ > CS− in startle responding) and intolerance of uncertainty scores. Although this potentially interesting finding awaits replication, it is noteworthy that others have also observed intolerance of uncertainty scores to be negatively associated with startle responding during the uncertain but not the certain threat condition114, suggesting that it was not predictive of general aversive responding but specific to responses to uncertain aversiveness.

Importantly, despite our work providing clear evidence for substantially shared variance between the three questionnaires, the specific dissociations in outcome measures and questionnaire scores (i.e., specific association of STAI-T with CS-discrimination in SCRs, and of IUS with CS-discrimination in FPS) may provide insights into the underlying processes. Different outcome measures capture and reflect diverse aspects and represent unique sources of variance in fear processing23 and emotional processing per se115,116. SCRs are thought to reflect general arousal. Startle, in turn, is considered a rather fear-specific index23 that by definition reflects an enhanced reflexive response towards an unexpected, and thus uncertain, event. Hence, both results may carry complementary mechanistic information corresponding to multi-causal vulnerability in fear and anxiety. As it was not yet technically feasible to implement combined EMG-fMRI measurements at the time of data acquisition, future studies profiting from this novel option26,117 are warranted to investigate the neurobiological mechanisms underlying the specific association between intolerance of uncertainty and FPS.

Our results clearly highlight the value of multimodal work and multivariate analysis tools and suggest that ‘compound profiles’ that integrate multiple input and outcome measures, and hence potentially capture multiple processes, may in the long run prove useful from a ‘personalized medicine’ perspective. Yet, the associations observed here between measures of negative emotionality and physiological responding are neither large nor even of medium size. This has to be kept in mind when discussing their potential for biomarker development. Still, we speculate that a multivariate composite of different response patterns (‘profile’) may show stronger associations, which may hold potential for the development of clinically useful products in the long run. Our results suggest that negative emotionality in general and trait anxiety specifically may serve as a potential starting point to identify individuals with difficulties in discriminating signals of threat from signals of safety. These individuals might benefit from tailored discrimination training programs. Future well-powered multimodal studies including a wider range of personality aspects (e.g., depression, optimism) are needed to ultimately obtain more comprehensive personality profiles that might allow specific individuals to be linked to specific interventions.

In sum, it is fundamental to uncover factors, and their potential interactions, that contribute to individual risk of and resilience to pathological fear, although fear conditioning protocols may rather model adaptive fear118. Hence, improved understanding of the personality-related and neurobiological processes underlying individual differences in experimental fear learning can be expected to translate into improved understanding of how adaptive responding to threats turns into maladaptive fear responding119,120. It will thus be important to extend the investigation of individual differences and the underlying neurobiology beyond experimental fear acquisition to tests focusing on the long-term retention of fear and extinction memory (i.e., return of fear121), and ultimately to clinical populations. We provide a very first step toward the overarching aim of improving our mechanistic understanding of pathological fear and emotional responding. We provide initial insights into inter-individual differences in fear processing by finding support for specific associations between trait anxiety and physiological responding, as well as for a general link between negative emotionality and fear learning, using multivariate approaches across units of analysis in two samples.