Genome-wide stress sensitivity moderates the stress-depression relationship in a nationally representative sample of adults

We re-evaluate the findings of one of the most cited and disputed papers in gene-environment interaction (GxE) literature. In 2003, a paper was published in Science in which the authors demonstrated that the relationship between stress and depression is moderated by a polymorphism in the promoter region (5-HTTLPR) of the gene SLC6A4. Replication has been weak and led many to challenge the overall significance of GxE research. Here, we utilize data from Add Health, a large, nationally representative, and well-powered longitudinal study to re-examine the genetic determinants of stress sensitivity. We characterize environmental sensitivity using a genome-wide polygenic indicator rather than relying on one polymorphism in a single candidate gene. Our results provide support for the stress-diathesis perspective and validate the scientific contributions of the original paper.

Each of these models anticipates that the most sensitive individuals will respond more strongly to stress exposure than the least sensitive (LS) individuals, but the models differ from one another with respect to the intercept, which has important substantive implications. The results presented by Caspi et al. are best characterized by the stress-diathesis (SD) model. Here, environmentally sensitive individuals and their less sensitive counterparts do not differ from one another with respect to depressive symptoms in the least stressful environments. Differences in overall sensitivity, however, lead to a departure such that the environmentally sensitive individuals report significantly higher levels of depressive symptoms in increasingly stressful environments. The emphasis of this model is on the toxicity of stressful environments rather than the benefits of the least stressful environments, per se. This distinction is made clearer when one considers the vantage sensitivity (VS) model, shown with the bottom, thick-dashed line of Fig. 1 (refs. 8,9). As with the SD model, the VS model anticipates that environmentally sensitive individuals may respond more strongly to stress but suggests that these differences will be the most evident in the most positive (rather than low-stress) environments; that is, sensitive individuals derive the greatest psychological benefits in nurturing, supportive, and stress-free environments.
Finally, the Differential Susceptibility (DS) model combines elements of the SD and VS models and suggests that the most environmentally sensitive individuals will report both higher levels of depressive symptoms in the most stressful environments and lower levels of depressive symptoms in the least stressful environments 10. This relationship is shown by the cross-over line with small dashes in Fig. 1. Thus, all three GxE models will have the same positive interaction term (i.e., the effects of stress on depressive symptoms will be stronger for environmentally sensitive individuals) but the value of the intercept (i.e., the difference in average depressive symptoms in the least stressful environments) differentiates the three models. Examining all three of these models with these updated data is an important contribution to this larger body of work.
Figure 1. The solid bold line (LS) represents the comparison group for all three models; that is, the genotype that is least sensitive to the environment. Thus, the other lines represent points at which other genotypes are comparatively more sensitive to broad environmental stress than the LS group. The thin, solid line represents the Stress-Diathesis Model (SD), the small-dashed line represents the Differential Susceptibility Model (DS), and the large-dashed line shows the Vantage Sensitivity Model (VS).
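The distinction among the three models can be made concrete with a small sketch. The function and all numerical values below are hypothetical and chosen only for illustration; they are not estimates from this paper. The sketch shows that the SD, DS, and VS models share the same stress-by-sensitivity interaction and differ only in the gap between sensitive and LS individuals at the lowest stress level.

```python
# Hypothetical illustration: the numbers are invented, not estimates from the paper.
def predicted_depression(stress, sensitive, model,
                         base=1.0, slope=0.2, interaction=0.1):
    """Expected depressive symptoms under one of the three GxE models.

    stress:    0.0 (least stressful) to 1.0 (most stressful)
    sensitive: 0 for the least sensitive (LS) group, 1 for the sensitive group
    model:     "SD", "DS", or "VS"
    """
    # Gap between sensitive and LS individuals in the least stressful environment.
    intercept_gap = {
        "SD": 0.0,                # no difference at low stress
        "DS": -interaction / 2,   # lines cross over mid-range
        "VS": -interaction,       # sensitive group benefits most at low stress
    }[model]
    return base + slope * stress + sensitive * (intercept_gap + interaction * stress)


for model in ("SD", "DS", "VS"):
    low = predicted_depression(0.0, 1, model) - predicted_depression(0.0, 0, model)
    high = predicted_depression(1.0, 1, model) - predicted_depression(1.0, 0, model)
    print(model, "gap at low stress:", round(low, 3), "gap at high stress:", round(high, 3))
```

In every model the gap grows by the same amount (the interaction term) as stress increases; only its starting value differs, which is why the intercept is what separates the models empirically.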
Polygenic sensitivity. The second important contribution of our paper is the application of polygenic score (PGS) techniques to the evaluation of the three models of genetically oriented environmental sensitivity. As described in great detail elsewhere 11, a PGS is a value assigned to each individual: the product of the individual's genotype at a single nucleotide polymorphism (SNP) and the effect estimate for that locus, identified in an independent and well-powered discovery sample, summed across all SNPs for which the individual was genotyped. These scores tend to be normally distributed and are standardized to have an intuitive interpretation. An important contribution to work on PGS construction came from Keers and colleagues 5, who used comparable techniques but, instead of focusing on the mean level of an outcome to derive the effect size estimates for each SNP, focused on discordance among twin pairs to identify the phenotype of environmental sensitivity. Genome-wide regression models were then used to retrieve the beta estimates and risk allele for their overall environmental sensitivity PGS. Thus, reassessing the results of the Caspi et al. paper using an indicator of genetically oriented environmental sensitivity beyond the one candidate gene (i.e., SLC6A4) represents an important contribution to work in this area. To our knowledge, ours is the second paper to apply this PGS to depression longitudinally, but it offers a larger and more diverse sample and focuses more broadly on the stress-diathesis relationship 12.
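As a minimal sketch of the weighted-sum construction described above, the example below computes raw and standardized scores for three hypothetical individuals. The effect sizes, genotypes, and function names are invented for illustration, and coding genotypes as effect-allele counts (0, 1, or 2) is an assumption of the sketch rather than a detail reported here.

```python
from statistics import mean, pstdev


def raw_pgs(allele_counts, betas):
    """Weighted sum of genotypes: sum over SNPs of beta_j * genotype_ij."""
    return sum(beta * count for count, beta in zip(allele_counts, betas))


def standardize(scores):
    """Rescale scores to mean 0 and standard deviation 1."""
    mu, sigma = mean(scores), pstdev(scores)
    return [(s - mu) / sigma for s in scores]


# Hypothetical discovery-sample effect sizes for three SNPs.
betas = [0.10, -0.05, 0.20]

# Hypothetical genotypes, coded as effect-allele counts (an assumption).
genotypes = {"A": [0, 1, 2], "B": [2, 2, 0], "C": [1, 0, 1]}

raw = {pid: raw_pgs(counts, betas) for pid, counts in genotypes.items()}
z = dict(zip(raw, standardize(list(raw.values()))))
print(raw)
print(z)
```

After standardization, a one-unit difference in the score corresponds to one standard deviation in the sample, which is what gives regression coefficients on the PGS their intuitive interpretation.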
Gene-environment correlation and population stratification. Finally, we add to the literature by considering all respondents in the Add Health study for whom genotyped data are available (analytic n = 6472) 13. Add Health is a nationally representative, admixed sample of young adults in the U.S., allowing us to expand our analysis beyond individuals of European genetic ancestry, which has unfortunately become the norm 14.
The original paper by Caspi and colleagues only included "Caucasian non-Maori study members" (n = 387) and research since that time, especially work utilizing PGS estimates, has limited the application of summary statistics to individuals from the same genetic ancestral group as the discovery sample. In our analyses, we analyze all genetic ancestry and racial/ethnic groups together for three reasons: (1) theoretically, we do not agree with the belief that the genetic associations for environmental sensitivity differ as a function of one's racial identity and experience; (2) substantively, the continued stratification of individuals by ethnic classification when examining genetic associations is a problematic practice foreseen nearly 30 years ago in Troy Duster's Backdoor to Eugenics (1990) and the scientific community must work diligently to stop such practices 15; and (3) methodologically, we are concerned not with a single, causal biological pathway but instead with an overall indicator of genetic associations (i.e., a narrow-sense additive genetic variance component). In ancillary analyses we estimate the same models only with those within the European genetic ancestry group who self-identify as non-Hispanic White to assuage any further concerns; as expected, the results are virtually identical (available upon request).
Another possibility is that the sensitivity genotype is correlated with stress exposure (i.e., gene-environment correlation [rGE]). Those who are more sensitive to stressors may make greater efforts to avoid situations in which they may be exposed to additional sources of stress or strain. As others have pointed out 16, this active form of rGE can make it difficult to interpret the meaning of a GxE interaction term. Accordingly, we estimated the correlation between stress and our PGS for environmental sensitivity: it is weak at baseline (r = 0.059, p < 0.001) and loses all significance (r = 0.011, p = 0.490) once controls for genetic ancestry are added.
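The logic of this check can be sketched with a partial correlation. The example below is an illustration under stated assumptions, not the procedure used in the paper: it adjusts for a single hypothetical ancestry component using the first-order partial correlation formula, whereas the actual analysis controls for genetic ancestry in a way not detailed here. The toy data are constructed so that stress and PGS are correlated only through shared ancestry.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical data: one ancestry component plus independent noise terms.
ancestry_pc = [1, 1, 1, 1, -1, -1, -1, -1]
noise_pgs = [1, 1, -1, -1, 1, 1, -1, -1]
noise_stress = [1, -1, 1, -1, 1, -1, 1, -1]
pgs = [a + e for a, e in zip(ancestry_pc, noise_pgs)]
stress = [a + e for a, e in zip(ancestry_pc, noise_stress)]

print("unadjusted r:", pearson(pgs, stress))
print("adjusted r:  ", partial_corr(pgs, stress, ancestry_pc))
```

In this toy case the unadjusted correlation is entirely attributable to ancestry and vanishes after adjustment, which mirrors the pattern reported above in a deliberately exaggerated form.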
In summary, we reassess the work of Caspi et al. by (1) examining the utility of a genome-wide approach to understanding environmental sensitivity; (2) evaluating our results in terms of an updated theoretical backdrop; and (3) examining similar associations in a different environmental setting (i.e., a different country (U.S.), birth cohort, and historical period, among a broader and older age group, and without restrictions to a single race/ethnic group).

Results
Tables 1 and 2 present the overall descriptive statistics for the analytic sample and bivariate associations between PGS sensitivity and all variables used in the analyses, respectively. Table 3 presents the results from an OLS model in which depressive symptoms are regressed on stress exposure, our environmental sensitivity PGS, and an interaction between the two; Fig. 2 offers a graphical presentation of these estimates. As shown, the models include controls for age, sex, race-ethnicity, educational attainment, and the top five principal components for the full sample of individuals included in the Add Health genetic data 17,18. The three rows at the top of this table summarize the primary findings of our paper. The first row reports a main effect of stress exposure, measured as described in the Methods (b = 0.181, p < 0.001). Given that the environmental sensitivity PGS is standardized, this estimate reflects the effect of stress on depression for those with an average PGS value. The second row presents the beta estimate for the effect of the PGS on depression. As expected by the stress-diathesis (SD) model, the PGS (b = −0.009, p = 0.491) is not significantly associated with depression among those with 0 stressful life events. The primary estimates are in bold and provide additional support for the SD model. Specifically, the interaction between stress and the PGS is positive and statistically significant (b = 0.026, p = 0.035). This suggests that the positive association between stress and depressive symptoms is roughly 14.4% stronger among those with a one standard deviation increase in a genome-wide measure of environmental sensitivity. Figure 2 presents the estimated average value of our depressive symptom measure for individuals with a high (i.e., 75th percentile, line with circles) compared to a low (i.e., 25th percentile, line with x's) value on the environmental sensitivity PGS.
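The 14.4% figure follows directly from the two reported coefficients, as the short worked example below shows. The coefficients are those quoted in the text; the function name is ours, and treating the PGS as exactly standard normal when locating its quartiles is an assumption made only for this illustration.

```python
from statistics import NormalDist

# Coefficients as reported in the text for Table 3.
B_STRESS = 0.181        # effect of stress at an average (z = 0) PGS
B_INTERACTION = 0.026   # change in the stress slope per 1 SD of PGS


def stress_slope(pgs_z):
    """Slope of depressive symptoms on stress at a given standardized PGS."""
    return B_STRESS + B_INTERACTION * pgs_z


# Relative strengthening of the stress slope for a 1 SD increase in PGS.
relative_increase = (stress_slope(1.0) - stress_slope(0.0)) / stress_slope(0.0)
print(f"{relative_increase:.1%}")  # 14.4%

# Assumption: a standard-normal PGS, so the quartiles sit near -0.674 and +0.674.
z75 = NormalDist().inv_cdf(0.75)
slope_high, slope_low = stress_slope(z75), stress_slope(-z75)
print(round(slope_low, 3), round(slope_high, 3))
```

Under that normality assumption, the two lines contrasted in Fig. 2 differ in slope by about 2 × 0.674 × 0.026, a modest but systematic divergence as stress increases.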
These results support the notion that a genome-wide polygenic measure can capture individual differences in environmental sensitivity. These findings are in line with Caspi and colleagues' original work and support the SD model emphasizing the noxious nature of stress exposure rather than the salutary nature of a stress-free environment (VS or DS).
Figure 2. Gene-environment interaction between stress and differential susceptibility genotype as related to depression in adults. Estimates are derived from Model 3 of Table 3. The thicker line with x's presents individuals with a low (i.e., 25th percentile) value for the environmental sensitivity PGS. The thinner line with circles shows individuals with a high (i.e., 75th percentile) value for the PGS.

Discussion
The results presented here are not meant to replicate the results of the Caspi et al. paper directly. Rather, we use this study to demonstrate the continued significance of the GxE framework and to further our understanding of environmental sensitivity, writ large. Environmental sensitivity is an important dimension of research in the social sciences, epidemiology, and public health, in which there is already evidence that broad social-environmental factors can limit or enable small genetic associations to become more prominent. As an example, researchers have identified a significant association between stress exposure and smoking that is moderated by 5-HTTLPR genotype; the pattern is nearly identical to the results presented by Caspi et al. but concerns a different outcome. Specifically, among pairs of brothers who are exposed to the same level of stress at the household level, the sibling with more S' alleles is more likely to smoke in light of increasing numbers of stressors. This same association was not evident among pairs of sisters, which is likely due to gender differences in the socialization of appropriate stress-coping behaviors as internalized or externalized 19. Other work has shown that the relationship between school-level norms regarding cigarette and alcohol consumption and individual-level behaviors is stronger among carriers of the S'-allele in the 5-HTT gene 20,21. Such "environmentally susceptible" individuals smoke and drink more than they would in other contexts and do so relative to their peers in schools with a high prevalence of these behaviors.
These different examples are precisely what Keers and others were trying to capture with their broad indicator of environmental sensitivity linked to genetic loci across the genome 5. To further illustrate this point, we estimate comparable models in which the PGS is calculated for depressive symptoms or major depressive disorder (Tables 4, 5). Both PGSs were positively associated with depressive symptoms but neither significantly interacted with stress to predict depression. While this is an interesting finding that could prove fruitful for future research, the present paper is more broadly focused on global stress sensitivity as a predictor. Taken together with the fact that the PGS estimates for environmental sensitivity are substantively independent from those for major depressive disorder (r = 0.008) and depressive symptoms (r = −0.039) (Table 6), these results provide further evidence that this form of environmental sensitivity is distinct from genetic pathways affecting depression and depressive symptoms directly.

Methods
Data. National Longitudinal Study of Adolescent to Adult Health (Add Health). Add Health is a nationally representative cohort drawn from a probability sample of 80 U.S. high schools and 52 U.S. middle schools, representative of U.S. schools in 1994-1995 with respect to region, urban setting, school size, school type, and race or ethnic background (n = 20,745, ages 12-20 years at Wave 1 in 1994-1995). Our analyses use data from Wave V, which was conducted during 2016-2018 to collect social, environmental, behavioral, and biological data with which to track the emergence of chronic disease as the cohort advanced through their fourth decade of life. Importantly, the Wave V survey was expanded to obtain retrospective reports of birth and childhood circumstances to supplement existing early life data. Wave V contains a total of 12,300 respondents, of which 7033 had genome-wide data. After removing those with missing information on depressive symptoms, our final sample contained a total of 6472 respondents. Descriptive statistics for this sample are shown in Table 1.
At Wave IV, Add Health collected Oragene saliva samples from consenting participants (96% of n = 15,701), and requested a second consent to archive their samples for future genomic studies. Approximately 80% consented to archive and were thus eligible for genome-wide genotyping 2. Genotyping was completed over three [...] The two platforms, Omni1 and Omni2.5, utilized tag SNP technology to identify and include over 1.1 million and 2.5 million genetic markers, respectively, derived from the International HapMap Project and the most informative markers from the 1000 Genomes Project (1KGP). The genetic markers include known disease-associated SNPs from multiple sources, ancestry-informative markers, sex chromosomes, and ABO blood typing markers. The platforms also included probes for the detection of copy number variation (CNV) covering all common CNV regions and more than 5000 rare CNV regions. After quality control procedures, genotype data were available for 9974 individuals: n = 7917 from the Illumina HumanOmni1-Quad chip and n = 2057 from the Illumina HumanOmni2.5-Quad chip. After filtering, the Add Health genotype data contained n = 609,130 single-nucleotide polymorphisms (SNPs) common to both chips.
Table 4. The influence of stress on depression as a function of major depressive disorder PGS. Stress*PGS is boldfaced to highlight. Reference category in brackets. Cell entries are as follows: b = unstandardized OLS regression estimates; se = standard error; t = test statistic; pr. ≤ two-tailed p-values; min and max = boundaries of the 95% confidence intervals. All data are weighted to reflect the design of the Add Health Study.

Measures. Our primary outcome of interest, depression, is a composite of several questions asked in the interview. Specifically, we create a four-point scale measuring how frequently the respondent reported (1) being unhappy, (2) being unable to "shake the blues," (3) feeling sad, or (4) feeling depressed (self-diagnosed). Our scale is coded such that 1 = Generally Happy/Good Mood, while 4 = Extremely Unhappy across the aforementioned variables. Our measure of environmental stress was designed to capture the components/dimensions of stress referenced in the original paper by Caspi et al. 1. Specifically, we incorporated questions from Wave V concerning employment/job stress, financial stress, housing stress, physical/mental health stress, and relationship stress into an overall, five-point summative measure, with a value of 1 representing generally low stress and 5 representing generally high stress. Our measure of genetic susceptibility to stress is captured by a PGS based on summary statistics from Keers et al. 5, who, instead of focusing on the mean level of an outcome to derive the effect size estimates for each polymorphism, emphasized discordance among twin pairs to identify the phenotype of environmental sensitivity. Genome-wide regression models were then used to retrieve the beta estimates and risk allele for their overall environmental sensitivity PGS. Our models also control for the first five genetic principal components, as well as age, biological sex, race/ethnicity, and educational attainment. PGSs are calculated as a weighted sum, such that the raw PGSs for environmental sensitivity are calculated as:
$$\mathrm{PGS}_i = \sum_{j} \beta_j \, \mathrm{SNP}_{ij}$$
where $\mathrm{SNP}_{ij}$ is the allele frequency of the jth SNP for the ith individual and $\beta_j$ is the estimated association between SNP j and within-pair variability in emotional problems among monozygotic twins as reported by Keers et al. 5.
The raw PGSs are then standardized (μ = 0 and σ = 1) within ancestry groups to account for between-group population stratification. The Add Health genotyped sample is restricted to four genetic ancestry groups: (1) European, (2) African, (3) Latin American, and (4) East Asian. To identify respondents in these four genetic ancestry groups, a principal component analysis is conducted on all unrelated members of the full genotyped sample. Estimates are then projected onto the remaining related individuals. Each genetic ancestry group is defined by distance from the mean of the first two principal components of the genetic data. To be included in the Latin American, East Asian, and European ancestry groups individuals must be within ± 1 standard deviation of the mean of the first two principal components of the genetic data estimated from all individuals in the Add Health genome-wide data who self-identified as Hispanic, Asian, and non-Hispanic White, respectively. To be included in the African ancestry group individuals must be within ± 2 standard deviations of the mean of the first principal component and ± 1 standard deviation of the mean of the second principal component estimated from all individuals in the genome-wide data who self-identified as non-Hispanic Black.
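The inclusion rule for ancestry groups can be sketched as a simple window around a reference group's mean on the first two principal components. The code below is an illustration under stated assumptions: the function names and toy values are hypothetical, the use of the sample standard deviation is our choice, and the principal component analysis itself is taken as already computed.

```python
from statistics import mean, stdev


def pc_window(reference_pcs, n_sd=(1.0, 1.0)):
    """Bounds on (PC1, PC2) computed from a self-identified reference group.

    reference_pcs: list of (pc1, pc2) tuples for the reference group
    n_sd:          allowed distance from the mean, in SDs, for PC1 and PC2
    """
    bounds = []
    for k, width in enumerate(n_sd):
        values = [pcs[k] for pcs in reference_pcs]
        mu, sd = mean(values), stdev(values)  # sample SD: an assumption
        bounds.append((mu - width * sd, mu + width * sd))
    return bounds


def in_window(pcs, bounds):
    """True if an individual's (PC1, PC2) falls inside every bound."""
    return all(lo <= pcs[k] <= hi for k, (lo, hi) in enumerate(bounds))


# Hypothetical PC values for a self-identified reference group.
reference = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]

one_sd = pc_window(reference)                    # rule used for three groups
wider_pc1 = pc_window(reference, n_sd=(2.0, 1.0))  # rule for the African ancestry group

print(in_window((1.5, 0.5), one_sd))      # True
print(in_window((2.5, 1.0), one_sd))      # False
print(in_window((2.5, 1.0), wider_pc1))   # True
```

Raw scores would then be standardized separately within each group defined this way, so that a score of zero denotes the average of an individual's own ancestry group.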
While genetic ancestry and race/ethnicity are correlated (r = 0.89), they are distinct constructs and attempts to conflate the two are problematic. More specifically, population stratification refers to differences in genetic variation between geographical ancestry groups. Due primarily to the genetic bottleneck created by the small number of humans (~2000) who migrated out of Africa early in human history and the tendency for people to procreate with individuals from the same or nearby geographic regions, genetic variance across the entire genome is highly correlated with geography (see ref. 22 for more detail). However, genetic ancestry should not be conflated with race or ethnicity. Race and ethnicity are social constructs based on a multitude of factors, of which genetic ancestry may or may not be included depending on historical and societal differences in racialization 23. Consequently, not all individuals included in a given genetic ancestry group may self-identify or be classified by others as the same race and/or ethnicity as other members of their genetic ancestry group.
See ref. 24 for more details on the Add Health GWAS sample.
Statistical analyses. Models were estimated using OLS regression with the appropriate sampling weights to reflect the study design of Add Health. Our Stata .do-file (i.e., syntax script) with full coding of variables and models is available upon request.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.