Genome-wide by environment interaction studies of depressive symptoms and psychosocial stress in UK Biobank and Generation Scotland

Stress is associated with poorer physical and mental health. To improve our understanding of this link, we performed genome-wide association studies (GWAS) of depressive symptoms and genome-wide by environment interaction studies (GWEIS) of depressive symptoms and stressful life events (SLE) in two UK population-based cohorts (Generation Scotland and UK Biobank). No SNP was individually significant in either GWAS, but gene-based tests identified six genes associated with depressive symptoms in UK Biobank (DCC, ACSS3, DRD2, STAG1, FOXP2 and KYNU; p < 2.77 × 10−6). Two SNPs with genome-wide significant GxE effects were identified by GWEIS in Generation Scotland: rs12789145 (53-kb downstream PIWIL4; p = 4.95 × 10−9; total SLE) and rs17070072 (intronic to ZCCHC2; p = 1.46 × 10−8; dependent SLE). A third locus upstream CYLC2 (rs12000047 and rs12005200, p < 2.00 × 10−8; dependent SLE) when the joint effect of the SNP main and GxE effects was considered. GWEIS gene-based tests identified: MTNR1B with GxE effect with dependent SLE in Generation Scotland; and PHF2 with the joint effect in UK Biobank (p < 2.77 × 10−6). Polygenic risk scores (PRSs) analyses incorporating GxE effects improved the prediction of depressive symptom scores, when using weights derived from either the UK Biobank GWAS of depressive symptoms (p = 0.01) or the PGC GWAS of major depressive disorder (p = 5.91 × 10−3). Using an independent sample, PRS derived using GWEIS GxE effects provided evidence of shared aetiologies between depressive symptoms and schizotypal personality, heart disease and COPD. Further such studies are required and may result in improved treatments for depression and other stress-related conditions.


Introduction
Mental illness results from the interplay between genetic susceptibility and environmental risk factors 1,2 . Previous studies have shown that the effects of environmental factors on traits may be partially heritable 3 and moderated by genetics 4,5 . Major depressive disorder (MDD) is the most common psychiatric disorder with a lifetime prevalence of approximately 14% globally 6 and with a heritability of approximately 37% 7 . There is strong evidence for the role of stressful life events (SLEs) as risk factor and trigger for depression [8][9][10][11][12] . Genetic control of sensitivity to stress may vary between individuals, resulting in individual differences in the depressogenic effects of SLE, i.e., genotype-byenvironment interaction (GxE) 4,[13][14][15][16] . Significant evidence of GxE has been reported for common respiratory diseases and some forms of cancer [17][18][19][20][21][22] , and GxE studies have identified genetic risk variants not found by genome-wide association studies (GWAS) [23][24][25][26][27] .
Interaction between polygenic risk of MDD and recent SLE are reported to increase liability to depressive symptoms 4,16 ; validating the implementation of genomewide approaches to study GxE in depression. Most GxE studies for MDD have been conducted on candidate genes, or using polygenic approaches to a wide range of environmental risk factors, with some contradictory findings [28][29][30][31][32] . Incorporating knowledge about recent SLE into GWAS may improve our ability to detect risk variants in depression otherwise missed in GWAS 33 . To date, three studies have performed genome-wide by environment interaction studies (GWEIS) of MDD and SLE [34][35][36] , but this is the first study to perform GWEIS of depressive symptoms using adult SLE in cohorts of relatively homogeneous European ancestry.
Interpretation of GxE effects may be hindered by gene-environment correlation. Gene-environment correlation denotes a genetic mediation of associations through genetic influences on exposure to, or reporting of, environments 2,37 . Genetic factors predisposing to MDD may contribute to exposure and/or reporting of SLE 38 . To tackle this limitation, measures of SLE can be broken down into SLE likely to be independent of a respondent's own behaviour and symptoms, or into dependent SLE, in which participants may play an active role exposure to SLE 39,40 . Different genetic influences, including a higher heritability, are reported for dependent SLE compared to independent SLE 38,[41][42][43][44] , suggesting that whereas GxE driven by independent SLE is likely to reflect a genetic moderation of associations between SLE and depression, GxE driven by dependent SLE may result from a genetic mediation of the association through genetically driven personality or behavioural traits. To test this, we analysed dependent and independent SLE scores separately in Generation Scotland (GS).
Stress contributes to many human conditions, with evidence of genetic vulnerability to the effect of SLE 45 . Therefore, genetic stress-response factors in MDD may also underlie the aetiology of other stress-linked disorders with which MDD is often comorbid 46,47 (e.g., cardiovascular diseases 48 , diabetes 49 , chronic pain 50 and inflammation 51 ). We tested the hypothesis that pleiotropy and shared aetiology between mental and physical health conditions may be due in part to genetic variants underlying SLE effects in depression.
In this study, we conduct GWEIS of depressive symptoms incorporating data on SLE in two independent UKbased cohorts. We aimed to: (i) identify loci associated with depressive symptoms through genetic response to SLE; (ii) study dependent and independent SLE to support a contribution of genetically mediated exposure to stress; (iii) assess whether GxE effects improve the proportion of phenotypic variance in depressive symptoms explained by genetic additive main effects alone; and (iv) test for a significant overlap in the genetic aetiology of the response to SLE and mental and physical stress-related phenotypes.

Materials and methods
The core workflow of this study is summarised in Fig. 1.

Cohort descriptions GS
GS is a family-based population cohort representative of the Scottish population 52 . At baseline, blood and salivary DNA samples were collected, stored and genotyped at the Wellcome Trust Clinical Research Facility, Edinburgh. Genome-wide genotype data were generated using the Illumina HumanOmniExpressExome-8 v1.0 DNA Analysis BeadChip (San Diego, CA, USA) and Infinium chemistry 53 . The procedures and details for DNA extraction and genotyping have been extensively described elsewhere 54,55 . In total, 21,525 participants were re-contacted to participate in a follow-up mental health study (Stratifying Resilience and Depression Longitudinally, STRADL), of which 8541 participants responded providing updated measures in psychiatric symptoms and SLE through self-reported mental health questionnaires 56 . Samples were excluded if: they were duplicate samples, had diagnoses of bipolar disorder, no SLE data (non-respondents), were population outliers (mainly non-Caucasians and Italian ancestry subgroup), had sex mismatches or were missing >2% of genotypes. Single nucleotide polymorphisms (SNPs) were excluded if: missing >2% of genotypes, Hardy-Weinberg equilibrium test p < 1 × 10 −6 , or minor allele frequency <1%. Further details of the GS and STRADL cohort are available elsewhere 52,56-58 . All components of GS and STRADL obtained ethical approval from the Tayside Committee on Medical Research Ethics on behalf of the NHS (reference 05/s1401/89). After quality control, individuals were filtered by degree of relatedness (pi-hat < 0.05), maximising retention of those individuals reporting a higher number of SLE. The final dataset comprised data on 4919 unrelated individuals (1929 men; 2990 women) and 560,351 SNPs.

Independent GS datasets
Additional datasets for a range of stress-linked medical conditions and personality traits were created from GS (N = 21,525) excluding respondents and their relatives (N = 5724). Following the same quality control criteria detailed above, we maximised unrelated non-respondents for retention of cases, or proxy cases (see below), to maximise the information available for each phenotype. This resulted in independent datasets with unrelated individuals for each trait. Differences between respondents and non-respondents are noted in the figure legend of Table 1.

UK Biobank (UKB)
This study used data from 99,057 unrelated individuals (47,558 men; 51,499 women) from the initial release of UKB genotyped data (released 2015; under UKB project 4844). Briefly, participants were removed based on UKB genomic analysis exclusion, non-white British ancestry, high missingness, genetic relatedness (kinship coefficient > 0.0442), QC failure in UK BiLEVE study and gender mismatch. GS participants and their relatives were excluded and GS SNPs imputed to a reference set combining the UK10K haplotype and 1000 Genomes Phase 3 reference panels 59 . After quality control, 1,009,208 SNPs remained. UKB received ethical approval from the NHS National Research Ethics Service North West (reference: 11/NW/0382). Further details on UKB cohort description, genotyping, imputation and quality control are available elsewhere [60][61][62] .
All participants provided informed consent.

Phenotype assessment SLEs
GS participants reported SLE experienced over the preceding 6 months through a self-reported brief life events questionnaire based on the 12-item list of threatening experiences 39,63,64 (Supplementary Table 1a). The total number of SLE reported (TSLE) consisted of the number of 'yes' responses. TSLE were subdivided into SLE Fig. 1 Study flowchart. Overview of the analyses conducted in this study: (i) identify loci associated with depressive symptoms through genetic response to SLE; (ii) test whether results of studying dependent and independent SLE support a contribution of genetically mediated exposure to stress; (iii) assess whether GxE effects improve the proportion of phenotypic variance in depressive symptoms explained by genetic additive main effects alone and (iv) test whether there is significant overlap in the genetic aetiology of the response to SLE and mental and physical stress-related phenotypes. Two core cohorts are used, Generation Scotland (GS) and UK Biobank (UKB). Summary statistics from genome-wide association studies (GWAS) and genome-wide by environment interaction studies (GWEIS) are used to generate polygenic risk scores (PRSs). Summary statistics from Psychiatric Genetic Consortium (PGC) Major Depressive Disorder (MDD) GWAS are also used to generate PRS (PRS MDD ). PRS weighted by: additive effects (PRS D and PRS MDD ), GxE effects (PRS GxE ) and joint effects (the combined additive and GxE effect; PRS Joint ), are used for phenotypic prediction. TSLE stands for total number of SLE reported. DSLE stands for SLE dependent on an individual's own behaviour. Conversely, ISLE stands for independent SLE. N stands for sample size. N noGS stands for sample size with GS individuals removed. N noUKB stands for sample size with UKB individuals removed potentially dependent or secondary to an individual's own behaviour (DSLE, questions 6-11 in Supplementary Table  1a), and independent SLE (ISLE, questions 1-5 in Supplementary Table 1a; pregnancy item removed) following Brugha et al. 39,40 . Thus, three SLE measures (TSLE, DSLE and ISLE) were constructed for GS. UKB Samples were maximised for retention of cases to maximise the information available for each trait. There was no preferential selection of relatives in pairs for quantitative phenotypes, in order to retain the underlying distribution. All individuals involved in the datasets listed above were non-respondents to the GS follow-up study. Compared with individuals included at GS GWEIS (respondents in GS follow-up), non-respondents were significantly: younger, from more socioeconomically deprived areas, generally less healthier and wealthier. Non-respondents were more likely to smoke, and less likely to drink alcohol, although they consumed more units per week, compared with respondents. At GS baseline, non-respondents experienced more psychological distress and reported higher scores in symptoms of GHQ-depression and GHQ-anxiety than respondents 56 The total target sample size (N), number of males and females in N, number of SNPs (N SNPs) in target sample size N: the number of SNPs used as predictors after clumping step range between 90,650 and 91,000. The number of cases and controls in the independent target sample is indicated for binary phenotypes only. Samples were mapping by proxy approach was used (i.e., where first-degree relatives of individuals with the disease were considered proxy cases and included into the group of cases) are indicated by (R) GS Generation Scotland, GWEIS genome-wide by environment interaction studies, GHQ General Health Questionnaire a Assessed through self-reported questionnaires participants were screened for 'illness, injury, bereavement and stress'' (Supplementary Table 1b) over the previous 2 years using six items included in the UKB Touchscreen questionnaire. A score reflecting SLE reported in UKB (TSLE UKB ) was constructed by summing the number of 'yes' responses.

Psychological assessment
GS participants reported whether their current mental state over the preceding 2 weeks differed from their typical state using a self-administered 28-item scaled version of the General Health Questionnaire (GHQ) [65][66][67] . Participants rated the degree and severity of their current symptoms with a four-point Likert scale (following Goldberg et al. 67 ). A final log-transformed GHQ was used to detect altered psychopathology and thus, assess depressive symptoms as results of SLE. In UKB participants, current depressive symptoms over the preceding 2 weeks were evaluated using four psychometric screening items (Supplementary Table 2), including two validated and reliable questions for screening depression 68 , from the Patient Health Questionnaire (PHQ) validated to screen mental illness 69,70 . Each question was rated in a four-point Likert scale to assess impairment/severity of symptoms. Due to its skewed distribution, a four-point PHQ score was formed from PHQ (0 = 0; 1 = 1-2; 2 = 3-5; 3 = 6 or more) to create a more normal distribution.

Stress-related traits
Targeted GS stress-related phenotypes and sample sizes are shown in Table 1 and detailed elsewhere 52 . These conditions were selected from literature review based on previous evidence of a link with stress 45 (see also Supplementary Material: third section). Furthermore, we created additional independent samples using mapping by proxy, where individuals with a self-reported first-degree relative with a selected phenotype were included as proxy cases. This approach provides greater power to detect susceptibility variants in traits with low prevalence 71 .

Statistical analyses SNP-heritability and genetic correlation
A restricted maximum likelihood approach was applied to estimate SNP-heritability (h 2 SNP ) of depressive symptoms and self-reported SLE measures, and within samples bivariate genetic correlation between depressive symptoms and SLE measures using GCTA 72 .

GWAS analyses
GWAS were conducted in PLINK 73 . In GS, age, sex and 20 principal components (PCs) were fitted as covariates. In UKB, age, sex and 15 PCs recommended by UKB were fitted as covariates. The genome-wide significance threshold was p = 5 × 10 -8 .

GWEIS analyses
GWEIS were conducted on GHQ (the dependent variable) for TSLE, DSLE and ISLE in GS and on PHQ for TSLE UKB in UKB fitting the same covariates detailed above to reduce error variance. GWEIS were conducted using an R plugin for PLINK 73 developed by Almli et al. 74 (https://epstein-software.github.io/robustjoint-interaction). This method implements a robust test that jointly considers SNP and SNP-environment interaction effects from a full model (Y~β 0 + βSNP + βSLE + βSNPxSLE + βCovariates) against a null model where both the SNP and SNP×SLE effects equal 0, to assess the joint effect (the combined additive main and GxE genetic effect at a SNP) using a nonlinear statistical approach that applies Huber-White estimates of variance to correct possible inflation due to heteroscedasticity (unequal variances across exposure levels). This robust test should reduce confounding due to differences in variance induced by covariate interaction effects if present 75 . Additional code was added (courtesy of Prof. Michael Epstein; 74 Supplementary Material) to generate betacoefficients and the p-value of the GxE term alone. In UKB, correcting for 1,009,208 SNPs and one exposure, we established a Bonferroni-adjusted threshold for significance at p = 2.47 × 10 -8 for both joint and GxE effects. In GS, correcting for 560,351 SNPs and three measures of SLE we established a genome-wide significance threshold of p = 2.97 × 10 -8 .

Polygenic profiling and prediction
Polygenic risk scores (PRSs) weighting by GxE effects (PRS GxE ) were generated using PRSice-2 77 (Supplementary Material) in GS using GxE effects from UKB-GWEIS. In UKB, PRS GxE were constructed using GxE effects from all three GS-GWEIS (TSLE, DSLE and ISLE as exposures) independently. PRS were also weighted in both samples using either UKB-GWAS or GS-GWAS statistics (PRS D ), and summary statistics from Psychiatric Genetic Consortium (PGC) MDD-GWAS (released 2016; PRS MDD ) that excluded GS and UKB individuals when required (N noGS = 155,866; N noUKB = 138,884). Furthermore, we calculated PRS weighted by the joint effects (the combined additive main and GxE genetic effects; PRS Joint ) from either the UKB-GWEIS or GS-GWEIS. PRS predictions of depressive symptoms were permuted 10,000 times. Multiple regression models fitting PRS GxE and PRS MDD , and both PRS GxE and PRS D were tested. All models were adjusted by same covariates used in GWAS/ GWEIS. Null models were estimated from the direct effects of covariates alone. The predictive improvement of combining PRS GxE and PRS MDD /PRS D effects over PRS MDD /PRS D effects alone was tested for significance using the likelihood ratio test (LRT).
Prediction of PRS D , PRS GxE and PRS Joint of stress-linked traits were adjusted by age, sex and 20 PCs; and permuted 10,000 times. Empirical-p-values after permutations were further adjusted by false discovery rate (FDR, conservative threshold at Empirical-p = 6.16 × 10 -3 ). The predictive improvement of fitting PRS GxE combined with PRS D and covariates over prediction of a phenotype using the PRS D effect alone with covariates was assessed using LRT, and LRT-p-values adjusted by FDR (conservative threshold at LRT-p = 8.35 × 10 -4 ).

GWAS of depressive symptoms
No genome-wide significant SNPs were detected by GWAS in either cohort. Top findings (p < 1 × 10 −5 ) are summarised in Supplementary Table 4. Manhattan and QQ plots are shown in Supplementary Figures 1-4. There was no evidence of genomic inflation (all λ 1000 < 1.01).

Post-GWEIS analyses: gene-sets enrichment
Significant gene-sets and GWAS catalogue hits from GWEIS are detailed in Supplementary Tables 11-14, including for UKB Biocarta: GPCR pathway; Reactome: opioid signalling, neurotransmitter receptor binding and downstream transmission in the postsynaptic cell, transmission across chemical synapses, gastrin CREB signalling pathway via PKC and MAPK; GWAS catalogue: post bronchodilator FEV1/FVC ratio, migraine and body mass index. In GS, enrichment was seen using TSLE and DLSE for GWAS catalogue: age-related macular degeneration, myopia, urate levels and Heschl's gyrus morphology; and using ISLE for biological process: regulation of transporter activity. All adjusted-p < 0.01.

Cross-cohort prediction
In GS, PRS D weighted by the UKB-GWAS of PHQ significantly explained 0.56% of GHQ variance (Empiricalp < 1.10 −4 ), similar to PRS MDD weighted by PGC MDD-GWAS (R 2 = 0.78%, Empirical-p < 1.10 −4 ). PRS GxE weighted by the UKB-GWEIS GxE effects explained 0.15% of GHQ variance (Empirical-p = 0.03, Supplementary Table 15). PRS GxE fitted jointly with PRS MDD significantly improved prediction of GHQ (R 2 = 0.93%, model p = 6.12 × 10 −11 ; predictive improvement of 19%, LRT-p = 5.91 × 10 −3 ) compared with PRS MDD alone. Similar to PRS GxE with PRS D (R 2 = 0.69%, model p = 2.72 × 10 −8 ; predictive improvement of 23%, LRT-p = 0.01). PRS Joint weighted by the UKB-GWEIS also predicted GHQ (R 2 = 0.58%, Empirical-p < 1.10 −4 ), although and of about 19% when PRS GxE was incorporated into a multiple regression model along with PRS MDD . Such a gain was not seen in UKB, but it must be noted that both PRS D and PRS MDD also explains much less variance of PHQ in UKB than of GHQ in GS. Also note, a noticeably reduction of variance explained by PRS Joint compared with combined polygenic risk scores (PRS)/effects the variance explained was significantly reduced compared with the model fitting PRS GxE and PRS D together (LRT-p = 4.69 × 10 −7 ), suggesting that additive and GxE effects should be modelled independently for polygenic approaches (Fig. 2a).

Prediction of stress-related traits
Prediction of stress-related traits in independent samples using PRS D , PRS GxE and PRS Joint are summarised in Fig. 3a and Supplementary Table 16. Significant trait prediction after FDR adjustment (Empirical-p < 6.16 × 10 −3 , FDR-adjusted Empirical-p < 0.05) using both UKB and GS PRS D was seen for: depression status, neuroticism and schizotypal personality. PRS GxE weighted by the GS-GWEIS GxE effect using ISLE significantly predicted depression status mapped by proxy (Empirical-p = 7.00 × 10 −4 , FDR-adjusted Empirical-p = 9.54 × 10 −3 ). Nominally significant predictive improvements (LRT-p < 0.05) of fitting PRS GxE , over the PRS D effect alone, using summary statistics generated from both UKB and GS were detected for schizotypal personality, heart diseases and chronic obstructive pulmonary disease (COPD) by proxy (Fig. 3b). PRS GxE weighted by GS-GWEIS GxE effect using ISLE significantly improved prediction over PRS D effect alone of depression status mapped by proxy after FDR adjustment (LRT-p = 1.96 × 10 −4 , FDRadjusted LRT-p = 2.35 × 10 −2 ).

Discussion
This study performs GWAS and incorporates data on recent adult SLEs into GWEIS of depressive symptoms, identifies new loci and candidate genes for the modulation of genetic response to SLE; and provides insights to help disentangle the underlying aetiological mechanisms increasing the genetic liability through SLE to both depressive symptoms and stress-related traits.
SNP-heritability of depressive symptoms (h 2 SNP = 9-13%), were slightly higher than previous estimates from African-American populations 34 , and over a third larger than estimates in MDD from European samples 78 . h 2 SNP for PHQ in UKB (9.0%) remained significant after adjusting for SLE (7.9%). Thus, although some genetic contributions may be partially shared between depressive symptoms and reporting of SLE, there is still a relatively large genetic contribution unique to depressive symptoms. Significant h 2 SNP of DSLE in GS (13%) and TSLE UKB in UKB (4%), which is mainly composed of dependent SLE items, were detected similar to previous studies (8 and 29%) 34,42 . Conversely, there was no evidence for heritability of independent SLE. A significant bivariate genetic correlation between depressive symptoms and SLE (rG = 0.72) was detected in UKB after adjusting for covariates, suggesting that there are shared common variants underlying self-reported depressive symptoms and SLE. This bivariate genetic correlation was smaller than that estimated from African-American populations (rG = 0.97; p = 0.04; N = 7179) 34 . Genetic correlations between SLE measures and GHQ were not significant in GS (N = 4919; rG = 1; all p > 0.056), perhaps due to a lack of power in this smaller sample.
Post-GWAS gene-based tests detected six genes significantly associated with PHQ (DCC, ACSS3, DRD2, STAG1, FOXP2 and KYNU). Previous studies have implicated these genes in liability to depression (see Supplementary Table 17), and three of them are genomewide significant in gene-based tests from the latest metaanalysis of major depression that includes UKB (DCC, p = 2.57 × 10 −14 ; DRD2, p = 5.35 × 10 −14 ; and KYNU, p = 2.38 × 10 −6 ; N = 807,553) 79 . This supports the implementation of quantitative measures such as PHQ to detect genes underlying lifetime depression status 80 . For example, significant gene ontology analysis of the UKB-GWAS identified enrichment for positive regulation of long-term synaptic potentiation, and for previous GWAS findings of brain structure 81 , schizophrenia 82 and response to amphetamines 83 .
The key element of this study was to conduct GWEIS of depressive symptoms and recent SLE. We identified two loci with significant GxE effect in GS. However, none of these associations replicated in UKB (p > 0.05). The strongest association was using TSLE at 53-kb downstream of PIWIL4 (rs12789145). PIWIL4 is brain expressed and involved in chromatin modification 84 , suggesting it may moderate the effects of stress on depression. It encodes HIWI2, a protein thought to regulate OTX2, that is critical for the development of forebrain and for coordinating critical periods of plasticity disrupting the integration of cortical circuits 85,86 . Indeed, an intronic SNP in PIWIL4 was identified as the strongest GxE signal in attention deficit hyperactivity disorder using mother's warmth as environmental exposure 87 . The other significant GxE identified in our study was in ZCCHC2 using DSLE. This zinc-finger protein is expressed in blood CD4 + T cells and is downregulated in individuals with MDD 88 and in those resistant to treatment with citalopram 89 . No GxE effect was seen using ISLE as exposure.
No significant locus or gene with GxE effect was detected in the UKB-GWEIS. Nevertheless, joint effects (the combined additive main and GxE genetic effects) identified two genes significantly associated with PHQ (ACSS3 and PHF2; see Supplementary Table 17). PHF2 was recently detected as genome-wide significant at the latest meta-analysis of depression 79 . Notably, PHF2 paralogs have previously been linked with MDD through stress-response in three other studies [90][91][92] . Joint effects analyses in GS also detected an additional significant association upstream CYLC2, a gene nominally associated (p < 1 × 10 −5 ) with obsessive-compulsive disorder and Tourette's syndrome 93 . Gene-based test from the GS-GWEIS identified a significant association with  (Table 1). (R) refers to traits using mapping by proxy approach (i.e., where first-degree relatives of individuals with the disease are considered proxy cases and included into the group of cases). Y axis shows the discovery sample and the effect used to weight PRS. Numbers in cells indicate the % of variance explained, also represented by colour scale. Significance is represented by asterixes according to the following significance codes: **p < 0.01; *p < 0.05; in grey Empirical-p-values after permutation (10,000 times) and in yellow FDR-adjusted Empirical-pvalues. b Predictive improvement by GxE effect in independent GS datasets. Heatmap illustrating the predictive improvement as a result of incorporating PRS GxE into a multiple model along with PRS D and covariates (full model), over a model fitting PRS D alone with covariates (null model); predicting a wide range of traits from GS listed in the x axis (Table 1). Covariates: age, sex and 20 PCs. (R) refers to traits using mapping by proxy approach (i.e., where first-degree relatives of individuals with the disease are consider proxy cases and included into the group of cases). PRS GxE are weighted by genome-wide by environment interaction studies (GWEIS) using GxE effects. PRS D were weighted by the genome-wide association studies (GWAS) of depressive symptoms additive main effects. The y axis shows the discovery sample used to weight PRS. Numbers in cells indicate the % of variance explained by the PRS GxE , also represented by colour scale. Notice that those correspond to the PRS GxE predictions in Fig. 3a when PRS GxE are weighted by GxE effects. Significance was tested by likelihood ratio tests (LRT): full model including PRS D + PRS GxE vs. null model with PRS D alone (covariates adjusted). Significance is represented by asterixes according to the following significance codes: ***p < 0.001; **p < 0.01; *p < 0.05; in grey LRT-p-values and in yellow FDR-adjusted LRT-p-values MTNR1B, a melatonin receptor gene, using DSLE (both GxE and joint effect; Supplementary Table 17). Genes prioritised using GxE effects were enriched in differentially expressed genes from several tissues including the adrenal gland, which releases cortisol into the bloodstream in response to stress, thus playing a key role in the stress-response system, reinforcing a potential role of GxE in stress-related conditions.
The different instruments and sample sizes available make it hard to compare results between cohorts. Whereas GS contains deeper phenotyping measurements of stress and depressive symptoms than UKB, the sample size is much smaller, which may be reflected in the statistical power required to reliably detect GxE effects. Furthermore, the presence and size of GxE effects are dependent on their parameterisation (i.e., the measurement, scale and distribution of the instruments used to test such interaction) 94 . Thus, GxE may be incomparable across GWEIS due to differences in both phenotype assessment and stressors tested. Although our results suggest that both depressive symptom measures are correlated with lifetime depression status, different influences on depressive symptoms from the SLE covered across studies may contribute to lack of stronger replication. Instruments in GS cover a wider range of SLE and are more likely to capture changes in depressive symptoms as consequence of their short-term effects. Conversely, UKB could capture more marked long-term effects, as SLE were captured over 2 years compared with the 6 months in GS. New mental health questionnaires covering a wide range of psychiatric symptoms and SLE in the latest release of UKB data provides the opportunity to create similar measures to GS in the near future. Further replication in independent studies with equivalent instruments is required to validate our GWEIS findings. Despite these limitations and a lack of overlap in the individual genes prioritised from the two GWEIS, replication was seen in the predictive improvement of using PRS GxE derived from the GWEIS GxE effects to predict stress-related phenotypes.
The third aim of this study was to test whether modelling GxE effects could improve predictive genetic models, and thus help to explain deviation from additive models and missing heritability for MDD 95 . Multiple regression models suggested that inclusion of PRS GxE weighted by GxE effects could improve prediction of an individual's depressive symptoms over use of PRS MDD or PRS D weighted by additive effects alone. In GS, we detected a predictive gain of 19% over PRS MDD weighted by PGC MDD-GWAS, and a gain of 23% over PRS D weighted by UKB-GWAS (Fig. 2a). However, these findings did not surpass stringent Bonferroni correction and could not be validated in UKB. This may reflect in the poor predictive power of the PRS generated from the much smaller GS discovery sample. The results show a noticeably reduced prediction using PRS Joint weighted by joint effects, which suggests that the genetic architecture of stress-response is at least partially independent and differs from genetic additive main effects. Overall, our results from multiple regression models suggest that for polygenic approaches main and GxE effects should be modelled independently.
SLE effects are not limited to mental illness 45 . Our final aim was to investigate shared aetiology between GxE for depressive symptoms and stress-related traits. Despite the differences between the respondents and nonrespondents (Table 1 legend), a significant improvement was seen in predicting depressive status when mapping by proxy cases using GxE effect from GS-GWEIS with independent SLE (FDR-adjusted LRT-p = 0.013), but not with dependent SLE. GxE effects using statistics generated from both discovery samples, despite the differences in measures, nominally improved the phenotypic prediction of schizotypal personality, heart disease and the proxy of COPD (LRT-p < 0.05). Other studies have also found evidence supporting a link between stress and depression in these phenotypes (see Supplementary Material for extended review) and suggest, for instance, potential pleiotropy between schizotypal personality and stressresponse. Our findings point to a potential genetic component underlying a stress-response-depressioncomorbidities link due, at least in part, to shared stressresponse mechanisms. A relationship between SLE, depression and coping strategies such as smoking suggests that genetic stress-response may modulate adaptive behaviours such as smoking, fatty diet intake, alcohol consumption and substance abuse. This is discussed further in the Supplementary Material.
In this study, evidence for SNPs with significant GxE effects came primarily from the analyses of dependent SLE and not from independent SLE. This supports a genetic effect on probability of exposure to, or reporting of SLE, endorsing a gene-environment correlation. Chronic stress may influence cognition, decision making and behaviour eventually leading to higher risk taking 96 . These conditions may also increase sensitivity to stress among vulnerable individuals, including those with depression, who also have a higher propensity to report SLE, particularly dependent SLE 38 . A potential reporting bias in dependent SLE may be mediated as well by heritable behavioural, anxiety or psychological traits such as risk taking 42,97 . Furthermore, individuals vulnerable to MDD may behave in a manner that exposes them more frequently to high risk and stressful environments 14 . This complex interplay, reflected in the form of a gene-environment correlation effect, would hinder the interpretation of GxE effects from GWEIS as pure interactions. A mediation of associations between SLE and depressive symptoms, through genetically driven sensitivity to stress, personality or behavioural traits would support the possibility of subtle genotype-by-genotype (GxG) interactions, or genotype-by-genotype-by-environment (GxGxE) interactions, contributing to depression 98,99 . In contrast, PRS prediction of the stress-related traits: schizotypal personality, heart disease and COPD, was primarily from derived weights using independent SLE, suggesting that a common set of variants moderate the effects of SLE across stress-related traits and that larger sample sizes will be required to detect the individual SNPs contributing to this. Thus, our findings support the inclusion of environmental information into GWAS to enhance the detection of relevant genes. The results of studying dependent and independent SLE support a contribution of genetically mediated exposure to and/or reporting of SLE, perhaps through sensitivity to stress as mediator.
This study emphasises the relevance of GxE in depression and human health in general and provides the basis for future lines of research.