Abstract
Gene-environment interactions (GxE) are often suggested to play an important role in the aetiology of psychiatric phenotypes, yet so far, only a handful of genome-wide environment interaction studies (GWEIS) of psychiatric phenotypes have been conducted. Representing the most comprehensive effort of its kind to date, we used data from the UK Biobank to perform a series of GWEIS for neuroticism across 25 broadly conceptualised environmental risk factors (trauma, social support, drug use, physical health). We investigated interactions on the level of SNPs, genes, and gene-sets, and computed interaction-based polygenic risk scores (PRS) to predict neuroticism in an independent sample subset (N = 10,000). We found that the predictive ability of the interaction-based PRSs did not significantly improve beyond that of a traditional PRS based on SNP main effects from GWAS, but detected one variant and two gene-sets showing significant interaction signal after correction for the number of analysed environments. This study illustrates the possibilities and limitations of a comprehensive GWEIS in currently available sample sizes.
Introduction
Neuroticism is a personality trait that is characterised by emotion dysregulation and negative affect. It has been thought to confer a general susceptibility to mental health problems, resulting in the frequent experience of negative emotions such as worry, sadness, self-consciousness, or anger1,2,3. High neuroticism is associated with increased psychiatric comorbidity, and there is a substantial overlap between neuroticism and a wide range of psychiatric disorders, particularly depression and anxiety4,5,6. The associated societal costs of neuroticism are substantial7, leading to increased use of both mental and physical health services due to poorer overall health and quality of life8.
Twin studies have estimated the heritability of neuroticism to be around 40%, with the rest typically attributed to non-shared environmental factors9,10,11,12,13. In recent years, the genetic aetiology of neuroticism has been studied using large-scale genome-wide association studies (GWAS) which have uncovered more than a hundred genomic loci that point towards genes and pathways involved in brain functioning14,15.
In the epidemiological literature, neuroticism and related phenotypes have been linked with a range of different environmental factors, with traumatic events, childhood maltreatment, and social support receiving the greatest attention16,17,18,19,20,21,22,23. Despite such studies consistently implicating environments that are shared within families, twin studies tend to assign very little or no proportion of variance to shared environmental factors10,11,12,13, a phenomenon called “the shared environment paradox”24.
It has been hypothesised that shared environments simply do not matter as much as do non-shared environments25, a notion which has been related to the distinction between the “objective” and “effective” environments26. That is, while an environment may “objectively” be shared between family members, their “effective” environment, i.e., the environment as they experience it, is nevertheless unique; as is then also the resulting impact of that environment on each individual.
More recently, Uher and Zwicker proposed that the most parsimonious explanation for this shared environment paradox is the presence of gene-environment interactions (GxE). They argue that GxE would lead monozygotic twins to respond more similarly to shared environmental exposures than dizygotic twins and that GxE should therefore result in a substantial proportion of the shared environmental influences being wrongly attributed to genetic factors, causing an inflation of the heritability estimate instead24.
From a biological perspective, GxE can be seen as the process by which environmental influences are moderated by genetic factors (or vice versa). GxE has long been speculated to play an integral role in the aetiology of psychiatric phenotypes, as it provides an explanation for why some develop psychiatric symptoms after particular risk exposures while others do not24,27,28,29,30. Though neuroticism has traditionally been viewed as a relatively stable trait, a more dynamic aetiology has been proposed whereby it is continuously influenced by ongoing gene-environment interactions throughout the life span31.
To date, however, there have been few truly genome-wide GxE studies (GWEIS) of psychiatric phenotypes, and the majority of molecular GxE research has been limited to candidate genes29,32,33,34,35,36. It is only quite recently that the available data and computational resources have begun to allow for the conduction of GWEIS, but as interactions may require larger sample sizes to detect effects of similar magnitude as main effects, sample size requirements may be even greater for GWEIS than for GWAS37,38.
To overcome this, some have reduced the multiple testing burden by pre-selecting variants based on main effects from GWAS39,40. While these two-stage approaches could potentially yield more significant SNPs, individual SNP effects are unlikely to yield insight into the higher-order biological mechanisms underlying GxE (as is the case for GWAS41), and the lack of genome-wide GxE data limits the opportunity for follow-up analyses such as gene-set analysis, which could elucidate the function of GxE effects42. In addition, since interacting SNPs may not display strong main effects, this approach could also lead to potential key interactions going undetected40,43. Another option may be to model interactions of individual variants with multiple environments simultaneously44, though this is also at the cost of environmental specificity which could complicate the interpretation of any functional follow-up analysis.
Alternatively, global GxE effects across the entire genome may be investigated by estimating the proportion of variance explained by GxE effects45, or by modelling interactions with polygenic risk scores constructed using SNP main effects from GWAS46,47,48,49. But while such approaches may indicate the presence of GxE, they cannot determine which SNPs or genes are driving the interactions. For the purposes of gaining relevant biological information from the GxE analyses, we, therefore, considered GWEIS to be the most suitable approach.
Beyond issues with power, GWEIS requires particular consideration regarding the control of error rate inflation, as it is particularly vulnerable to the effects of heteroscedastic residuals50. While this can be resolved with the use of heteroscedasticity-consistent, or so-called robust, standard errors51,52, these are not currently available in software optimised for large-scale genetic analysis like PLINK53, and researchers have had to implement these themselves34,54. Interaction effects may also be confounded by covariate-SNP and covariate-environment interaction effects unless these are accounted for55, but doing so can dramatically increase the number of variables analysed and add further computational constraints to this already intensive analysis.
To our knowledge, there have only been three GWEIS of psychiatric phenotypes to date, all of which have focused on depressive symptoms and used some composite measure of stressful life events as environment32,33,34. These studies have found few significant interactions, though only one of these studies featured a sample size close to 100,000 individuals (the rest fewer than 10,000). As such, it is evident that there is a substantial gap in the available genome-wide evidence for GxE in mental health phenotypes in general, including neuroticism for which there are currently none.
To address this, we used data from the UK Biobank56 to perform a series of GWEIS for neuroticism, with a total of 25 broadly defined environmental variables (N = 84,711–313,339; Table 1). While ensuring proper control for inflation and confounding as mentioned above, we first explored SNP-environment interactions between all 25 environmental variables and a total of 8,614,007 SNPs genome-wide.
Given that conceptually meaningful interaction effects may not be evident on the level of individual SNPs, whose effects are likely small in magnitude, we sought to elucidate relevant biological mechanisms that might govern GxE by testing whether single SNP-environment interaction effects were over-represented within particular genes, tissues, or gene-sets. We also evaluated the predictive ability of SNP-interaction effects across the genome by constructing interaction-based polygenic scores (iPRSGxE) for each environment and used these to predict neuroticism in an independent subset of the UKB sample (N = 10,000).
In parallel with all interaction-based analyses, we performed a traditional neuroticism GWAS in the same sample to evaluate the concordance between the top interaction effects and corresponding main effects, as well as to allow the predictive power of the iPRSs to be contrasted to that of a traditional main effect PRS constructed from the GWAS results (see Fig. 1 for an overview of the analysis workflow).
The selection of environmental variables was based on the epidemiological literature, and consisted primarily of variables relating to trauma and social support, but also to physical health, socioeconomic status, cognitive function, sleep, and substance use (Table 1; see “Methods” section). While some of these environmental variables are not traditionally seen as “environments” (such as cognitive function, insomnia, BMI), we decided to include them here as they have often been highlighted as risk factors in epidemiological studies in the past57,58,59. Given that many of these environments are themselves heritable, it is possible that some interactions we observe could reflect gene-gene interactions (GxG) rather than pure GxE (see Plomin et al.60 and Vinkhuyzen et al.61 for discussions about the heritability of the environment). Our rationale for including these is that any potential interactions, be it GxE or GxG, may nonetheless highlight relevant biological mechanisms that contribute to neuroticism. For reasons of convenience, we chose to retain the general term “GxE” throughout the paper, but acknowledge that the term “Gene x Trait” interaction is more suitable.
We note that although a potential correlation between the genetic influences on the environment and the outcome phenotype has been a cause for concern for the estimation of GxE in twin studies62, we would not expect this to lead to spurious detection of interaction effects in a GWEIS setting, since the linear regression model allows the SNP and environment main effects to be modelled simultaneously, and can thus account for any correlation that exists between these, as well as with the interaction term.
Results
Interacting SNPs implicated by GWEIS
Due to the risk of inflation of the GWEIS test statistics that was mentioned previously50,54, we analysed SNP-environment interactions in a linear regression framework in R, computing t-statistics for the interaction coefficients using robust standard errors in the form of the Huber-White sandwich estimator51,52 (see “Methods” section for more detail, and Suppl. Info (A): “Heteroscedasticity and Spurious Inflation of GWEIS Test Statistics” for comparison with traditional, model-based standard errors). In order to account for any potentially confounding covariate interactions55, we also included covariate-SNP and covariate-environment interaction effects in the model in addition to covariate main effects (see “Methods” section).
We analysed the single SNP-by-environment interactions between each of the 25 environments (N = 84,711–313,339; Table 1) and a total of 8,614,007 SNPs (minor allele frequency >0.01; imputation quality >0.9; see “Methods” section), from which we identified 8 independent SNPs (r² < 0.8) for 7 environments that showed interaction effects at the standard genome-wide significance threshold of p < 5e-8 (Table 2; Suppl. Figures). Of these, one intergenic SNP on chromosome 6 remained significant after applying further Bonferroni correction for the number of environments analysed (p < 5e-8/25 = 2e-9; rs115385310, “felt hated as a child”); an SNP which was also suggestively significant for “childhood physical abuse” (p = 6.93e-7).
These results are in stark contrast to a traditional GWAS on neuroticism performed using the same main effect covariates as the GWEISs (see “Methods” section), for which 103 independent significant SNPs were detected. For the 8 SNPs that showed significant interactions at the standard genome-wide significance threshold (p < 5e-8), we did not find any evidence of a significant main effect in the GWAS (p > 0.05; Table 2).
Genes implicated by SNP-environment interactions
To facilitate functional interpretation of GWEIS results, we sought to determine whether SNP-environment interaction effects across the genome tended to congregate within particular genes. Although any direction of effect is inevitably lost when aggregating the effects from multiple SNPs, this analysis nonetheless provides information about whether variants in certain genes could moderate the effect of specific environmental exposures on the phenotype.
We thus performed 25 gene-based tests for 19,831 protein-coding genes in MAGMA using the interaction p-values from the GWEISs as input (see “Methods” section). From these analyses, we found a total of 10 genes from 7 environments that reached standard genome-wide significance, correcting only for the number of genes analysed (p < 2.52e-6 (0.05/19,831); Table 3; Suppl. Tables 1a–1y); though none survived further correction for the number of environments (p < 2.52e-6/25).
Similar to the SNP-level results, the concordance between suggestive main and interaction effects on the gene-level was low, and only one gene (FHIT, “sexual assault”) reached suggestive significance in the main effects gene analysis for neuroticism (p = 1e-4; Table 3; Suppl. Table 1z).
Gene sets enriched by the most interactable genes
In order to determine whether the most strongly associated genes for any environment (including sub-significant ones) tended to be overrepresented within particular pathways, cellular locations, or implicated in particular tissue-specific gene expression patterns, we performed competitive gene-set and gene-property analyses in MAGMA using the results from the 25 GWEIS-based gene analyses as input. These analyses concerned 7246 gene sets (MSigDB) and 53 tissues (GTEx; see “Methods” section).
At a p-value threshold of 6.85e-6 (0.05/(7246 + 53)), 12 gene sets from 7 environments were significant (Table 4), but no tissues (Suppl. Tables 2a–2y and 3a–3y). Of these 12 gene sets, two survived the additional correction for the number of environments analysed (p < 6.85e-6/25): “nucleotide transmembrane transporter activity” (terminal illness), and “glucose binding” (insomnia).
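The significance thresholds quoted in this and the preceding subsections all follow from simple Bonferroni division. As a minimal sketch (the function and constant names are illustrative and not taken from the analysis script), they can be recomputed from the stated counts:

```python
# Recompute the Bonferroni thresholds quoted in the text from the stated counts.
N_ENVIRONMENTS = 25
N_GENES = 19_831
N_GENE_SETS = 7_246
N_TISSUES = 53


def bonferroni(alpha: float, n_tests: int) -> float:
    """Per-test threshold when n_tests tests share a family-wise alpha."""
    return alpha / n_tests


# SNP level: genome-wide threshold further divided by the number of environments.
snp_threshold = bonferroni(5e-8, N_ENVIRONMENTS)
# Gene level: corrected for the number of genes within one environment.
gene_threshold = bonferroni(0.05, N_GENES)
# Gene-set level: gene sets and tissues are corrected for jointly.
set_threshold = bonferroni(0.05, N_GENE_SETS + N_TISSUES)

print(f"{snp_threshold:.1e}  {gene_threshold:.2e}  {set_threshold:.2e}")
```

Dividing `gene_threshold` or `set_threshold` by `N_ENVIRONMENTS` once more gives the stricter across-environment thresholds used above.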
Again, none of the interacting gene-sets showed evidence of a significant, or even suggestively significant, main effect on neuroticism (Table 4; Suppl. Tables 2z and 3z).
Interaction-based polygenic risk scores (iPRS)
To evaluate the predictive accuracy of our SNP-environment interactions, we constructed interaction-based polygenic risk scores (iPRSGxE), taking the sum of effect alleles weighted by the interaction beta and the environment, and used these to model neuroticism in an independent subset of the UKB sample (N = 10,000; see “Methods” section). An alternative to GWEIS mentioned earlier is to model the interaction between a traditional main effect PRS from GWAS and an environmental variable of interest (henceforth: iPRSG). Since the iPRSG is more widely accessible than the iPRSGxE (as it does not require existing GWEISs), we also computed iPRSGs for each environment as a comparison.
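The difference between the two scores can be illustrated with a minimal sketch. It assumes that "weighted by the interaction beta and the environment" means the individual's environment value multiplied by the beta-weighted allele sum; the function names and the toy numbers are hypothetical.

```python
from typing import Sequence


def weighted_allele_sum(genotypes: Sequence[float], betas: Sequence[float]) -> float:
    """Sum of effect-allele counts, each weighted by its per-SNP beta."""
    if len(genotypes) != len(betas):
        raise ValueError("need one beta per SNP")
    return sum(b * g for b, g in zip(betas, genotypes))


def iprs_gxe(genotypes: Sequence[float], interaction_betas: Sequence[float],
             environment: float) -> float:
    """Interaction score built from GWEIS interaction betas (assumed form)."""
    return environment * weighted_allele_sum(genotypes, interaction_betas)


def iprs_g(genotypes: Sequence[float], main_betas: Sequence[float],
           environment: float) -> float:
    """Interaction score built from a main-effect PRS times the environment."""
    return environment * weighted_allele_sum(genotypes, main_betas)


# Toy individual: three SNPs, environment value 2.0 (all numbers made up).
genotypes = [0, 1, 2]
score_gxe = iprs_gxe(genotypes, [0.02, -0.01, 0.03], environment=2.0)
score_g = iprs_g(genotypes, [0.10, 0.05, -0.02], environment=2.0)
```

The two scores share the same structure and differ only in where the SNP weights come from: a GWEIS for the iPRSGxE, a GWAS for the iPRSG.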
The variance explained by each iPRS was evaluated by comparing the fit between a full model containing the iPRS and covariates to that of a covariate only model (here, the environment main effect and the main effect PRS were included in addition to the standard covariates used for the GWEIS/GWAS, as well as the interactions between these and the standard covariates; see “Methods” section). This was done using the anova() function in R.
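For two nested linear models, the comparison that anova() reports is an F-test on the reduction in residual sum of squares. The sketch below shows that arithmetic together with the corresponding increment in R²; it assumes the variance explained was quantified as this increment, and the input numbers are invented for illustration.

```python
def nested_f(rss_reduced: float, rss_full: float,
             df_extra: int, df_resid_full: int) -> float:
    """F statistic for adding df_extra terms to a nested linear model."""
    return ((rss_reduced - rss_full) / df_extra) / (rss_full / df_resid_full)


def incremental_r2(rss_reduced: float, rss_full: float, tss: float) -> float:
    """Share of total variance explained by the added terms."""
    return (rss_reduced - rss_full) / tss


# Hypothetical values: one added term (the iPRS), 97 residual degrees of freedom.
f_stat = nested_f(rss_reduced=100.0, rss_full=90.0, df_extra=1, df_resid_full=97)
delta_r2 = incremental_r2(rss_reduced=100.0, rss_full=90.0, tss=200.0)
```

The p-value then follows from the F distribution with (df_extra, df_resid_full) degrees of freedom.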
Neither the iPRSGxE nor the iPRSG provided a significant increase in model fit for any of the 25 environments (p < 0.05/25/2; see “Methods” section) above that of the covariates only model, with the attributable variance reaching a maximum of 0.04% for the iPRSGxEs and 0.03% for the iPRSGs (Fig. 2). This contrasts with the traditional main effect PRS, which explained 2.06% of the variance in neuroticism beyond standard covariates (p = 6.31e-49; see “Methods” section).
We, therefore, conclude that based on the environments and sample population analysed here, there is currently limited evidence that genome-wide GxE effects in the form of iPRSs can improve prediction accuracy in neuroticism beyond what can already be achieved using SNP and environment main effects.
Discussion
In this study, we have investigated genome-wide gene-environment interactions in neuroticism across a total of 25 different environmental variables previously associated with mental health outcomes. From all SNP, gene, and gene-set based analyses, we detected one SNP (rs115385310 for “felt hated as a child”) and two gene-sets (“glucose binding” for “insomnia” and “nucleotide transmembrane transporter activity” for “terminal illness”) that survived Bonferroni-correction for the number of environments analysed.
Although multiple interactions were found at standard genome-wide significance thresholds (i.e., not correcting for the number of environments), they were substantially fewer than that detected in a traditional GWAS on neuroticism, in which we identified just over 100 independent significant SNPs. This is in line with the notion that the power to detect interactions is lower than that of main effects, and suggests that even larger data sets will be required before we can uncover a more considerable fraction of relevant interactions. The lack of predictive value for interaction-based polygenic risk scores (iPRSs) echoed this further.
A GWEIS analysis will naturally suffer more from an increased multiple testing burden compared to, for example, two-stage GxE approaches which pre-select genetic variants based on their observed main effects. In this study, however, we found that none of the interacting SNPs identified at standard genome-wide significance thresholds (i.e., uncorrected for the number of environments) showed any evidence of even suggestive main effects in the GWAS (the same was largely true of the gene and gene-set level results), implying that preselection based on main effects could result in key interactions being overlooked. In addition, as individual SNP interaction effects might themselves not yield notable insight into the biological mechanisms that govern GxE (as is typically the case with single SNP analyses41), the genome-wide nature of GWEIS is vital as it allows for follow-up analyses, such as gene-set analysis, which can elucidate the function of GxE effects.
Although the multiple testing burden was further exacerbated by the analysis of multiple environments here, we argue that this approach could enable the identification of common patterns across environments and further strengthen the evidence for any particular gene or pathway (particularly when restricted to environments already thought to be implicated). While a systematic investigation of shared GxE effects was not conducted here due to the lack of power even when not correcting for the environments, we hope that our results may prove useful for researchers conducting similar studies in the future, for example, as a basis for replication or meta-analysis. As increased sample sizes lead to the detection of more reliable SNP-environment interactions, we expect that results from GWEIS and related functional follow-up analyses will become valuable for our understanding of the biological mechanisms that underlie GxE.
In this study, we selected neuroticism as our phenotype of interest due to its significant public health impact7,8 and widespread links with several clinical psychiatric disorders3,4,5,6. Although evidence suggests that neuroticism is more dynamic than traditionally thought31, as a personality trait, it may nonetheless be more stable than some clinical phenotypes, such as depression or alcohol use disorder63,64,65,66, and could also be comparatively less sensitive to GxE. In addition, it should be noted that the UKB sample analysed here consists of a relatively older population, and since the influence of GxE may be more pronounced at an earlier stage of development67, the age of this sample might have affected our power to detect certain GxE effects.
Here, we chose to model the relationships between all variables as linear (and thus, treating ordinal environments as continuous), but there is a possibility that some interactions may have a more complex, non-linear form. For instance, for the ordinal environmental variable “physical assault” we assumed that having been assaulted recently versus in the past results in a similar change in the SNP effect as having been assaulted in the past compared to never. While this may not be a fully accurate representation of the data, we expected that the increased multiple testing resulting from analysing the levels of each ordinal variable separately would nevertheless have had a more severe impact on power.
Finally, we wish to reiterate a key limitation regarding the interpretation of our results in the context of heritable environments. The environmental component in GxE is sometimes seen as an independent force that regulates the penetrance of genetic effects (or vice versa), while in practice, any environmental measure obtained in a cross-sectional design is unlikely to be free from genetic influence60,61. Although there have been efforts to distinguish GxG from GxE in the twin modelling literature62,68, doing so in this setting is not uncomplicated, and simply conditioning on heritable components could induce collider bias69.
In this study, we chose to be particularly lenient with what we considered “environment” in favour of covering as broad a range of relevant variables as possible. Based on these results alone, it is therefore not possible to determine whether any interaction detected here represents one with the environmental components directly (GxE) or with some heritable component thereof (GxG). If well-powered, however, we argue that GWEISs of heritable environments are still useful as they could elucidate important sources of aetiological heterogeneity which can be followed up in greater depth using experimental or more controlled observational designs in the future.
Representing the largest effort of its kind to date, we used a total of 25 environmental variables to investigate gene-environment interactions in neuroticism. Although power is low compared to GWAS, we detected one variant and two gene sets that showed significant interaction after correction for the number of environments analysed. Larger sample sizes are, however, needed to obtain more reliable estimates of relevant SNP-environment interaction effects, which will be required in order to understand the molecular mechanisms that govern gene-environment interactions in neuroticism.
Methods
Genotype data and quality control
All genotype and phenotype data were obtained from the UK Biobank56 (release 3, March 2018), and this study was conducted under the UK Biobank application 16406. Data collection, primary quality control, and imputation of the genotype data were performed by the UK Biobank itself, the full details of which have been described elsewhere70. We applied further quality control in order to ensure the inclusion only of high-quality variants. This entailed filtering SNPs with a minimum info score of .9 (HRC panel imputed), maximum missingness of 5%, and a minor allele frequency of at least 1%, resulting in a total of 8,614,007 SNPs for the analysis.
We used only European, unrelated samples with concordant sex (see Suppl. Info (A): UK Biobank Sample Information and Quality Control). Thirty principal components (computed with FlashPCA71) were included as covariates in all analyses to control for population stratification. To ensure that the selection of SNPs remained constant across environments, quality control and filtering were performed on the full subset of individuals with complete neuroticism data (see below), and it is, therefore, possible that exact minor allele frequencies and call rates may vary slightly between the sample subsets for each environment.
Phenotype
Neuroticism was measured using the Eysenck Personality Questionnaire (Revised Short Form72), which contains 12 dichotomous items asking participants to indicate whether they agree with statements such as “Do you worry too long after an embarrassing experience?”, or “Do you ever feel ‘just miserable’ for no reason?”. An individual's level of neuroticism was quantified as the sum of items with which they agreed, ranging from 0 to 12. We included only individuals who had provided complete responses to all items (thus performing no imputation of missing values), resulting in 313,467 samples. To ensure that neuroticism and each environment had been measured simultaneously, we used data collected from the first visit only.
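The scoring rule described above amounts to a complete-case sum score. A minimal sketch, assuming items are coded 1 for agreement and 0 otherwise, with None marking a missing response (this coding and the function name are illustrative):

```python
from typing import Optional, Sequence

N_ITEMS = 12


def neuroticism_score(items: Sequence[Optional[int]]) -> Optional[int]:
    """Sum of endorsed items (0 to 12), or None if any response is missing."""
    if len(items) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} items, got {len(items)}")
    if any(v is None for v in items):
        return None  # incomplete responders are excluded, not imputed
    if any(v not in (0, 1) for v in items):
        raise ValueError("items must be coded 0 or 1")
    return sum(items)


complete = neuroticism_score([1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0])
incomplete = neuroticism_score([1, 0, None, 1, 0, 0, 0, 1, 0, 0, 1, 0])
```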
Environmental factors
We considered broadly as “environment” a wide range of variables available from the UK Biobank that have been associated with neuroticism and related mental health phenotypes in the literature. This included primarily those relating to trauma exposure16,17,18,19,20 and social support21,22,23, but also socioeconomic deprivation73,74, education75 and cognitive ability57,76, substance use77,78, sleep58,79,80, and physical health (overweight/obesity59,81, physical disability82,83, chronic pain84,85). We gathered all available variables that related to any of these categories, limiting the final selection to a subset of 25. We selected variables such that there was at least one variable from each category, then gave preference to those with larger total sample sizes and less skew in relation to the remaining variables. Given their central role in the literature, we prioritised a wider selection of items related to trauma and social support but sought to include at least one item related to all other domains. Here, we refer to these variables as “environments” as that is their role in the current analyses while acknowledging that many of the selected environments have a (sometimes considerable) heritable component.
The majority of environments were ordinal, consisting of responses such as “never true” to “very often true”, or “never or almost never” to “almost daily” (see Table 1). There were two categorical environments that allowed endorsement of multiple answer options: “social activities” and “multiple stress”, which we converted to sum scores representing the number of endorsed options. “Chronic pain” was constructed using a collection of pain items that indicated whether participants had experienced pain in multiple regions for three months or more (category ID: 100048). Scores on this variable reflect the sum of regions in which participants experienced pain for 3+ months, with a maximum score of 3. Indicating no pain or pain for less than 3 months in any number of regions gave a score of 0. Indicating chronic pain in one region gave a score of 1, in two regions a score of 2, and indicating pain all over the body, or pain in three or more regions for 3+ months, gave a score of 3. The reason for this truncation was to allow the inclusion of pain all over the body without making strong assumptions about the severity compared to multiples of separate areas.
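The truncated chronic pain score can be written as a short rule. In this sketch the inputs are assumed to be already derived from the pain items: the number of regions with pain lasting 3+ months and a flag for chronic pain all over the body (both argument names are hypothetical).

```python
def chronic_pain_score(n_chronic_regions: int, pain_all_over: bool = False) -> int:
    """Chronic pain score in 0..3.

    n_chronic_regions counts regions with pain for 3+ months; regions with
    shorter-lasting pain are assumed to have been excluded beforehand.
    """
    if n_chronic_regions < 0:
        raise ValueError("region count cannot be negative")
    if pain_all_over:
        return 3  # treated like three or more separate regions
    return min(n_chronic_regions, 3)  # truncate at the maximum score


scores = [chronic_pain_score(n) for n in range(6)]
```

Capping at 3 is what lets "pain all over the body" share a level with widespread regional pain without ranking it against a specific number of regions.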
To ensure that neuroticism and all environmental measures were measured at the same time point, we analysed data from the first visit only. All environments were analysed as continuous, and as with neuroticism, we performed no imputation of missing responses for any of the environments.
GWEIS
SNP-environment interactions were analysed in a linear interaction model in R (v3.2.1). As has been shown previously50,54, GWEIS test statistics are particularly susceptible to spurious inflation due to heteroscedasticity of the residuals. To deal with this, we relied on Huber-White standard errors, also known as the sandwich estimator. Unlike model-based standard errors, which are computed using a single residual variance term for all observations, the sandwich estimator allows a separate residual variance term for each observation, approximated using the squared residuals51,52.
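The contrast between model-based and sandwich standard errors can be illustrated on simulated data. The sketch below is not the analysis script: it is a pure-Python toy that uses the basic (HC0) form of the Huber-White estimator, omits covariates, and simulates a phenotype with no true interaction but with residual spread that grows with the environment. All names and simulation settings are assumptions.

```python
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def ols_with_se(X, y):
    """Return (betas, model-based SEs, HC0 sandwich SEs) for y regressed on X."""
    n, k = len(X), len(X[0])
    Xt = [list(c) for c in zip(*X)]
    bread = invert(matmul(Xt, X))  # (X'X)^-1
    Xty = [[sum(x * v for x, v in zip(col, y))] for col in Xt]
    beta = [b[0] for b in matmul(bread, Xty)]
    resid = [v - sum(b * x for b, x in zip(beta, row)) for row, v in zip(X, y)]
    # Model-based: a single residual variance shared by all observations.
    sigma2 = sum(e * e for e in resid) / (n - k)
    se_model = [(sigma2 * bread[j][j]) ** 0.5 for j in range(k)]
    # Sandwich: each squared residual stands in for that observation's variance.
    meat = [[sum(e * e * row[a] * row[b] for row, e in zip(X, resid))
             for b in range(k)] for a in range(k)]
    cov = matmul(matmul(bread, meat), bread)
    se_robust = [cov[j][j] ** 0.5 for j in range(k)]
    return beta, se_model, se_robust


random.seed(1)
X, y = [], []
for _ in range(2000):
    g = (random.random() < 0.3) + (random.random() < 0.3)  # allele count 0/1/2
    e = random.gauss(0.0, 1.0)                              # environment
    noise = random.gauss(0.0, 0.5 + abs(e))                 # heteroscedastic residual
    X.append([1.0, float(g), e, g * e])                     # intercept, G, E, GxE
    y.append(0.1 * g + 0.5 * e + noise)                     # no true GxE effect

beta, se_model, se_robust = ols_with_se(X, y)
t_model = beta[3] / se_model[3]
t_robust = beta[3] / se_robust[3]
```

In this setting the model-based standard error of the interaction term is too small, so `t_model` is larger in magnitude than `t_robust`; repeated over millions of SNPs, this is the kind of inflation the robust standard errors are meant to prevent.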
Our script is an adaptation of a PLINK R plugin originally developed by Almli et al.54, which performs a joint test of SNP and SNP-environment interaction effects (https://epstein-software.github.io/robust-joint-interaction). Beyond run-time optimisation, we computed p-values for the gene-environment interaction (rather than the joint test of SNP main and interaction effects, as done initially), and included covariate-SNP and covariate-environment interactions in addition to covariate main effects. As has been shown55, covariate main effects alone do not effectively control for potentially confounding interactions of the covariate with the SNP or the environment, and unless controlled for, such interactions may be captured in the SNP-environment interaction term. We thus implemented the following linear regression model for every SNP and environment:
Yi = β0 + βG Gi + βE Ei + βGxE Gi Ei + βC′ Ci + βCxE′ Ci Ei + βCxG′ Ci Gi + ϵi
where Yi represents the phenotype measure for any individual i, Gi the SNP allele count, and Ei the environmental measure. Ci is a k × 1 vector of covariates, with k equalling the total number of covariates, ϵi is the residual, and ′ denotes the transpose. The intercept (β0) and betas for the SNP (βG), environment (βE), and SNP-environment interaction term (βGxE) are all scalars, while the betas for the covariate main effects (βC), covariate-environment (βCxE) and covariate-SNP (βCxG) interactions are k × 1 vectors. The parameter of interest here is βGxE: the beta for the SNP-environment interaction.
As covariates, we included age, sex, 30 PCs, and all assessment centres with N > 10,000. As recommendations or standards regarding the number of PCs that should be included typically concern main effects analyses, we could not exclude the possibility that potentially more complex confounding effects of ancestry might arise when analysing interactions, and therefore chose a more cautious approach of including as many as 30 PCs.
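To make the size of this model concrete, the following sketch builds one individual's design-matrix row with all covariate interaction terms included. The column order is an illustrative choice, and the function names are hypothetical.

```python
from typing import List, Sequence


def design_row(g: float, e: float, covariates: Sequence[float]) -> List[float]:
    """Row of the design matrix: intercept, G, E, GxE, C, CxE, CxG."""
    c = list(covariates)
    return ([1.0, g, e, g * e]
            + c                        # covariate main effects
            + [ci * e for ci in c]     # covariate-environment interactions
            + [ci * g for ci in c])    # covariate-SNP interactions


def n_parameters(k: int) -> int:
    """Number of regression coefficients for k covariates."""
    return 4 + 3 * k


row = design_row(g=2, e=0.5, covariates=[1.0, 3.0])
```

With age, sex, and 30 PCs alone (k = 32), `n_parameters(32)` is 100 coefficients per SNP and environment, before any assessment-centre indicators are added; only one of them, the GxE term, is tested.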
For the analysis, PLINK formatted genotype data was read into R (v3.2.1) using the read.plink() function from the snpStats package (see the Suppl. Info (B): Analysis script for the full R script). As per the snpStats default settings, autosomal SNPs were coded as 0, 1, and 2, representing the homozygous minor, heterozygous, or homozygous major genotypes, respectively. On the X chromosome, male genotypes were coded as 0 and 2, representing single copies of the minor or major alleles.
GWAS
We conducted two GWASs of neuroticism in PLINK (v2.0)53 using the same set of covariates as in the GWEIS: one using the full neuroticism sample (N = 313,467), done with the purpose of determining whether interacting SNPs, genes, or gene-sets displayed any main effects on neuroticism, and one that excluded a test set of 10,000 individuals done for the purpose of constructing a main effect polygenic risk score.
Gene analyses
To investigate whether SNP-environment interaction signals tended to congregate within genic regions, we performed genome-wide gene analyses with MAGMA (v1.07b)42 using the p-values from the GWEIS as input. Gene locations for 20,260 protein-coding genes were obtained from Ensembl (GRCh37, p13, v96), of which 19,831 contained at least one SNP in our data. To allow the inclusion of nearby, potentially regulatory SNPs, we used windows of 2 kb upstream and 1 kb downstream of the transcription start and stop sites, respectively. For computational efficiency, a random subset of 10,000 individuals from the UKB data set was used as a reference for the estimation of LD.
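The annotation window can be sketched as follows. This is an illustration rather than MAGMA's own code, and it assumes that "upstream" and "downstream" are interpreted relative to the gene's strand, so that the 2 kb side flips for genes on the reverse strand. Function and argument names are hypothetical.

```python
from typing import Tuple


def annotation_window(start: int, end: int, strand: str,
                      upstream: int = 2000, downstream: int = 1000) -> Tuple[int, int]:
    """Genomic interval within which SNPs are assigned to a gene."""
    if strand == "+":
        lo, hi = start - upstream, end + downstream
    elif strand == "-":
        # On the reverse strand, transcription runs from end to start.
        lo, hi = start - downstream, end + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(lo, 1), hi  # positions are 1-based


def snp_in_gene(pos: int, start: int, end: int, strand: str) -> bool:
    """True if a SNP position falls inside the gene's annotation window."""
    lo, hi = annotation_window(start, end, strand)
    return lo <= pos <= hi


forward = annotation_window(10_000, 20_000, "+")
reverse = annotation_window(10_000, 20_000, "-")
```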
As an aggregation method for the SNP effects, we employed the āmulti modelā which is a hybrid between the commonly used āmean modelā, which simply averages the SNP effects across the gene, and the ātop modelā, which uses the lowest SNP p-value corrected for gene size. In essence, the āmulti modelā applies both the āmeanā and ātopā models and selects the one with the best fit.
Gene-set and gene property analyses
Competitive gene-set and gene-property analyses were performed for all GWEISs and the GWAS using MAGMA (v1.07b)42. A total of 7246 gene-set definitions were obtained from MSigDB (v6.2), including gene ontology (GO) terms, cellular locations, and biological pathways from multiple sources (e.g., KEGG, Reactome, BioCarta). These were analysed in a competitive framework (as is the default in MAGMA), testing whether the average association of genes within a gene set is greater than that of genes outside the gene set, while correcting for LD.
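The logic of a competitive test can be illustrated with a deliberately simplified Python sketch. It assumes one association Z-score per gene and computes the difference between the mean Z-score inside and outside a gene set, which equals the slope of an ordinary least-squares regression of the Z-scores on a binary set indicator. This is a toy illustration rather than the MAGMA procedure: it omits the correction for LD between genes and any further covariates, and the function name and example values are hypothetical.

```python
from statistics import mean


def competitive_effect(gene_z, gene_set):
    """Mean gene Z-score inside the set minus the mean outside the set.

    gene_z   -- dict mapping gene name to its association Z-score
    gene_set -- collection of gene names that define the set
    A positive value means genes in the set are, on average, more strongly
    associated than the remaining genes (the competitive alternative).
    """
    members = set(gene_set)
    inside = [z for gene, z in gene_z.items() if gene in members]
    outside = [z for gene, z in gene_z.items() if gene not in members]
    if not inside or not outside:
        raise ValueError("need at least one gene inside and outside the set")
    return mean(inside) - mean(outside)


# Hypothetical gene-level Z-scores and gene set.
example_z = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.0, "GENE_D": -1.0}
effect = competitive_effect(example_z, {"GENE_A", "GENE_B"})
```

In a full analysis this difference would be tested one-sided against its standard error; the sketch only shows which quantity the competitive framework targets.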
To test for tissue specificity of associated genes, we used the recently implemented conditional gene property analysis in MAGMA. In this framework, any given tissue can be conceptualised as a gene-set, where gene mRNA expression levels represent the continuous gene-set membership for any given gene, with its mean gene expression level across tissues included as a covariate. For this analysis, we used the mean log-transformed gene mRNA expression profiles in 53 different tissues obtained from GTEx (v7).
Polygenic risk scores (PRS)
In order to evaluate the predictive ability of our GWEIS results, we constructed interaction-based polygenic risk scores (iPRSGxE) using the SNP-environment interaction effects from each of the 25 GWEISs. As a comparison, we created alternative iPRSs representing the interactions between a standard main effect PRS and each environment (iPRSG). We evaluated the predictive accuracy of all iPRSs (i.e., the iPRSGxEs and iPRSGs) against that of a traditional main effect PRS.
To obtain GWEIS and GWAS effect sizes for these PRS analyses, we excluded a hold-out sample of 10,000 individuals (to be used for prediction) and re-analysed all main and interaction effects as described previously. For each analysis, we extracted the independent significant SNPs using clumping in PLINK53 (r2 < 0.2; 250 kb), and used these SNPs to construct PRSs for every individual in the hold-out sample.
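As a rough illustration of what clumping does, the Python sketch below implements a simple greedy variant: SNPs are visited in order of increasing p-value, each retained SNP becomes an index SNP, and any remaining SNP on the same chromosome within the window that has r2 at or above the threshold with it is dropped. This is a simplified sketch and not PLINK's implementation, which offers further options; the function name, the data layout, and the example values are hypothetical.

```python
def clump(snps, r2_lookup, r2_threshold=0.2, window_bp=250_000):
    """Greedy clumping sketch.

    snps      -- list of (snp_id, chromosome, position_bp, p_value) tuples
    r2_lookup -- dict mapping frozenset({id_a, id_b}) to pairwise r2;
                 pairs that are absent are treated as r2 = 0
    Returns the ids of the retained index SNPs, most significant first.
    """
    remaining = sorted(snps, key=lambda snp: snp[3])
    index_snps = []
    while remaining:
        lead_id, lead_chrom, lead_pos, _ = remaining.pop(0)
        index_snps.append(lead_id)
        kept = []
        for snp in remaining:
            snp_id, chrom, pos, _ = snp
            nearby = chrom == lead_chrom and abs(pos - lead_pos) <= window_bp
            r2 = r2_lookup.get(frozenset((snp_id, lead_id)), 0.0)
            # Drop SNPs that are both nearby and in LD with the index SNP.
            if not (nearby and r2 >= r2_threshold):
                kept.append(snp)
        remaining = kept
    return index_snps


# Hypothetical input: four SNPs and two known pairwise r2 values.
example_snps = [
    ("rs1", 1, 1_000, 1e-8),
    ("rs2", 1, 50_000, 1e-5),
    ("rs3", 1, 900_000, 1e-4),
    ("rs4", 2, 1_000, 1e-2),
]
example_r2 = {frozenset(("rs1", "rs2")): 0.8, frozenset(("rs1", "rs3")): 0.9}
independent = clump(example_snps, example_r2)
```

Here rs2 is removed because it lies within 250 kb of rs1 and is in LD with it, whereas rs3 is retained despite high r2 because it falls outside the window.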
The different PRS scores were defined as follows. The standard main effect PRS was computed as \(\mathrm{PRS}_i = \sum_{j=1}^{k} G_{ij}\beta_j^{G}\) for each individual i, with \(G_{ij}\) their genotype value for SNP j (k being the number of SNPs used), and \(\beta_j^{G}\) the GWAS effect size. For environment E, the iPRSG score was computed as \(\mathrm{iPRS}_i^{G} = \mathrm{PRS}_i \times E_i\), and the iPRSGxE as \(\mathrm{iPRS}_i^{GxE} = \sum_{j=1}^{k} G_{ij}E_i\beta_j^{GxE}\), with \(\beta_j^{GxE}\) the GWEIS interaction effect size of SNP j.
A possible alternative to how we computed the iPRSGxE here may be to include the SNP main effect from the interaction analyses in the iPRSGxE itself, i.e., \(\mathrm{iPRS}_i^{G + GxE} = \sum_{j=1}^{k} \left( G_{ij}E_i\beta_j^{GxE} + G_{ij}\beta_j^{G} \right)\). As we are interested in determining the extent to which GxE predicts neuroticism beyond any gene and environment main effects, however, we constructed our \(\mathrm{iPRS}_i^{GxE}\) using only the interaction terms, and instead included \(\mathrm{PRS}_i\) as a covariate to account for the genetic main effect.
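The three score definitions above translate directly into code. The following Python sketch computes them for a single individual; it is a minimal illustration under the assumption that genotypes, effect sizes, and the environment value are already available as plain numbers, and all function names and example values are hypothetical.

```python
def main_effect_prs(genotypes, beta_g):
    """PRS_i: sum over SNPs of genotype times GWAS main effect size."""
    return sum(g * b for g, b in zip(genotypes, beta_g))


def iprs_g(genotypes, beta_g, environment):
    """iPRS^G_i: the main effect PRS multiplied by the environment value."""
    return main_effect_prs(genotypes, beta_g) * environment


def iprs_gxe(genotypes, beta_gxe, environment):
    """iPRS^GxE_i: sum over SNPs of genotype * environment * interaction effect."""
    return sum(g * environment * b for g, b in zip(genotypes, beta_gxe))


# Hypothetical individual: three SNPs coded 0/1/2 and an environment value of 2.
genotypes = [0, 1, 2]
beta_main = [0.10, -0.20, 0.05]          # hypothetical GWAS effect sizes
beta_interaction = [0.01, 0.02, -0.03]   # hypothetical GWEIS interaction effects
environment = 2.0

prs_value = main_effect_prs(genotypes, beta_main)
iprs_g_value = iprs_g(genotypes, beta_main, environment)
iprs_gxe_value = iprs_gxe(genotypes, beta_interaction, environment)
```

Note that the two interaction scores differ only in which effect sizes are used: iPRSG reuses the main effects and scales the resulting score by the environment, whereas iPRSGxE weights each SNP by its own interaction estimate.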
The PRS scores were constructed using SNPs significant at different p-value thresholds (.001, .05, .1, .2, …, .8, .9, 1). For each PRS score, we then fit a linear regression in the hold-out sample with neuroticism as an outcome, with the PRS score and a set of covariates as predictors. An estimate of the predictive ability of the PRS score was then computed as the difference between the adjusted r2 for this model and the corresponding covariate-only model. Here, we chose to use the adjusted r2, rather than the full r2, as this provides an unbiased estimation of the population explained variance in models with multiple predictors. For the main effect PRS, as well as for both the iPRSGxE and iPRSG for each environment, we selected the PRS based on the p-value threshold for which the predictive ability was greatest.
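The evaluation step can be sketched in a few lines of Python. The sketch assumes the standard adjusted r2 formula, \(1 - (1 - r^2)(n - 1)/(n - p - 1)\), with n the sample size and p the number of predictors excluding the intercept, and it assumes that the unadjusted r2 values come from regressions fitted elsewhere. The function names and all numbers in the example are hypothetical and are not results from this study.

```python
def adjusted_r2(r2, n, p):
    """Adjusted r2 for a model with p predictors (excluding the intercept)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than model parameters")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def incremental_r2(r2_full, p_full, r2_null, p_null, n):
    """Adjusted r2 of the full model minus that of the covariate-only model."""
    return adjusted_r2(r2_full, n, p_full) - adjusted_r2(r2_null, n, p_null)


def best_threshold(delta_by_threshold):
    """Return the p-value threshold with the largest incremental adjusted r2."""
    return max(delta_by_threshold, key=delta_by_threshold.get)


# Hypothetical values: a hold-out sample of 10,000 and 25 base covariates.
n_holdout = 10_000
r2_covariates_only, n_covariates = 0.0200, 25
r2_with_prs = {0.001: 0.0215, 0.05: 0.0260, 1.0: 0.0248}

delta = {
    threshold: incremental_r2(r2, n_covariates + 1,
                              r2_covariates_only, n_covariates, n_holdout)
    for threshold, r2 in r2_with_prs.items()
}
selected = best_threshold(delta)
```

Because the threshold is chosen to maximise predictive ability in the same hold-out sample, the selected value is best read as an upper bound on out-of-sample performance.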
As covariates, we used the same base covariates as in the GWEIS/GWAS analyses (age, sex, array, and all assessment centres with N > 50). For the traditional PRS, the covariate-only model was \(Y_i = \beta_0 + C_i^\prime \beta_C + \epsilon_i\), with \(Y_i\) the neuroticism score for any one individual i in the hold-out sample, \(C_i\) the 1 × k vector of base covariates (with ′ denoting the transpose, and k the number of covariates), \(\beta_0\) the intercept, \(\beta_C\) the covariate effect sizes, and \(\epsilon_i\) the residual. The full model including the PRS is then \(Y_i = \beta_0 + C_i^\prime \beta_C + \mathrm{PRS}_i\beta_{\mathrm{PRS}} + \epsilon_i\), with \(\mathrm{PRS}_i\) representing the main effect PRS for that individual, and \(\beta_{\mathrm{PRS}}\) the beta coefficient for the PRS on neuroticism in the hold-out sample.
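For the iPRSGxE and iPRSG scores, however, we also included the relevant environment and the main effect PRS as covariates, as well as the interactions between these and the base covariates (similar to the GWEIS setup). Thus, the covariate-only model used for any iPRS with environment E is:
\(Y_i = \beta_0 + C_i^\prime \beta_C + \mathrm{PRS}_i\beta_{\mathrm{PRS}} + E_i\beta_E + C_i^\prime \mathrm{PRS}_i\beta_{PxC} + C_i^\prime E_i\beta_{ExC} + \epsilon_i\)
with \(\mathrm{PRS}_i\) and \(E_i\) representing the traditional main effect PRS and the environment, respectively (with \(\beta_{\mathrm{PRS}}\) and \(\beta_E\) their effects on neuroticism), \(C_i^\prime \mathrm{PRS}_i\) the interaction between the main effect PRS and the covariates (with related effect size \(\beta_{PxC}\)), and \(C_i^\prime E_i\) the covariate-environment interaction (with effect size \(\beta_{ExC}\)). The full model for any iPRS would then also contain the term for the iPRS and its effect on neuroticism (here written \(\beta_{\mathrm{iPRS}}\)) in addition to all variables in the null model, i.e.:
\(Y_i = \beta_0 + C_i^\prime \beta_C + \mathrm{PRS}_i\beta_{\mathrm{PRS}} + E_i\beta_E + C_i^\prime \mathrm{PRS}_i\beta_{PxC} + C_i^\prime E_i\beta_{ExC} + \mathrm{iPRS}_i\beta_{\mathrm{iPRS}} + \epsilon_i\)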
We included the main effect PRS derived from the GWAS as a representation of the SNP main effects, rather than a PRS constructed from the SNP main effects from the GWEIS, because the GWEIS PRSs will have been pruned based on the interaction effects and will thus underestimate the total amount of variance contributed by SNP main effects across the genome. Since we are specifically interested in how much the iPRSs contribute above and beyond what can be obtained using a simple main effect PRS from GWAS, the SNP main effects as obtained in the GWEIS would not have been appropriate.
Data availability
Summary statistics from all 25 GWEISs, as well as the neuroticism GWAS, can be downloaded from the website of the Department of Complex Trait Genetics, CNCR (http://ctg.cncr.nl). Summary statistics from the gene, gene-set, and gene-property analyses are available in Suppl. Tables 1–3.
Code availability
The full R script used to perform the GWEIS analyses can be accessed in Suppl. Info (B), Analysis script. Analysis scripts used for the PRS, gene-, gene-set, or gene-property analyses are available from the authors upon request.
Change history
13 May 2022
A Correction to this paper has been published: https://doi.org/10.1038/s41398-022-01974-2
08 April 2021
A Correction to this paper has been published: https://doi.org/10.1038/s41398-021-01334-6
References
Eysenck, H. J. Biological basis of personality. Nature 199, 1031–1034 (1963).
Costa, P. T. & McCrae, R. R. Four ways five factors are basic. Pers. Individ. Dif. 13, 653–665 (1992).
Ormel, J., Rosmalen, J. & Farmer, A. Neuroticism: a non-informative marker of vulnerability to psychopathology. Soc. Psychiatry Psychiatr. Epidemiol. 39, 906–912 (2004).
Eysenck, H. J. Neuroticism, anxiety, and depression. Psychol. Inq. 2, 75–76 (1991).
Griffith, J. W. et al. Neuroticism as a common dimension in the internalizing disorders. Psychol. Med. 40, 1125–1136 (2010).
Malouff, J. M., Thorsteinsson, E. B. & Schutte, N. S. The relationship between the five-factor model of personality and symptoms of clinical disorders: a meta-analysis. J. Psychopathol. Behav. Assess. 27, 101–114 (2005).
Lahey, B. B. Public health significance of neuroticism. Am. Psychol. 64, 241–256 (2009).
Cuijpers, P. et al. Economic costs of neuroticism: a population-based study. Arch. Gen. Psychiatry 67, 1086 (2010).
Vukasovic, T. & Bratko, D. Heritability of personality: a meta-analysis of behavior genetic studies. Psychol. Bull. 141, 769–785 (2015).
Jang, K. L., Livesley, W. J. & Vernon, P. A. Heritability of the big five personality dimensions and their facets: a twin study. J. Pers. 64, 577–592 (1996).
Jardine, R., Martin, N. G., Henderson, A. S. & Rao, D. C. Genetic covariation between neuroticism and the symptoms of anxiety and depression. Genet. Epidemiol. 1, 89–107 (1984).
Lake, R. I. E., Eaves, L. J., Maes, H. H. M., Heath, A. C. & Martin, N. G. Further evidence against the environmental transmission of individual differences in Neuroticism from a collaborative study of 45,850 twins and relatives on two continents. Behav. Genet. 30, 223–233 (2000).
Viken, R. J., Rose, R. J., Kaprio, J. & Koskenvuo, M. A developmental genetic analysis of adult personality: extraversion and neuroticism from 18 to 59 years of age. J. Pers. Soc. Psychol. 66, 722–730 (1994).
Nagel, M. et al. Meta-analysis of genome-wide association studies for neuroticism in 449,484 individuals identifies novel genetic loci and pathways. Nat. Genet. 50, 920–927 (2018).
Luciano, M. et al. Association analysis in over 329,000 individuals identifies 116 independent variants influencing neuroticism. Nat. Genet. 50, 6–11 (2018).
Jaffee, S. R. Child maltreatment and risk for psychopathology in childhood and adulthood. Annu. Rev. Clin. Psychol. 13, 525–551 (2017).
Allen, B. & Lauterbach, D. Personality characteristics of adult survivors of childhood trauma. J. Trauma. Stress 20, 587–595 (2007).
Mineka, S. & Zinbarg, R. A contemporary learning theory perspective on the etiology of anxiety disorders: It’s not what you thought it was. Am. Psychol. 61, 10–26 (2006).
Bunce, S. C., Larson, R. J. & Peterson, C. Life after trauma: personality and daily life experiences of traumatized people. J. Pers. 63, 165–188 (1995).
Roy, A. Childhood trauma and neuroticism as an adult: possible implication for the development of the common psychiatric disorders and suicidal behaviour. Psychol. Med. 32, 1471–1474 (2002).
Santini, Z. I., Koyanagi, A., Tyrovolas, S., Mason, C. & Haro, J. M. The association between social relationships and depression: a systematic review. J. Affect. Disord. 175, 53–65 (2015).
Wang, J., Mann, F., Lloyd-Evans, B., Ma, R. & Johnson, S. Associations between loneliness and perceived social support and outcomes of mental health problems: a systematic review. BMC Psychiatry 18, 156 (2018).
Gariépy, G., Honkaniemi, H. & Quesnel-Vallée, A. Social support and protection from depression: systematic review of current findings in western countries. Br. J. Psychiatry 209, 284–293 (2016).
Uher, R. & Zwicker, A. Etiology in psychiatry: embracing the reality of poly-gene-environmental causation of mental illness. World Psychiatry 16, 121–129 (2017).
Plomin, R. & Daniels, D. Why are children in the same family so different from one another? Int. J. Epidemiol. 40, 563–582 (2011).
Turkheimer, E. Three laws of behavior genetics and what they mean. Curr. Dir. Psychol. Sci. 9, 160–164 (2000).
Rutter, M. Gene-environment interdependence. Dev. Sci. 10, 12–18 (2007).
Cicchetti, D. Resilience under conditions of extreme stress: a multilevel perspective. World Psychiatry 9, 145 (2010).
Assary, E., Vincent, J. P., Keers, R. & Pluess, M. Gene-environment interaction and psychiatric disorders: review and future directions. Semin. Cell Dev. Biol. 77, 133–143 (2018).
Uher, R. Gene-environment interactions in severe mental illness. Front. Psychiatry 5, 48 (2014).
Barlow, D. H., Ellard, K. K., Sauer-Zavala, S., Bullis, J. R. & Carl, J. R. The origins of neuroticism. Perspect. Psychol. Sci. 9, 481–496 (2014).
Dunn, E. C. et al. Genome-Wide Association Study (GWAS) and Genome-Wide Environment Interaction Study (GWEIS) of Depressive Symptoms in African American and Hispanic/Latina Women. Depress Anxiety 33, 265–280 (2016).
Otowa, T. et al. The first pilot genome-wide gene-environment study of depression in the Japanese population. PLoS ONE 11, e0160823 (2016).
Arnau-Soler, A. et al. Genome-wide by environment interaction studies of depressive symptoms and psychosocial stress in UK Biobank and Generation Scotland. Transl. Psychiatry 9, 1–13 (2019).
Duncan, L. E. & Keller, M. C. A critical review of the first 10 years of candidate gene-by-environment interaction research in psychiatry. Am. J. Psychiatry 168, 1041–1049 (2011).
Nugent, N. R., Tyrka, A. R., Carpenter, L. L. & Price, L. H. Gene-environment interactions: early life stress and risk for depressive and anxiety disorders. Psychopharmacology 214, 175–196 (2011).
McClelland, G. H. & Judd, C. M. Statistical difficulties of detecting interactions and moderator effects. Psychol. Bull. 114, 376–390 (1993).
Smith, P. G. & Day, N. E. The design of case-control studies: the influence of confounding and interaction effects. Int. J. Epidemiol. 13, 356–365 (1984).
Børglum, A. D. et al. Genome-wide study of association and interaction with maternal cytomegalovirus infection suggests new schizophrenia loci. Mol. Psychiatry 18, 20 (2013).
Winham, S. J. & Biernacka, J. M. Gene-environment interactions in genome-wide association studies: current approaches and new directions. J. Child Psychol. Psychiatry 38, 319–335 (2013).
Gallagher, M. D. & Chen-Plotkin, A. S. The Post-GWAS era: from association to function. Am. J. Hum. Genet. 102, 717–730 (2018).
de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).
Thomas, D. Gene–environment-wide association studies: emerging approaches. Nat. Rev. Genet. 11, 259–272 (2010).
Moore, R. et al. A linear mixed-model approach to study multivariate gene–environment interactions. Nat. Genet. 51, 180–186 (2019).
Kerin, M. & Marchini, J. Inferring gene-by-environment interactions with a Bayesian whole-genome regression model. Am. J. Hum. Genet. 107, 698–713 (2020).
Peyrot, W. J. et al. Effect of polygenic risk scores on depression in childhood trauma. Br. J. Psychiatry 205, 113–119 (2014).
Musliner, K. L. et al. Polygenic risk, stressful life events and depressive symptoms in older adults: a polygenic score analysis. Psychol. Med. 45, 1709–1720 (2015).
Mullins, N. et al. Polygenic interactions with environmental adversity in the aetiology of major depressive disorder. Psychol. Med. 46, 759–770 (2016).
Trotta, A. et al. Interplay between Schizophrenia polygenic risk score and childhood adversity in first-presentation psychotic disorder: a pilot study. PLoS ONE 11, e0163319 (2016).
Voorman, A., Lumley, T., McKnight, B. & Rice, K. Behavior of QQ-plots and genomic control in studies of gene-environment interaction. PLoS ONE 6, e19416 (2011).
Hayes, A. F. & Cai, L. Using heteroskedasticity-consistent standard error estimators in OLS regression: an introduction and software implementation. Behav. Res. Methods 39, 709–722 (2007).
White, H. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817 (1980).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
Almli, L. M. et al. Correcting systematic inflation in genetic association tests that consider interaction effects: application to a genome-wide association study of posttraumatic stress disorder. JAMA Psychiatry 71, 1392–1399 (2014).
Keller, M. Gene-by-environment interaction studies have not properly controlled for potential confounders: the problem and the (simple) solution. Acc. Chem. Res. 45, 788–802 (2008).
Sudlow, C. et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
Cederblad, M., Dahlin, L., Hagnell, O. & Hansson, K. Intelligence and temperament as protective factors for mental health. A cross-sectional and prospective epidemiological study. Eur. Arch. Psychiatry Clin. Neurosci. 245, 11–19 (1995).
Hertenstein, E. et al. Insomnia as a predictor of mental disorders: a systematic review and meta-analysis. Sleep. Med. Rev. 43, 96–105 (2019).
Avila, C. et al. An overview of links between obesity and mental health. Curr. Obes. Rep. 4, 303–310 (2015).
Plomin, R., DeFries, J. C., Knopik, V. S. & Neiderhiser, J. M. Top 10 replicated findings from behavioral genetics. Perspect. Psychol. Sci. 11, 3–23 (2016).
Vinkhuyzen, A. A. E., van der Sluis, S., de Geus, E. J. C., Boomsma, D. I. & Posthuma, D. Genetic influences on ‘environmental’ factors. Genes Brain Behav. 9, 276–287 (2010).
Purcell, S. Variance components models for gene–environment interaction in twin analysis. Twin Res. 5, 554–571 (2002).
Robins, R. W., Fraley, R. C., Roberts, B. W. & Trzesniewski, K. H. A longitudinal study of personality change in young adulthood. J. Pers. 69, 617–640 (2001).
Baca-Garcia, E. et al. Diagnostic stability of psychiatric disorders in clinical practice. Br. J. Psychiatry 190, 210–216 (2007).
Seeley, J. R., Farmer, R. F., Kosty, D. B. & Gau, J. M. Prevalence, incidence, recovery, and recurrence of alcohol use disorders from childhood to age 30. Drug Alcohol Depend. 194, 45–50 (2019).
Prenoveau, J. M. et al. Are anxiety and depression just as stable as personality during late adolescence? Results from a three-year longitudinal latent variable study. J. Abnorm. Psychol. 120, 832–843 (2011).
Heim, C. & Binder, E. B. Current research trends in early life stress and depression: review of human studies on sensitive periods, gene-environment interactions, and epigenetics. Exp. Neurol. 233, 102–111 (2012).
Rathouz, P. J., Van Hulle, C. A., Rodgers, J. L., Waldman, I. D. & Lahey, B. B. Specification, testing, and interpretation of gene-by-measured-environment interaction models in the presence of gene-environment correlation. Behav. Genet. 38, 301–315 (2008).
Day, F. R., Loh, P.-R., Scott, R. A., Ong, K. K. & Perry, J. R. B. A robust example of collider bias in a genetic association study. Am. J. Hum. Genet. 98, 392–393 (2016).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Abraham, G. & Inouye, M. Fast principal component analysis of large-scale genome-wide data. PLoS ONE 9, e93766 (2014).
Eysenck, S. B. G., Eysenck, H. J. & Barrett, P. A revised version of the psychoticism scale. Pers. Individ. Dif. 6, 21–29 (1985).
Lorant, V. et al. Socioeconomic inequalities in depression: a meta-analysis. Am. J. Epidemiol. 157, 98–112 (2003).
Ribeiro, W. S. et al. Income inequality and mental illness-related morbidity and resilience: a systematic review and meta-analysis. Lancet Psychiatry 4, 554–562 (2017).
Bjelland, I. et al. Does a higher educational level protect against anxiety and depression? The HUNT study. Soc. Sci. Med. 66, 1334–1345 (2008).
Opitz, P. C., Lee, I. A., Gross, J. J. & Urry, H. L. Fluid cognitive ability is a resource for successful emotion regulation in older and younger adults. Front. Psychol. 5, 609 (2014).
Conway, K. P. et al. Co-occurrence of tobacco product use, substance use, and mental health problems among adults: Findings from Wave 1 (2013-2014) of the Population Assessment of Tobacco and Health (PATH) Study. Drug Alcohol Depend. 177, 104–111 (2017).
Saban, A. et al. The association between substance use and common mental disorders in young adults: results from the South African Stress and Health (SASH) Survey. Pan Afr. Med. J. 17, 11–18 (2014).
Biddle, D. J., Kelly, P. J., Hermens, D. F. & Glozier, N. The association of insomnia with future mental illness: is it just residual symptoms? Sleep. Heal. 4, 352–359 (2018).
Pigeon, W. R., Bishop, T. M. & Krueger, K. M. Insomnia as a precipitating factor in new onset mental illness: a systematic review of recent findings. Curr. Psychiatry Rep. 19, 44 (2017).
Rajan, T. M. & Menon, V. Psychiatric disorders and obesity: a review of association studies. J. Postgrad. Med. 63, 182–190 (2017).
Turner, R. J., Lloyd, D. A. & Taylor, J. Physical disability and mental health: an epidemiology of psychiatric and substance disorders. Rehabil. Psychol. 51, 214–223 (2006).
Turner, R. J. & Noh, S. Physical disability and depression: a longitudinal analysis. J. Health Soc. Behav. 29, 23–37 (1988).
Fishbain, D. A., Cutler, R., Rosomoff, H. L. & Rosomoff, R. S. Chronic pain-associated depression: antecedent or consequence of chronic pain? A review. Clin. J. Pain. 13, 116–137 (1997).
Charles, S., Carayannopoulos, A. G. & Pathak, S. Anxiety and depression in patients with chronic pain. In Deer’s Treatment of Pain (eds Deer, T. R., Pope, J. E., Lamer, T. J. & Provenzano, D.) 125–129 (Springer International Publishing, 2019).
Acknowledgements
This work was funded by COSYN (Comorbidity and Synapse Biology in Clinically Overlapping Psychiatric Disorders: Horizon 2020 Program of the European Union under RIA grant agreement 667301 to D.P.), a European Research Council advanced grant (Grant no. ERC-2018-AdG GWAS2FUNC 834057 to D.P.), and the Netherlands Organization for Scientific Research (NWO: VICI 435-14-005). The analyses were carried out on the Genetic Cluster Computer, which is financed by the Netherlands Organization for Scientific Research (NWO: 480-05-003), by the VU University (Amsterdam, The Netherlands) and the Dutch Brain Foundation, and is hosted by the Dutch National Computing and Networking Services SurfSARA. This research was conducted using the UK Biobank Resource (application number 16406), and we would like to thank all participants who consented to participate in this research, as well as the researchers involved in the collection of the data.
Author information
Contributions
D.P. conceived of the study. J.W. performed the analyses and drafted the manuscript. J.W., S.v.d.S., D.P., and C.d.L. participated in the interpretation of the results and editing of the manuscript. All authors provided meaningful contributions at all stages of the project.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Werme, J., van der Sluis, S., Posthuma, D. et al. Genome-wide gene-environment interactions in neuroticism: an exploratory study across 25 environments. Transl Psychiatry 11, 180 (2021). https://doi.org/10.1038/s41398-021-01288-9
DOI: https://doi.org/10.1038/s41398-021-01288-9