Major depression is the leading contributor to the global burden of disease1, due to its high prevalence2, disabling consequences2 and partial treatment response3. Major depression is heritable (h² = 37%)4, and recent genome-wide association studies (GWAS) by Wray et al.5 and Howard et al.6 for the Psychiatric Genomics Consortium (PGC) have identified 44 and 102 risk-associated genetic variants, respectively. Although each single genetic variant contributes very little to disease liability, genetic risk scores based on the additive effects of common genetic variants across the whole genome, i.e. polygenic risk scores (PRS), can account for a significant proportion of phenotypic variance7. The latest GWAS of depression now makes it possible to estimate polygenic risk of depression more precisely in independent samples6, and thereby to use PRS to identify traits whose genetic architecture is shared with major depression.
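As a minimal illustration of the additive model described above, the sketch below computes a PRS as the sum of risk-allele dosages weighted by GWAS effect sizes, keeping only SNPs whose GWAS p value falls below a chosen threshold (pT). The function name and inputs are hypothetical, and real PRS pipelines add steps such as LD clumping and quality control, which are omitted here.

```python
import numpy as np


def polygenic_risk_score(dosages, effect_sizes, gwas_pvalues, p_threshold):
    """Additive PRS per individual (hypothetical helper, illustration only).

    dosages:      (n_individuals, n_snps) risk-allele counts (0, 1 or 2)
    effect_sizes: per-SNP GWAS effect sizes (e.g. log odds ratios)
    gwas_pvalues: per-SNP GWAS p values
    p_threshold:  only SNPs with p < p_threshold contribute (the pT)
    """
    dosages = np.asarray(dosages, dtype=float)
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    keep = np.asarray(gwas_pvalues, dtype=float) < p_threshold
    # Weighted sum of risk alleles over the retained SNPs only.
    return dosages[:, keep] @ effect_sizes[keep]


# Toy example: 2 individuals, 3 SNPs, scored at two p thresholds.
dosages = [[0, 1, 2], [2, 0, 1]]
effects = [0.10, 0.20, -0.05]
pvals = [1e-8, 0.5, 0.001]
for pt in (1.0, 0.01):
    print(pt, polygenic_risk_score(dosages, effects, pvals, pt))
```

Lowering pT drops weakly associated SNPs (here the second SNP), which is why the same individuals receive different scores at different thresholds.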

Major depression is phenotypically correlated with many behaviours, brain structure and function measures, cognitive domains and physical conditions8,9,10,11,12,13. It is important to investigate the associations between the genetic predisposition to major depression and these phenotypes, to help identify shared causal risk factors, mechanisms and the causal consequences of major depression14. Until recently, however, this approach has received relatively little attention owing to a lack of data resources with the appropriate scale and coverage of genetic, behavioural and neuroimaging traits to test for these associations with sufficient statistical power15,16,17.

A phenome-wide association study (PheWAS) aims to identify multiple phenotypes associated with a single genetic risk score or genotype. Unlike studies that examine associations between a single trait and genetic risk scores, PheWAS are less constrained by prior assumptions. This is particularly important where our understanding of disease mechanisms is still incomplete. Genotype-based PheWAS approaches also have the considerable advantage that genotypes are robustly measured and fixed from birth, and are therefore less susceptible to confounding and reverse causality.

The present study uses large data sets both for generating depression-PRS and for a wide range of phenotypes, including neuroimaging. Depression-PRS are generated using summary statistics from the most recent meta-analysis combining the PGC, UK Biobank and 23andMe (N = 0.8 million)6. A PheWAS approach is used to estimate the effect and significance of associations between depression-PRS and other behavioural, cognitive and neuroimaging traits. The PheWAS is conducted on the latest neuroimaging data releases from the UK Biobank imaging project18, which include a discovery sample of 10,674 people and a replication sample of 11,214 people (21,888 individuals in total). The UK Biobank imaging project is a large-scale data set containing both genetic and cross-modality neuroimaging data. A total of 77 traits are associated with polygenic risk of depression in both the discovery and replication samples. Where depression-PRS are associated with neuroimaging phenotypes, we additionally use Mendelian Randomisation (MR) and structural equation modelling (SEM) to test whether the association is a potential causal consequence of depression or, conversely, whether the neuroimaging measure has a causal effect on depression. The findings suggest that variation in white matter microstructure is a consequence of depression. We also test for gene-by-environment interactions using measures of early-life risk factors and sociodemographic variables available in UK Biobank19,20; these analyses show larger effects of polygenic risk of depression on psychiatric conditions in participants exposed to adverse environments.



We found that 100 phenotypes (67 behavioural and 33 neuroimaging) out of 552 examined (209 behavioural and 343 neuroimaging) showed significant associations with depression-PRS in the discovery sample at a minimum of four p thresholds after false discovery rate (FDR) correction for multiple comparisons (absolute β: 0.014–0.341, where β denotes standardised regression coefficients throughout; pFDR for linear regression: 0.050–3.61 × 10⁻³¹). Of these, 37 phenotypes remained significant after Bonferroni correction. However, because the phenotypes tested are correlated, Bonferroni correction is likely to be overly conservative. Thresholds for both FDR and Bonferroni corrections are shown in Supplementary Fig. 1.
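The selection rule described above (FDR correction applied jointly over all traits and all PRS thresholds, with a trait retained if it survives at four or more thresholds) can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' code: covariates are passed in as a matrix, the standardised β comes from an OLS fit on z-scored variables, the two-sided p value uses a normal approximation to the t distribution (adequate at these sample sizes), and the Benjamini–Hochberg procedure is assumed as the FDR method. All function names are hypothetical.

```python
import math

import numpy as np


def _zscore(values):
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()


def standardised_beta(prs, phenotype, covariates):
    """Standardised beta and two-sided p value for the PRS term of
    phenotype ~ PRS + covariates (normal approximation for the p value)."""
    y, x = _zscore(phenotype), _zscore(prs)
    X = np.column_stack([np.ones_like(x), x, np.asarray(covariates, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1], math.erfc(abs(coef[1] / se) / math.sqrt(2))


def bh_fdr(pvalues):
    """Benjamini-Hochberg adjusted p values for an array of any shape."""
    p = np.asarray(pvalues, dtype=float)
    flat = p.ravel()
    m = flat.size
    order = np.argsort(flat)
    scaled = flat[order] * m / np.arange(1, m + 1)
    # Step-up: running minimum taken from the largest p value downwards.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out.reshape(p.shape)


def traits_passing(p_matrix, alpha=0.05, min_thresholds=4):
    """p_matrix: traits x PRS thresholds. FDR is applied over the whole
    matrix; returns indices of traits significant at >= min_thresholds."""
    q = bh_fdr(p_matrix)
    return np.flatnonzero((q < alpha).sum(axis=1) >= min_thresholds)
```

Applying the correction to the full traits-by-thresholds matrix, rather than per threshold, mirrors the statement that FDR correction was applied over all traits and all depression-PRS.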

Overall results for depression-PRS at the representative p thresholds (pT) of 1 and 0.01 are presented in Fig. 1. These two thresholds were selected because pT < 1 and pT < 0.01 showed the largest effect sizes for behavioural traits and neuroimaging phenotypes, respectively (see Fig. 2 and Supplementary Fig. 1). Results for other thresholds can be found in Supplementary Fig. 1, Supplementary Table 1 and Supplementary Data 1 and 2.

Fig. 1: Significance plot for all phenotypes for depression-PRS at p threshold (pT) < 1 and pT < 0.01.

The x axes represent phenotypes, and the y axes represent the −log10 of uncorrected p values from two-sided tests for the linear regression between depression-PRS and each phenotype. Each dot represents one phenotype, and the colours indicate the corresponding categories. The dashed lines indicate the threshold to survive FDR correction. FDR correction was applied over all the traits and all depression-PRS (see “Methods”). From left to right on the x axis, the categories are: mental health measure, sociodemographics, early-life risk factor, lifestyle measure, physical measure, cognitive ability, intracranial/subcortical volume, white matter microstructure, white matter hyperintensity, resting-state functional connectivity and resting-state fluctuation amplitude. Representative top findings are annotated in the figure. SN salience network, ECN executive control network, SMN sensorimotor network, FA fractional anisotropy, MD (for white matter microstructure) mean diffusivity, ICVF intra-cellular volume fraction, AF association fibres, FMi forceps minor, SLF superior longitudinal fasciculus, CGpC cingulate gyrus part of cingulum.

Fig. 2: Heatmap for the traits that were significantly associated with depression-PRS.

The traits shown were significantly associated with depression-PRS at a minimum of four p thresholds. Cell shading indicates the standardised effect size (β) for the linear regression between depression-PRS and each phenotype, with darker colours indicating larger effect sizes. Cells with an asterisk were significant after FDR correction. Detailed descriptions of the variables can be found in Table 1, Supplementary Table 1 and Supplementary Data 1.

All 100 variables showed the same direction of effect in the replication sample (Fig. 3, Supplementary Figs. 2 and 3 and Supplementary Data 2 and 3). After correction for multiple comparisons, 77 traits (51 behavioural and 26 neuroimaging) showed associations with depression-PRS at a minimum of four p thresholds in the replication sample. Of these, 23 remained significant after Bonferroni correction. In total, 77% of findings were replicated; the highest replication rates were seen for white matter microstructure (92.3%), mental health variables (81.3%) and physical measures (76%) (see Supplementary Figs. 2 and 3). There was no significant interaction between magnetic resonance imaging (MRI) site and depression-PRS for any of the traits (pcor > 0.431, see Supplementary Fig. 4 and Supplementary Data 4). Results of a meta-analysis combining the two samples can be found in Supplementary Figs. 5 and 6 and Supplementary Data 5.

Fig. 3: Results for replication analysis.

a Comparison of effect sizes for the discovery and replication samples. The x axes represent the mean standardised effect size across depression-PRS at all eight p thresholds (pT). Bar colours indicate categories (from top to bottom: mental health measure, sociodemographics, lifestyle measure, physical measure, intracranial/subcortical volume, white matter microstructure, white matter hyperintensity, resting-state functional connectivity and resting-state fluctuation amplitude). b Significance plot for the replication analysis on representative depression-PRS at pT < 1 and pT < 0.01, in accordance with Fig. 1. The x axes represent phenotypes, and the y axes represent the −log10 of uncorrected p values from two-sided tests for the linear regression between depression-PRS and each phenotype. Each dot represents one phenotype, and the colours indicate the corresponding categories. The yellow dashed lines indicate the threshold to survive FDR correction. FDR correction was applied over all the traits and all depression-PRS (see “Methods”). The pink and red dashed lines indicate the thresholds for Bonferroni correction and nominal significance, respectively. Top hits in the discovery sample (Fig. 1) are annotated in the figure. Abbreviations are explained in the legend of Fig. 1.

Significant associations that were found in both the discovery and replication data sets are reported below; the βs provided are from the discovery analysis. A complete list of all results is presented in Supplementary Data 2 and 3.

Considering first the definitions of depression and depressive symptomatology, higher depression-PRS were associated with the presence of depression under all three definitions: broad depression (β: 0.154–0.300, pFDR for linear regression: 3.93 × 10⁻⁹–3.61 × 10⁻³¹), probable depression (β: 0.174–0.341, pFDR for linear regression: 1.14 × 10⁻⁶–1.52 × 10⁻²³) and Composite International Diagnostic Interview (CIDI) depression (β: 0.121–0.261, pFDR for linear regression: 3.08 × 10⁻⁴–1.18 × 10⁻¹⁷). Significant associations were also found between depression-PRS and depressive symptoms assessed by the PHQ-4 (Patient Health Questionnaire) and CIDI questionnaires, and with other self-reported psychological traits, including self-harm, subjective well-being, reported feelings that life was not worth living, and neuroticism (absolute β: 0.027–0.339, pFDR for linear regression: 0.045–8.84 × 10⁻³⁰).

Associations were also found between depression-PRS and white matter microstructure. In general, higher depression-PRS were associated with lower white matter microstructural integrity. First, for the classic microstructural measures of fractional anisotropy (FA) and mean diffusivity (MD), globally lower FA and higher MD (absolute β: 0.027–0.038, pFDR for linear regression: 0.037–6.90 × 10⁻⁴) were associated with higher depression-PRS. Lower microstructural integrity was also found in the general measures of FA and MD for two subsets of white matter tracts, the association fibres (absolute β: 0.029–0.040, pFDR for linear regression: 0.024–6.05 × 10⁻⁴) and the thalamic radiations (absolute β: 0.025–0.036, pFDR for linear regression: 0.036–2.70 × 10⁻³). For individual tracts (Figs. 2 and 4), higher depression-PRS were associated with decreased FA in the inferior fronto-occipital fasciculus, inferior longitudinal fasciculus, posterior thalamic radiation and superior longitudinal fasciculus (SLF) (β: −0.025 to −0.032, pFDR for linear regression: 0.050–5.78 × 10⁻³) and with increased MD in the anterior thalamic radiation (ATR), cingulate gyrus part of cingulum, inferior fronto-occipital fasciculus, SLF and superior thalamic radiation (β: 0.023–0.042, pFDR for linear regression: 0.040–6.95 × 10⁻⁵). All associations found for neurite orientation dispersion and density imaging measures were in intra-cellular volume fraction (ICVF, an index of neurite density; β: −0.025 to −0.044, pFDR for linear regression: 0.047–1.74 × 10⁻⁴). The general variance of ICVF in the association fibre subset was negatively associated with depression-PRS (β: −0.028 to −0.039, pFDR for linear regression: 0.036–1.13 × 10⁻³). For individual tracts, lower ICVF was associated with higher depression-PRS in regions similar to those found for FA and MD, including the acoustic radiation, cingulate gyrus part of cingulum, inferior fronto-occipital fasciculus, SLF and uncinate fasciculus (β: −0.025 to −0.043, pFDR for linear regression: 0.043–2.10 × 10⁻⁴).

Fig. 4: Maps for the significant associations between depression-PRS and brain phenotypes.

a–c Brain maps for the significant associations between depression-PRS and white matter microstructure of major tracts in fractional anisotropy (FA; a), mean diffusivity (MD; b) and intra-cellular volume fraction (ICVF; c). The shade of each tract represents the standardised effect size (β), with darker shades showing a greater mean β across all depression-PRS at different p thresholds (pT). Views from left to right are anterior, superior and right. For clarity, of the tracts presented in Fig. 2, only those that showed consistent associations across at least four depression-PRS pT are presented. d Brain maps for regions involved in significant associations between resting-state fluctuation amplitude and depression-PRS. Regions that show consistent associations across a minimum of four depression-PRS p thresholds are presented. Results are visualised by averaging the intensity of the ICA maps, weighted by their mean β across pT. For clarity, the brain maps are thresholded to show only intensities above 50% of the highest global intensity.
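The visualisation step for panel d (averaging ICA component maps weighted by their mean β, then thresholding at 50% of the maximum intensity) might be sketched as below. Using the absolute value of β as the weight is our assumption, so that components with negative associations do not cancel each other, and the array layout (components × voxels) and function name are hypothetical.

```python
import numpy as np


def weighted_ica_map(component_maps, mean_betas, fraction=0.5):
    """component_maps: (n_components, n_voxels) ICA spatial maps.
    mean_betas: mean beta of each component across PRS thresholds.
    Returns the |beta|-weighted average map, keeping only voxels whose
    intensity exceeds `fraction` of the map's maximum (others set to 0)."""
    maps = np.asarray(component_maps, dtype=float)
    weights = np.abs(np.asarray(mean_betas, dtype=float))
    average = weights @ maps / weights.sum()
    return np.where(average > fraction * average.max(), average, 0.0)


# Two toy components over four voxels; the first has the larger |beta|.
print(weighted_ica_map([[1, 0, 0, 2], [0, 1, 0, 2]], [-0.03, -0.01]))
```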

Depression-PRS were also associated with resting-state fluctuation amplitude. In the discovery sample, associations were found between depression-PRS and the fluctuation amplitude of the low-frequency resting-state signal (absolute β: 0.027–0.043, pFDR: 0.037–2.03 × 10⁻⁴; Figs. 2 and 4). The full list of results is presented in Supplementary Data 2.

In brief, higher depression-PRS were associated with lower fluctuation amplitude in the anterior cingulate gyrus (peak coordinates: −10, 54, 2; cluster size: 7065), bilateral postcentral gyrus (peak coordinates: −44, −30, 46 and 44, −24, 40 for the left and right hemispheres, respectively; cluster sizes: 2781 and 1619), bilateral insula (peak coordinates: −38, −4, 16 and 30, 18, −16 for the left and right hemispheres, respectively; cluster sizes: 963 and 308), bilateral orbital part of the inferior frontal gyrus (peak coordinates: −34, 34, −12 and 32, 36, −10 for the left and right hemispheres, respectively; cluster sizes: 154 and 171) and left superior frontal lobe (peak coordinates: −18, 32, 38; cluster size: 124). These regions are largely contained within the salience, executive control and sensorimotor networks (Supplementary Table 2)15,21.

Finally, depression-PRS were associated with sleep problems, smoking and poor physical health. In the category of lifestyle measures, reported sleep problems (e.g. too much sleep or insomnia) (absolute β: 0.034–0.180, pFDR for linear regression: 0.043–8.26 × 10⁻⁹) and smoking behaviours (absolute β: 0.044–0.105, pFDR for linear regression: 2.28 × 10⁻³–3.74 × 10⁻⁸) were significantly positively associated with depression-PRS.

Physical health items associated with depression-PRS can be summarised in four categories: (1) self-reported overall health rating and long-standing illnesses (absolute β: 0.040–0.129, pFDR for linear regression: 4.38 × 10⁻³–1.49 × 10⁻¹³), (2) recent pain and ongoing treatment for pain (absolute β: 0.083–0.163, pFDR for linear regression: 6.30 × 10⁻⁴–1.28 × 10⁻¹⁴), (3) cardiovascular/heart problems (absolute β: 0.066–0.112, pFDR for linear regression: 0.027–1.97 × 10⁻⁵), and (4) body mass and weight change compared with 1 year ago (absolute β: 0.014–0.042, pFDR for linear regression: 0.046–5.00 × 10⁻⁶).

Bidirectional MR on imaging phenotypes and depression

A significant and potentially causal effect of depression was found on lower microstructural integrity in four white matter microstructural measures and on lower resting-state fluctuation amplitude in the Salience Network (Node 14). For these phenotypes, the effect of depression was significant for at least two MR methods after FDR correction (Fig. 5, Supplementary Data 6 and 7 and Supplementary Figs. 7–11). The neuroimaging phenotypes include (β and pFDR reported for significant effects): global gMD (gMD-Total; β: 0.125–0.724, pFDR for MR: 0.041–0.022, significant for all three MR methods), gMD in thalamic radiations (gMD-TR; β: 0.131–0.527, pFDR for MR: 0.050–0.010, significant for all three MR methods), ICVF in SLF (tICVF-SLF; β: −0.159 to −0.926, pFDR for MR: 0.023–0.015, significant for the inverse-variance weighted estimator (IVW) and MR-Egger) and in forceps major (tICVF-FMa; β: −0.160 to −0.792, pFDR for MR: 0.040–0.023, IVW and MR-Egger), and resting-state fluctuation amplitude in the Salience Network (amp-N14; β: −0.130 to −0.177, pFDR for MR: 0.021–0.015, IVW and weighted median). Other than ICVF in SLF, no significant reverse effects of these neuroimaging phenotypes on depression were found (p for MR: 0.498–0.860). Among the significant effects above, ICVF in SLF and FMa both showed significant heterogeneity among genetic instruments, indicating potential horizontal pleiotropy (pFDR for Q test: 0.018–0.011; pFDR for MR-PRESSO global test: 0.024–0.009), and after outlying genetic instruments were removed, the MR-PRESSO estimates were no longer significant for either variable (β: −0.086 to −0.087, p for MR: 0.090–0.081). No other test showed significant horizontal pleiotropy or single-nucleotide polymorphism (SNP) heterogeneity (pFDR for MR-Egger intercept > 0.071, pFDR for all Q tests > 0.135 and pFDR for MR-PRESSO global test > 0.214).
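The three MR estimators named above can be summarised with the following minimal sketch, which works from per-SNP summary statistics (SNP–exposure effects bx, SNP–outcome effects by and the standard errors of by). It implements textbook forms of the fixed-effect IVW estimator, MR-Egger regression and the simple weighted median with first-order weights; it is not the software used in the study, and the function names are hypothetical. Standard errors for MR-Egger and the weighted median, p values and bootstrap intervals are omitted.

```python
import numpy as np


def _as_arrays(*args):
    return [np.asarray(a, dtype=float) for a in args]


def ivw(bx, by, se_by):
    """Fixed-effect inverse-variance weighted estimate and its standard error."""
    bx, by, se_by = _as_arrays(bx, by, se_by)
    w = 1.0 / se_by**2
    estimate = np.sum(w * bx * by) / np.sum(w * bx**2)
    return estimate, np.sqrt(1.0 / np.sum(w * bx**2))


def mr_egger(bx, by, se_by):
    """MR-Egger: weighted regression of by on bx with an intercept.
    SNPs are first oriented so that every bx is positive."""
    bx, by, se_by = _as_arrays(bx, by, se_by)
    sign = np.sign(bx)
    bx, by = bx * sign, by * sign
    w = 1.0 / se_by**2
    X = np.column_stack([np.ones_like(bx), bx])
    intercept, slope = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * by))
    return slope, intercept  # slope: causal estimate; intercept: pleiotropy


def weighted_median(bx, by, se_by):
    """Weighted median of per-SNP ratio estimates by/bx, using
    first-order inverse-variance weights (bx / se_by)**2."""
    bx, by, se_by = _as_arrays(bx, by, se_by)
    ratio = by / bx
    w = (bx / se_by) ** 2
    order = np.argsort(ratio)
    ratio, w = ratio[order], w[order] / w.sum()
    # Interpolate the point where the cumulative weight (centred) reaches 0.5.
    cumulative = np.cumsum(w) - 0.5 * w
    return float(np.interp(0.5, cumulative, ratio))
```

The three estimators rest on different assumptions (all instruments valid for IVW, pleiotropy independent of instrument strength for MR-Egger, at least half of the weight from valid instruments for the weighted median), which is why agreement across at least two of them is used above as supporting evidence.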

Fig. 5: Mendelian Randomisation analysis between neuroimaging phenotypes and depression.

The left panel shows the model and Mendelian Randomisation results for the causal effect of depression on neuroimaging phenotypes, and the right panel shows the model and results for the effect of neuroimaging phenotypes on depression. For the model illustrations, G = genetic instruments extracted from GWAS summary statistics of the exposure, E = exposure variable, O = outcome variable, U = unmeasured confounders (assumed to have no systematic association with G). In the scatter plots, the x axes represent −log10-transformed p values for the Mendelian Randomisation results, and the y axes represent the neuroimaging traits tested in the models. The three dot types represent the three Mendelian Randomisation methods used. Dashed grey lines mark the p = 0.05 threshold for nominal significance. MD mean diffusivity, ICVF intra-cellular volume fraction, TR thalamic radiations, SLF superior longitudinal fasciculus, Amplitude.N14 (SN) fluctuation amplitude in Node 14 (i.e. the Salience Network).

Conversely, we then tested the directional effect of neuroimaging phenotypes on depression. The only significant effect was from the general variance of ICVF in association fibres to depression, using the IVW method (gICVF-AF; β = −0.031, pFDR for MR = 0.018); however, results using other MR methods were not significant (pFDR for MR > 0.886), and heterogeneity tests were highly significant (pFDR = 4.78 × 10⁻⁷ for the Q test and 3.33 × 10⁻⁴ for the MR-PRESSO global test). Two other neuroimaging phenotypes showed nominally significant effects on depression: higher MD in the ATR for MR-Egger (tMD-ATR; β = 0.107, p for MR-Egger = 0.028, pFDR = 0.166; the Egger intercept was not significant, pFDR = 0.432; Supplementary Fig. 11, Supplementary Data 6 and Supplementary Table 3) and ICVF in SLF (tICVF-SLF; β = −0.025, p = 0.030, pFDR = 0.179 for IVW).
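Heterogeneity among genetic instruments, as assessed by the Q tests reported above, is commonly quantified with Cochran's Q around the IVW estimate; a minimal sketch under that assumption follows (the function name is hypothetical). It returns Q, its degrees of freedom and the I² statistic. A p value would come from the upper tail of a χ² distribution with J − 1 degrees of freedom (for example `scipy.stats.chi2.sf(q, df)`), which is left out to keep the sketch dependency-free. The MR-PRESSO global test is a different, simulation-based procedure and is not shown.

```python
import numpy as np


def cochran_q(bx, by, se_by):
    """Cochran's Q for heterogeneity of per-SNP ratio estimates around the
    fixed-effect IVW estimate, with degrees of freedom and I^2."""
    bx, by, se_by = (np.asarray(a, dtype=float) for a in (bx, by, se_by))
    w = 1.0 / se_by**2
    beta_ivw = np.sum(w * bx * by) / np.sum(w * bx**2)
    # Weighted squared deviations of each SNP from the IVW fit.
    q = float(np.sum(w * (by - beta_ivw * bx) ** 2))
    df = bx.size - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i_squared


# Two instruments pointing to different causal effects (0 and 2).
print(cochran_q([1, 1], [0, 2], [1, 1]))
```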

An additional test examined whether effect sizes were substantially reduced after controlling for depressive symptoms (assessed online using the CIDI short form and PHQ-9, and using PHQ-4 for current symptoms at the imaging assessment). All three white matter microstructural measures found significant in the MR analysis as causal consequences of depression showed large reductions in effect size (20.5–30.9%), whereas resting-state fluctuation amplitude in the Salience Network showed almost no reduction (0.3%; see Supplementary Figs. 12–14).
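The percentage reduction reported above presumably compares the depression-PRS coefficient with and without the symptom measure as a covariate. The exact definition is not given here, so the following is a sketch under that assumption, with hypothetical function names and simulated data in which symptoms carry part of the PRS effect.

```python
import numpy as np


def prs_coefficient(outcome, prs, covariate=None):
    """OLS coefficient of the PRS term, optionally adjusting for one covariate."""
    columns = [np.ones(len(outcome)), prs]
    if covariate is not None:
        columns.append(covariate)
    X = np.column_stack(columns)
    return np.linalg.lstsq(X, outcome, rcond=None)[0][1]


def percent_reduction(beta_unadjusted, beta_adjusted):
    """Attenuation of the PRS effect after adjustment, in percent."""
    return 100.0 * (beta_unadjusted - beta_adjusted) / beta_unadjusted


# Simulated data: PRS -> symptoms -> imaging, plus a direct PRS effect.
rng = np.random.default_rng(1)
prs = rng.normal(size=20_000)
symptoms = 0.5 * prs + rng.normal(size=prs.size)
imaging = 0.1 * prs + 0.4 * symptoms + rng.normal(size=prs.size)
reduction = percent_reduction(prs_coefficient(imaging, prs),
                              prs_coefficient(imaging, prs, symptoms))
print(f"{reduction:.1f}% reduction")
```

Because the formula divides by the unadjusted coefficient, it gives a positive percentage for attenuation towards zero whether the PRS effect is positive (as for MD) or negative (as for ICVF).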

In addition to the above MR results, genetic correlations were found between depression and FA in forceps minor (rg = −0.157, pcor for genetic correlation = 0.001), MD in ATR (rg = 0.106, pcor for genetic correlation = 0.012), MD in cingulate part of cingulum (rg = 0.105, pcor for genetic correlation = 0.012), MD in forceps minor (rg = 0.119, pcor for genetic correlation = 0.012), general ICVF in association fibres (rg = −0.083, pcor for genetic correlation = 0.026), ICVF in cingulate part of cingulum (rg = −0.10, pcor for genetic correlation = 0.023), SLF (rg = −0.10, pcor for genetic correlation = 0.023) and uncinate fasciculus (rg = −0.10, pcor for genetic correlation = 0.023) (see Supplementary Table 4).

Mediation analyses on imaging phenotypes

In the first mediation model, we tested whether polygenic risk of depression led to changes in several neuroimaging variables through the mediating effect of depression. Neuroimaging variables were selected if they were significant causal consequences of depression in the MR analyses. Conversely, in the second model, MD in ATR, which showed a potentially causal effect on depression at nominal significance in the MR analysis, was tested as a potential mediator of the effect of genetic risk on depression. Other neuroimaging variables that were nominally significant causal factors in the MR analysis were not tested as mediators, because the heterogeneity tests were highly significant. Here we report results for depression-PRS at the threshold pT < 1; for other thresholds, see Supplementary Data 8. Details can be found in the Supplementary Methods.

We found evidence that current depressive symptoms measured by PHQ-4 mediated the effect of depression-PRS on global MD (gMD-Total; β = 0.002, pFDR for mediation test = 0.003) and on MD in thalamic radiations (gMD-TR; β = 0.002, pFDR for mediation test = 0.001). Conversely, MD in ATR significantly mediated the effect of depression-PRS on current depressive symptoms (PHQ-4) (β = 0.001, p for mediation test = 0.005). All significant mediation models showed good model fit (CFI ranged from 0.987 to 0.993, TLI ranged from 0.978 to 0.989, and all pRMSEA = 1). A full list of results for all mediation models tested can be found in Supplementary Data 8.
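The mediation models above were fitted as structural equation models; a regression-based approximation of the same single-mediator path model (indirect effect = a × b, with a percentile bootstrap interval) is sketched below. The function names, bootstrap settings and simulated data are assumptions for illustration, and the sketch omits covariates and the model-fit indices (CFI, TLI, RMSEA) that an SEM package would report.

```python
import numpy as np


def _ols_coefficients(y, *predictors):
    X = np.column_stack([np.ones(len(y)), *predictors])
    return np.linalg.lstsq(X, y, rcond=None)[0]


def indirect_effect(x, m, y):
    """Product-of-coefficients indirect effect of x on y through m."""
    a = _ols_coefficients(m, x)[1]     # path a: x -> m
    b = _ols_coefficients(y, x, m)[2]  # path b: m -> y, adjusting for x
    return a * b


def bootstrap_interval(x, m, y, n_boot=1000, seed=0):
    """Percentile bootstrap 95% interval for the indirect effect."""
    rng = np.random.default_rng(seed)
    n = len(x)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample individuals with replacement
        draws.append(indirect_effect(x[idx], m[idx], y[idx]))
    low, high = np.percentile(draws, [2.5, 97.5])
    return low, high


# Simulated data: PRS -> symptoms (a = 0.5) -> MD (b = 0.4).
rng = np.random.default_rng(2)
prs = rng.normal(size=3000)
symptoms = 0.5 * prs + rng.normal(size=prs.size)
md = 0.4 * symptoms + rng.normal(size=prs.size)
print(indirect_effect(prs, symptoms, md), bootstrap_interval(prs, symptoms, md, n_boot=200))
```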

G-by-E interaction

Environmental variables that showed significant interactions with depression-PRS included childhood trauma and the Townsend Index. The dependent variables that provided evidence of G × E were mainly measures of mental health, including depressive symptoms and the self-declared total number of psychiatric conditions (see Fig. 6 and Supplementary Fig. 15; pFDR for linear regression < 0.040).
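A G × E test of this kind is typically a linear regression that includes a PRS × environment product term. The sketch below fits that model and returns the interaction coefficient with a normal-approximation two-sided p value; covariates are omitted for brevity, the binary environment coding and function name are assumptions, and this is not the study's exact model specification.

```python
import math

import numpy as np


def gxe_interaction(outcome, prs, environment):
    """Fit outcome ~ PRS + E + PRS x E by OLS; return the interaction
    coefficient and its two-sided p value (normal approximation)."""
    y, g, e = (np.asarray(a, dtype=float) for a in (outcome, prs, environment))
    X = np.column_stack([np.ones_like(g), g, e, g * e])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[3, 3])
    return coef[3], math.erfc(abs(coef[3] / se) / math.sqrt(2))


# Simulated data in which the PRS effect is larger in the exposed group.
rng = np.random.default_rng(3)
n = 5000
prs = rng.normal(size=n)
exposed = rng.integers(0, 2, size=n)  # 1 = exposed to the risk factor
outcome = 0.2 * prs + 0.3 * exposed + 0.3 * prs * exposed + rng.normal(size=n)
print(gxe_interaction(outcome, prs, exposed))
```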

Fig. 6: G-by-E interaction.

The figures present the variance explained by depression-PRS under exposure to different environmental risk factors. The shade of each bar represents one level of the environmental factor; a darker shade represents a risk-conferring condition (e.g. having reported childhood trauma, or living in the most deprived area). The y axes represent the variance explained (R² in %) by depression-PRS under the given environmental conditions.

In general, the effect of depression-PRS was stronger in participants exposed to more adverse social and socioeconomic environments. For participants who reported any childhood trauma versus none, the variance in the dependent variables accounted for by depression-PRS was 1.67–1.78 times higher for the total number of psychiatric conditions and for affective symptoms of depression. For participants in the most deprived tertile, the variance explained in the sum of psychiatric conditions was 3.57 times higher than for the least deprived participants. Detailed reports can be found in Fig. 6, Supplementary Fig. 15 and Supplementary Data 9–14.
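The "times higher" figures above are ratios of the variance explained by depression-PRS in the exposed versus the unexposed (or least deprived) group. A minimal sketch, assuming variance explained is taken as the R² of a simple regression of the outcome on PRS within each stratum (the study's models may also include covariates), with a hypothetical function name and simulated data:

```python
import numpy as np


def r2_by_stratum(outcome, prs, environment):
    """Percentage of outcome variance explained by PRS within each stratum,
    using the squared Pearson correlation (simple-regression R^2)."""
    y, g, e = (np.asarray(a) for a in (outcome, prs, environment))
    result = {}
    for level in np.unique(e):
        mask = e == level
        r = np.corrcoef(g[mask], y[mask])[0, 1]
        result[level.item()] = 100.0 * r**2
    return result


rng = np.random.default_rng(4)
n = 10_000
prs = rng.normal(size=n)
trauma = rng.integers(0, 2, size=n)  # 1 = reported childhood trauma
symptoms = (0.2 + 0.2 * trauma) * prs + rng.normal(size=n)
r2 = r2_by_stratum(symptoms, prs, trauma)
print(r2, "ratio:", r2[1] / r2[0])
```

The same function handles more than two strata, for example deprivation tertiles, in which case the ratio would compare the most and least deprived levels.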

However, we found no evidence of interactions between depression-PRS and adulthood trauma, recent stressful life events or household income (pFDR for linear regression > 0.086).


In the present study, replicated associations between depression-PRS and behavioural and neuroimaging phenotypes were found using an independent imaging cohort. The strongest associations were found between depression-PRS and mental health variables. Several novel associations were detected, including associations between depression-PRS and both brain white matter microstructure and a measure of resting-state activity amplitude. In addition, MR analysis showed evidence that changes in the MD of thalamic radiations and the global variance of MD are, should the assumptions of MR hold, likely to be causal consequences of depression. Other associations with higher polygenic risk included more self-reported sleep problems, smoking behaviour, the presence of cardiovascular conditions and increased body mass index. Analyses of interactions between depression-PRS and early-life and sociodemographic factors revealed that the effect of depression-PRS on mental health was stronger in participants who reported childhood trauma and in those who experienced socioeconomic deprivation.

Although replicated associations were found between depression-PRS and both behavioural and neuroimaging variables, only 24.4% of all behavioural phenotypes tested were significantly associated with depression-PRS. The proportion was lower for neuroimaging phenotypes, of which only 7.6% were significantly associated. The higher proportion for behavioural phenotypes likely reflects the overlapping genetic architecture of multiple psychiatric conditions with one another and with the behavioural traits with which they are commonly associated. In contrast, although several brain phenotypes were associated with depression-PRS, our findings suggest that depression shares its genetic architecture with only a small proportion of them. One potential reason for this finding is a relative lack of signal in GWAS of neuroimaging variables22. It is also possible that depression has a relatively specific relationship with a smaller number of neuroimaging variables, reflecting the underlying mechanisms of depression.
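These proportions appear to follow from the replicated counts and the numbers of phenotypes tested reported earlier (51 of 209 behavioural and 26 of 343 neuroimaging phenotypes); this correspondence is our reading rather than a calculation stated by the authors:

```latex
\frac{51}{209} \approx 0.244 = 24.4\%, \qquad \frac{26}{343} \approx 0.076 = 7.6\%
```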

Novel associations were found between depression-PRS and neuroimaging variables on structural connectivity and functional resting-state fluctuation amplitude in the brain. Findings from both diffusion tensor imaging and resting-state data revealed the importance of the prefrontal cortex, which is a hub for emotion regulation and executive control23,24. The role of the prefrontal cortex is further supported by a GWAS on depression in UK Biobank, which showed the enrichment of risk-associated genes in this region25. In particular, white matter microstructure showed the largest effect sizes among brain phenotypes in our results, and most trait associations in this category were replicated in an independent data set. The current findings therefore indicate a potentially risk-conferring role for white matter over other modalities. This finding is supported by previous evidence that white matter microstructure has stronger phenotypic associations with lifetime depression compared to brain structural volumes26 and higher SNP heritability (20–60%) compared with other neuroimaging modalities, indicating a greater genomic contribution to individual differences in phenotypes27. Resting-state findings were comparatively less replicated than for white matter, which may be due to the less well-standardised protocols for resting-state acquisition. For example, although others have shown that the results of resting-state studies are broadly comparable across a range of acquisition lengths28, the data acquisition time for resting-state data in UK Biobank was relatively short compared with other imaging cohorts, such as the Human Connectome Project29.

The strongest replicated white matter findings for depression-PRS were for the MD measures, consistent with previously reported associations with depressive symptoms30. This may be due to the greater sensitivity of MD to ageing and related pathophysiological processes in this mid- to late-life UK Biobank sample31. Alternatively, the associations with neurite density (ICVF) suggest that the increases in MD may be partly due to reduced neurite density. This highlights the need to investigate these issues further in tissue from large samples of depressed individuals. Recent gene expression studies suggest that genetic predisposition to depression may influence more spatially and functionally specific neuronal-level processes, such as synaptic pruning and the overproduction of synapses32 for regional segregation33 during brain maturation and myelin repair, which contribute substantially to individual variation in brain structure and function27. These highly regional and functionally specific brain phenotypes are of great importance and may help explain how genetic predisposition contributes to variance in neuroimaging measures.

MR analysis subsequently indicated that several associations between polygenic risk of depression and neuroimaging variables may have directional or causal significance. Whether brain structural and functional alterations are the outcome or the cause of depressive symptoms has long been debated34. Our results show that some brain structural and functional alterations are likely to be an outcome of depression; whether other imaging features are also a cause remains unclear. Although our results for causal effects from neuroimaging phenotypes to depression were largely null, suggesting a possibly unidirectional relationship from depression to the brain, it may be premature to draw confident conclusions until a greater number of genome-wide significant genetic instruments for neuroimaging traits is available. The relative lack of genome-wide significant loci for most neuroimaging measures provides weaker genetic instruments for MR, which may reduce power to detect causal associations. There is currently a global effort to conduct GWAS of neuroimaging phenotypes, and this effort is likely to provide stronger genetic instruments for future analyses. Further, white matter microstructure in ATR did show a nominally significant causal effect on depression, but notably not in the reverse direction (from depression to ATR), even though the reverse test had a much larger set of genetic instruments and greater power to detect significant effects. This indicates that white matter microstructure in ATR may be one of the strongest neuroimaging candidates as a causal mediator of risk for depression.

The associations between behavioural traits and depression-PRS suggest that polygenic risk of depression may also reflect a predisposition to experience particular environmental risk exposures, or a vulnerability to their effects and later recall. First, the linear associations of depression-PRS with sleep, recent pain, smoking behaviour and the presence of any heart/cardiovascular condition showed the largest effect sizes. Various mechanisms could underlie these behavioural patterns, such as hyperactivity of the hypothalamic–pituitary–adrenal (HPA) axis35 and neurodevelopmental or parental influences on poorer health. Studies have shown that poor physical and neurobiological health may be correlated36. Here we identified candidate behavioural and physical phenotypes that may partially explain the genetic association between depression and brain phenotypes, for future research to explore. Second, and more directly, the environmental risk factors that showed significant interactions in this study strengthened the effect of depression-PRS. Compared with previous studies testing gene–environment (G × E) interactions, the present study showed that G × E effects can be present at a whole-genome, polygenic level. This may be a manifestation of interactions between the environmental risk factors and important endophenotypes (e.g. HPA-axis activity) influenced by polygenic risk of depression.

The present study has several strengths and limitations. PheWAS aims to identify the multiple phenotypes associated with a single risk score. This is arguably a stronger approach than studying a single trait selected on the basis of prior theory, because PheWAS is less constrained by prior assumptions that rest on an incomplete understanding of disease mechanisms. Genotype-based PheWAS approaches also have the considerable advantage that they are based on genetic variation that is fixed from birth and therefore less susceptible to reverse causality. The current study leveraged the most up-to-date GWAS findings in depression. These provided the most predictive PRS for depression, more than 100 instruments to test for bidirectional causal associations with brain imaging variables, and improved power to detect significant G × E interactions. This approach led to a number of novel findings, including MR-based evidence for a causal effect of depression on measures of brain function and connectivity. The current data set has a considerable advantage in its large sample size. The analysis also focussed on whether polygenic associations replicate across neuroimaging samples, which addresses a previously identified methodological weakness37. Most neuroimaging studies have sample sizes in the range of 50–100 people38. In contrast, the current study provided results from a much larger sample using a potentially less biased, data-driven approach.

The present study used MR to address the causal relationships between depression and brain imaging measures, whose direction has remained uncertain. Our approach is more methodologically consistent than previous work. It applies state-of-the-art causal inference methods to go beyond mere association and prioritises brain regions and risk factors for experimental follow-up.

We found robust and replicated associations between brain measures and genetic risk of depression. However, it is unknown whether neurodevelopmental factors, neurodegenerative factors or both contribute to these individual differences. Participants in this study were in mid to later life. In this age range, various factors may affect variation in brain phenotypes, including ageing39, the long-term effects of early developmental deficits40 and comorbid illnesses. Distinguishing between these explanations requires longitudinal imaging data and studies of high-risk participants that can identify the timing and trajectory of brain differences before and after the onset of illness. The genomic regions driving the shared architecture between depression and the brain phenotypes have also not yet been identified.

The mediation models employed in our investigation were limited in that they tested for causal associations using cross-sectional rather than longitudinal data. To strengthen causal inference, we sought consistent findings from both the MR and the mediation analyses. We acknowledge that other methodologies will support more robust causal inferences in future. Larger genetic studies of neuroimaging traits would greatly benefit such analyses, because they would balance the statistical power available for clinical and neuroimaging phenotypes.

Future studies that provide improved GWAS of depression and relevant traits would further increase our understanding of depression. The summary statistics used here were based on GWAS that included some cases identified by self-declared depressive symptoms. As has been argued previously, self-declared phenotypes may to some extent be more lenient than clinically defined traits. However, the gain in statistical power can largely overcome the noise introduced by a small amount of misclassification. This view is supported by the high genetic correlation between self-declared depression and clinically validated depression5,6. PRS are a powerful means of identifying factors associated with genetic risk, but they currently explain only around 1.6% of the phenotypic variance in depression. Future PRS, trained on more precise GWAS summary statistics, are likely to be more strongly predictive and may be more sensitive to disease-relevant phenotypes. Further associations may be revealed as PheWAS increase in size, although this is counterbalanced by their small effect sizes and likely limited clinical utility for individual patients.

To conclude, we used a novel and relatively unconstrained approach to test for associations between depression-PRS and various behavioural and neuroimaging variables of likely relevance to depression. The findings revealed that PRS of depression were associated with white matter microstructure, general mental and physical health, and behaviours such as sleep patterns and smoking. Our findings suggest that most neuroimaging associations with depression are likely to be causal consequences of depression.



The current study included data from 21,888 individuals who participated in the UK Biobank imaging study18. These data were released in two waves, in May and October 2018 (mean age 62.75 years, SD 7.44 years; 48.4% male; details in Supplementary Table 5). The discovery sample mainly included participants from the first data release, and the replication sample included participants from the second release (details for the discovery and replication samples can be found in Supplementary Fig. 16). Most participants were assessed at the Cheadle MRI site (80.1%) and the rest at the Newcastle site (19.9%). All imaging data were collected using a 3-T Siemens Skyra (software platform VD13) machine.

Behavioural and neuroimaging data were acquired under standard protocols18,41. Written consent was obtained from all participants. Data acquisition and analyses in the present study were conducted under UK Biobank Application #4844. Ethical approval was granted by the National Health Service (NHS) Research Ethics Service (11/NW/0382).


In the present study, the sample used to generate GWAS summary statistics is referred to as the training data set. The samples in which depression-PRS were generated and tested are referred to as the testing samples, which include both the discovery and replication samples described above. We removed any individuals who overlapped between the training data set, which was used to estimate allele effects for polygenic profiling, and the testing data sets, in which the effects of PRS were estimated (see Supplementary Methods).

PRS were calculated using summary statistics from a meta-analysis of depression GWAS from three cohorts: the PGC analysis of major depression5, the 23andMe discovery sample in the Hyde et al. analysis of self-reported clinical depression42, and a broad depression phenotype from UK Biobank in individuals who had not participated in the imaging study25. This meta-analysis provided a total training data set of 785,581 individuals (238,360 cases and 547,221 controls; for further details, see the study by Howard et al.6). We used summary statistics that included only the 8,099,819 SNPs present in the GWAS data from all three cohorts6.

PRSice version 2.0 (used with PLINK 1.9)43 was used to calculate the depression-PRS. Before the analyses, the following individuals were removed from the testing data set: related individuals, individuals of non-European ancestry, and individuals included in the PGC, 23andMe or UK Biobank depression GWAS (details in Supplementary Methods). The sample sizes reported below are those remaining after these exclusions. UK Biobank conducted genotyping and quality control as described in an earlier protocol paper44. Details of SNP quality control and imputation can be found in Supplementary Methods. We used the classic thresholding+clumping method to generate PRS, because it allows direct comparison with the vast majority of previous major depressive disorder (MDD)-PRS studies, which used the same approach. We did not use newer Bayesian methods, because they have shown no particular advantage over the thresholding+clumping method for MDD45. Eight p value thresholds were applied to select the genetic variants included in the PRS: p < 0.0005, p < 0.001, p < 0.005, p < 0.01, p < 0.05, p < 0.1, p < 0.5 and p < 1.
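The thresholding step can be illustrated with the following Python sketch, which scores individuals at each of the eight p value thresholds listed above. It is an illustration under stated assumptions, not the PRSice implementation. It assumes that the SNPs have already been clumped and aligned to the GWAS effect alleles. It also uses a plain weighted sum of dosages: PRSice offers several scoring options, and the text does not state which one was used. The function name `prs_at_thresholds` is hypothetical.

```python
import numpy as np

# The eight PRS p value thresholds listed in the text.
P_THRESHOLDS = (0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0)


def prs_at_thresholds(dosages, betas, pvalues, thresholds=P_THRESHOLDS):
    """Score every individual at every p value threshold (hypothetical helper).

    dosages: (n_individuals, n_snps) effect-allele dosages in [0, 2], assumed to
             be already clumped and aligned to the training GWAS effect alleles.
    betas:   per-SNP effect sizes (e.g. log odds ratios) from the training GWAS.
    pvalues: per-SNP p values from the training GWAS.
    Returns an (n_individuals, n_thresholds) array of weighted dosage sums.
    """
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)
    pvalues = np.asarray(pvalues, dtype=float)
    scores = np.zeros((dosages.shape[0], len(thresholds)))
    for j, threshold in enumerate(thresholds):
        keep = pvalues < threshold  # SNPs passing this threshold
        # Weighted sum over the selected SNPs; an empty selection scores 0.
        scores[:, j] = dosages[:, keep] @ betas[keep]
    return scores
```

Each column of the returned array corresponds to one threshold. In the PheWAS, each column would be tested separately against every phenotype.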

Behavioural phenotypes

The behavioural phenotypes comprised 6 broad categories containing 209 variables in total. Where summary scores were available (e.g. neuroticism total score), the individual items used to derive them were not included. Phenotypes available for fewer than 2000 people in the discovery sample were also excluded from further analysis. Mean sample sizes across all traits in each category are given in brackets below. For further details, see Table 1, Supplementary Table 1 and Supplementary Data 1. The categories were as follows. (1) Mental health (Ndiscovery = 7970 and Nreplication = 3880), including self-reported symptoms of major psychiatric conditions46. This category included three definitions of depression. Broad depression was a self-declared definition based on whether the participant had seen a psychiatrist for nerves, anxiety, tension or depression6,25. Probable depression was derived from an abbreviated set of self-declared symptoms of major depression and hospital admission history47. CIDI depression assessed full diagnostic criteria for depression using questions from a shortened version of the structured Composite International Diagnostic Interview (CIDI)46. (2) Sociodemographic measures (Ndiscovery = 8759 and Nreplication = 4352), such as household income and educational attainment. (3) Early-life risk factors (Ndiscovery = 9755 and Nreplication = 10,370), including physical measures such as birth weight and environmental variables such as adoption and maternal smoking. (4) Lifestyle measures (Ndiscovery = 9231 and Nreplication = 4796), which mainly covered sleep, smoking, alcohol consumption and diet. (5) Physical measures (Ndiscovery = 8961 and Nreplication = 4618). These consisted of self-declared medical conditions, such as recent pains, cancers, operations, heart and artery diseases and other major illnesses, together with measures of blood pressure, arterial stiffness and hand-grip strength. (6) Cognitive ability (Ndiscovery = 8153 and Nreplication = 4105). This category included four tests conducted at the assessment centres, four tests conducted online and a general measure46. The general measure was derived from the assessment-centre tests, which had larger sample sizes (see Supplementary Methods for details).

Table 1 A summary of phenotypes.

Most behavioural phenotypes were acquired at the same time as the imaging assessment. The exception was the mental health items derived from online follow-up questionnaires (see Table 1). Where data from the imaging assessment were missing, they were imputed using data from the baseline assessment. The mean age difference between the imaging assessment and the initial visit was 8.53 years (SD = 1.56 years). Sample sizes and descriptions for all behavioural phenotypes can be found in Supplementary Data 1.

Neuroimaging phenotypes

Neuroimaging data consisted of: (1) intracranial and subcortical volumes (Ndiscovery = 10,631 and Nreplication = 5553), containing eight major structures46; (2) T2-FLAIR imaging for the whole brain (Ndiscovery = 9829 and Nreplication = 5472) and in subcortical regions (Ndiscovery = 9702 and Nreplication = 5187), which assesses putative white matter hyperintensities; (3) white matter microstructure (Ndiscovery = 9377 and Nreplication = 5239), indexed by fractional anisotropy (FA), mean diffusivity (MD), neurite density (ICVF), isotropic volume fraction and orientation dispersion index, for which we included three measures for the association, projection and thalamic radiation tract subsets and 15 major individual white matter tracts26; (4) pair-wise resting-state functional MRI (rsfMRI) connectivity (Ndiscovery = 9745 and Nreplication = 5241) between 21 nodes across the whole brain28; and (5) the amplitude of low-frequency rsfMRI signal fluctuation in the same 21 nodes (Ndiscovery = 9745 and Nreplication = 5241). All of these measures were imaging-derived phenotypes provided by UK Biobank. For the Hariri faces/shapes emotion task, the available data included only whole-brain activation measures and a single region of interest (amygdala). We excluded these sparse data from our analyses until more comprehensive measures become available. UK Biobank acquired, pre-processed and quality controlled the images using FMRIB Software Library (FSL) packages according to a standard protocol (URL:), which is also described in two protocol papers18,48. Pilot study data with inconsistent scanner settings were not included in the analysis. Nor were data that did not pass the initial quality assessment by the UK Biobank imaging team. For clarity, the major pre-processing steps are described in Supplementary Methods.

Statistical models for PheWAS

The glm function in R (R versions 3.2.3 and 3.3.2; RStudio version 0.98.1080) was used to test the PheWAS associations49. The lme function from the ‘nlme’ package (version 3.1.131 under R version 3.2.3)50 was used to test bilateral brain structures, with hemisphere included as a within-subject variable. Depression-PRS were entered as fixed-effect independent variables, and behavioural and neuroimaging phenotypes were entered as dependent variables. In total, 552 phenotypes (209 behavioural phenotypes + 8 white matter hyperintensity measures + 9 intracranial/subcortical volumes + 95 diffusion tensor imaging measures + 210 rsfMRI connectivity measures + 21 rsfMRI fluctuation amplitudes) were each tested against depression-PRS at 8 p thresholds. All 552 × 8 = 4416 tests were corrected together using FDR correction51 with the p.adjust function in R (q < 0.05).
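The correction step can be sketched as follows. The counts reproduce the arithmetic above: the 210 connectivity measures are the unordered pairs of 21 nodes. `bh_adjust` is a hypothetical re-implementation of the Benjamini–Hochberg adjustment that R's `p.adjust` applies with `method = "BH"` (`"fdr"` is an alias for it). The sketch assumes that the FDR method used was Benjamini–Hochberg.

```python
import numpy as np

# Phenotype counts per category, as listed in the text.
N_PHENOTYPES = {
    "behavioural": 209,
    "white_matter_hyperintensity": 8,
    "intracranial_subcortical_volume": 9,
    "diffusion": 95,
    "rsfmri_connectivity": 21 * 20 // 2,  # unordered pairs of 21 nodes = 210
    "rsfmri_amplitude": 21,
}
N_THRESHOLDS = 8
N_TESTS = sum(N_PHENOTYPES.values()) * N_THRESHOLDS  # 552 * 8 = 4416


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p values over all entries (flattened)."""
    p = np.asarray(pvalues, dtype=float).ravel()
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p value by m / i ...
    scaled = p[order] * m / np.arange(1, m + 1)
    # ... then enforce monotonicity with a running minimum from the largest down.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted
```

Consider a 552 × 8 matrix `p` of PheWAS p values. The expression `bh_adjust(p).reshape(p.shape) < 0.05` marks the associations that pass q < 0.05, with all 4416 tests corrected together rather than threshold by threshold.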

Covariates included in all association tests were sex, age, age², the first 15 genetic principal components and genotyping array25. For the replication analysis, MRI site was added to these covariates in all association tests. Further adjustments were made for confounders relevant to each phenotypic category, as follows. Scanner position on the x, y and z axes was included in the models for all brain phenotypes to control for static-field heterogeneity38. Mean head motion was included as a covariate for the rsfMRI data28,52. Subcortical volumetric tests controlled for intracranial volume26,53. Hemisphere was controlled for where applicable in bilateral brain structural phenotypes26. A list of covariates for each type of phenotype can be found in Supplementary Table 1. Distributions of PRS using different covariates can be found in Supplementary Fig. 17.

To help compare the results of logistic and linear regression models, we report standardised regression coefficients as effect sizes (β) for both types of model. For binary dependent variables analysed with logistic regression, these are log-transformed odds ratios. FDR-corrected p values are reported throughout. For clarity, we also report the number of associations found using Bonferroni correction as a supplementary method. We acknowledge that the phenotypes are likely to be correlated, so Bonferroni correction is overly conservative. When effect sizes of different signs are presented together, we report the range of absolute effect sizes. Two-sided statistical tests were applied in all analyses.
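The following minimal sketch shows one way to obtain standardised effect sizes. For continuous outcomes, both the PRS and the outcome are z-scored before ordinary least squares. For binary outcomes, only the PRS is z-scored, so the logistic coefficient is a log odds ratio per SD of PRS. This is an assumption about how standardisation was done. The analyses themselves used R's glm, whereas here the logistic model is fitted with a hand-written Newton–Raphson loop, and all function names are hypothetical.

```python
import numpy as np


def zscore(v):
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std(ddof=1)


def _design(prs, covariates, n):
    """Columns: intercept, z-scored PRS, then any covariates (1-D or 2-D)."""
    cols = [np.ones(n), zscore(prs)]
    if covariates is not None:
        cov = np.asarray(covariates, dtype=float).reshape(n, -1)
        cols.extend(cov.T)
    return np.column_stack(cols)


def standardised_linear_beta(prs, y, covariates=None):
    """Slope of z-scored PRS on z-scored outcome (a standardised beta)."""
    y = np.asarray(y, dtype=float)
    X = _design(prs, covariates, len(y))
    coef, *_ = np.linalg.lstsq(X, zscore(y), rcond=None)
    return coef[1]


def standardised_log_or(prs, y, covariates=None, n_iter=25):
    """Logistic regression by Newton-Raphson; log OR per SD of PRS."""
    y = np.asarray(y, dtype=float)
    X = _design(prs, covariates, len(y))
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))            # fitted probabilities
        grad = X.T @ (y - mu)                            # score vector
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])    # Fisher information
        beta = beta + np.linalg.solve(info, grad)
    return beta[1]
```

With a single predictor and no covariates, the standardised linear slope equals the Pearson correlation between PRS and outcome. This provides a quick check that the scaling is right.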

Replication analysis for PheWAS

Traits that were significantly associated with depression-PRS at a minimum of four PRS p thresholds were selected for re-analysis in the independent replication sample. We combined FDR correction with this four-threshold criterion for two reasons. FDR correction controls type I error in the first step. The four-threshold criterion ensures that only the most robust and stable findings, significant at half or more of the PRS thresholds, are carried forward to the subsequent MR and mediation analyses. This rationale is in line with studies of depression itself (depression-PRS predicting depression), in which similar odds ratios are typically reported across multiple PRS thresholds5,6. The replication analysis was conducted on the selected traits across all eight depression-PRS thresholds. A result was considered replicated if two conditions held. First, it showed the same direction of effect in the discovery and replication samples. Second, the replication p value was significant after correction for multiple testing at a minimum of four depression-PRS p thresholds. FDR correction was applied to all tests in the replication analysis across traits and p thresholds. For example, if m traits were carried forward to replication, the p value adjustment was applied to all m × 8 tests. The number of associations found using Bonferroni correction was also reported for clarity.
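The selection and replication rules can be written as two small Python functions with hypothetical names. They assume that the inputs are FDR-adjusted p values (q values) arranged as traits × 8 thresholds. The direction of effect is checked separately at each threshold. This is one reading of the criterion above rather than a confirmed detail.

```python
import numpy as np

ALPHA = 0.05        # FDR q threshold
MIN_THRESHOLDS = 4  # "a minimum of four" of the eight PRS p thresholds


def selected_for_replication(q_discovery, alpha=ALPHA, min_thresholds=MIN_THRESHOLDS):
    """q_discovery: (n_traits, 8) FDR-adjusted p values from the discovery PheWAS."""
    q = np.asarray(q_discovery, dtype=float)
    return (q < alpha).sum(axis=1) >= min_thresholds


def replicated(beta_discovery, beta_replication, q_replication,
               alpha=ALPHA, min_thresholds=MIN_THRESHOLDS):
    """Same sign and FDR-significant in replication at >= min_thresholds thresholds.

    q_replication is assumed to be already FDR-adjusted over all m x 8 replication
    tests. Direction agreement is checked per threshold (an assumption).
    """
    b_d = np.asarray(beta_discovery, dtype=float)
    b_r = np.asarray(beta_replication, dtype=float)
    q_r = np.asarray(q_replication, dtype=float)
    same_sign = np.sign(b_d) == np.sign(b_r)
    hits = same_sign & (q_r < alpha)
    return hits.sum(axis=1) >= min_thresholds
```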

Bidirectional MR analyses on depression and neuroimaging variables

We used the ‘TwoSampleMR’ package version 0.4.22 in R to conduct bidirectional two-sample MR analyses between depression and neuroimaging variables to test for causal effects54. MR uses genetic variants as instruments to test whether an exposure has a causal effect on an outcome. A chart illustrating the underlying models can be found in Fig. 5, and a flow chart of all the steps can be found in Supplementary Fig. 18. The main procedure is summarised below.

GWAS summary statistics for depression came from the meta-analysis used to generate the PRS, as described above. For the neuroimaging variables, we selected those associated with depression-PRS in both the discovery and replication samples. GWAS of these neuroimaging variables were conducted using BGENIE version 355 in the UK Biobank imaging sample used in the PheWAS. Therefore, all exclusion criteria, genetic data quality checks, ancestry control, relatedness removal and covariates remained the same as for the depression GWAS. Individuals who overlapped between the depression GWAS and the neuroimaging GWAS were removed. The neuroimaging variables were scaled to mean = 0 and SD = 1 to obtain standardised estimates. SNP heritability of depression and the number of genome-wide significant hits are reported elsewhere6. SNP heritability was estimated using linkage disequilibrium (LD) score regression56. It ranged from 13.2% to 34.0% for white matter microstructure measures and from 13.8% to 14.5% for resting-state fluctuation amplitude. The number of genome-wide significant loci (p < 5 × 10−8) ranged from 2 to 14 across the neuroimaging phenotypes. More details of the neuroimaging GWAS summary statistics can be found in Supplementary Table 4.

To test for a causal effect of depression on neuroimaging variables, genetic instruments were selected from the depression GWAS summary statistics6 at p < 5 × 10−8. These SNPs were clumped with a window of 3000 kb and a maximum LD r² of 0.001, resulting in 107 independent genetic instruments. The instruments were then looked up in the GWAS summary statistics for each outcome, and SNPs not present in both GWAS data sets were removed. SNP effects on the exposure and outcome were then harmonised to the same effect allele before the MR analyses were conducted.
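The instrument-selection steps above (thresholding, clumping, look-up and harmonisation) can be sketched as follows. This is not TwoSampleMR's code. `ld_r2` stands for a caller-supplied LD look-up (for example, from a reference panel), and all names are hypothetical. The harmonisation is deliberately simplified: it drops palindromic A/T and C/G SNPs and does not attempt strand flips. By contrast, TwoSampleMR's `harmonise_data` can try to resolve palindromic SNPs using allele frequencies.

```python
from dataclasses import dataclass

# Palindromic allele pairs whose strand cannot be inferred from alleles alone.
PALINDROMIC = {frozenset("AT"), frozenset("CG")}


@dataclass(frozen=True)
class Snp:
    rsid: str
    chrom: int
    pos_bp: int
    p: float


def clump(snps, ld_r2, p_threshold=5e-8, window_kb=3000, r2_max=0.001):
    """Greedy LD clumping; ld_r2(a, b) is a hypothetical LD r^2 look-up."""
    candidates = sorted((s for s in snps if s.p < p_threshold), key=lambda s: s.p)
    index_snps, removed = [], set()
    for lead in candidates:
        if lead.rsid in removed:
            continue
        index_snps.append(lead)  # most significant remaining SNP becomes an index SNP
        for other in candidates:
            if other is lead or other.rsid in removed or other in index_snps:
                continue
            in_window = (other.chrom == lead.chrom
                         and abs(other.pos_bp - lead.pos_bp) <= window_kb * 1000)
            if in_window and ld_r2(lead.rsid, other.rsid) > r2_max:
                removed.add(other.rsid)  # in LD with a stronger SNP: drop it
    return index_snps


def harmonise(exposure, outcome):
    """Align outcome effects to the exposure effect allele.

    Both inputs map rsid -> (effect_allele, other_allele, beta). SNPs missing
    from either data set, palindromic SNPs and allele mismatches are dropped.
    """
    aligned = {}
    for rsid, (ea, oa, b_exp) in exposure.items():
        if rsid not in outcome:
            continue  # not present in both GWAS
        ea_out, oa_out, b_out = outcome[rsid]
        if frozenset((ea, oa)) in PALINDROMIC:
            continue  # ambiguous strand: dropped in this sketch
        if (ea_out, oa_out) == (ea, oa):
            aligned[rsid] = (b_exp, b_out)
        elif (ea_out, oa_out) == (oa, ea):
            aligned[rsid] = (b_exp, -b_out)  # swapped alleles: flip the outcome sign
    return aligned
```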

For the causal effects of neuroimaging variables on depression, genetic instruments were selected at p < 8 × 10−6. We used this more lenient threshold because, before the two sets of GWAS summary statistics were harmonised, the smallest number of genome-wide significant hits for a neuroimaging GWAS was 2. The same approach has been used in a previous MR study57. At this threshold, after clumping with the same parameters as for the depression instruments, 12–44 independent genetic instruments were identified for each neuroimaging variable (see Supplementary Table 4). Genome-wide significant SNPs for depression and relevant genes have been reported and discussed by Howard et al.6. For significant MR results showing effects of neuroimaging variables on depression, we manually inspected scatter plots to check that the top neuroimaging GWAS SNPs driving the results were brain-relevant. Specifically, we checked whether previous studies had associated these SNPs with gene expression in neural tissues or with other psychiatric or brain phenotypes. SNPs that appeared anomalous are reported in “Results” and highlighted in “Discussion”. Genetic instruments used in the bidirectional MR analyses are reported in Supplementary Data 7 and Supplementary Table 3. To further illustrate overlapping genetic architecture, we also report LD score regression results based on the summary statistics above.

Three robust MR methods were chosen: MR-Egger, inverse-variance weighted (IVW) and the weighted median method. We also conducted three additional analyses. (i) We tested for horizontal pleiotropy by estimating the MR-Egger intercept. We also tested global heterogeneity of the genetic instruments using (ii) Cochran's Q test54 and (iii) the MR-PRESSO global test (R package ‘MRPRESSO’ version 1.0)58. Four types of plots were generated for visual inspection: (1) leave-one-out plots to detect SNP outliers, (2) funnel plots to show horizontal pleiotropy, (3) forest plots showing single-SNP effects in the MR analysis, and (4) scatter plots for overall inspection of the SNP effect sizes on the exposure and the outcome.
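The point estimates of the three MR methods and the heterogeneity statistic can be sketched from harmonised per-SNP summaries. Here `bx` denotes the exposure effects, `by` the outcome effects and `se_y` the outcome standard errors. This is a simplified re-implementation for illustration only, and it assumes first-order weights throughout. It omits the standard errors for MR-Egger and the bootstrap standard errors for the weighted median. Results from TwoSampleMR's `mr_ivw`, `mr_egger_regression` and `mr_weighted_median` may differ in detail. All function names below are hypothetical.

```python
import numpy as np


def _orient(bx, by):
    """Flip SNPs so every exposure effect is positive (needed for MR-Egger)."""
    sign = np.where(bx < 0, -1.0, 1.0)
    return bx * sign, by * sign


def ivw(bx, by, se_y):
    """IVW estimate, its SE and Cochran's Q (df = n_snps - 1).

    The SE is inflated by the residual standard error when that exceeds 1,
    a multiplicative random-effects adjustment (assumed, not confirmed).
    """
    bx, by, se_y = (np.asarray(a, dtype=float) for a in (bx, by, se_y))
    w = 1.0 / se_y**2
    denom = np.sum(w * bx**2)
    beta = np.sum(w * bx * by) / denom
    q = np.sum(w * (by - beta * bx) ** 2)  # Cochran's Q
    sigma = np.sqrt(q / (len(bx) - 1))
    se = np.sqrt(1.0 / denom) * max(1.0, sigma)
    return beta, se, q


def mr_egger(bx, by, se_y):
    """Weighted regression of by on bx with an intercept; (slope, intercept)."""
    bx, by = _orient(np.asarray(bx, dtype=float), np.asarray(by, dtype=float))
    sw = 1.0 / np.asarray(se_y, dtype=float)  # square roots of the weights
    X = np.column_stack([np.ones_like(bx), bx]) * sw[:, None]
    coef, *_ = np.linalg.lstsq(X, by * sw, rcond=None)
    intercept, slope = coef  # a non-zero intercept suggests directional pleiotropy
    return slope, intercept


def weighted_median(bx, by, se_y):
    """Weighted median of the per-SNP ratio estimates by / bx."""
    bx, by, se_y = (np.asarray(a, dtype=float) for a in (bx, by, se_y))
    ratio = by / bx
    w = (bx / se_y) ** 2  # inverse of the first-order ratio variance
    order = np.argsort(ratio)
    ratio, w = ratio[order], w[order]
    s = (np.cumsum(w) - 0.5 * w) / np.sum(w)  # standardised cumulative weights
    below = np.max(np.nonzero(s < 0.5)[0])
    # Interpolate between the two ratio estimates that straddle the 50% point.
    return ratio[below] + (ratio[below + 1] - ratio[below]) * (
        (0.5 - s[below]) / (s[below + 1] - s[below]))
```

When every SNP implies the same causal effect, all three estimators agree and Q is zero. Adding a constant to every outcome effect shifts only the MR-Egger intercept.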

FDR correction was applied separately to each MR method within each trait category. This follows the whole-brain family-wise error correction approach widely used in other neuroimaging studies59,60.

For completeness, we also provide genetic correlation results from Linkage Disequilibrium Score Regression v1.0.056 in the main text. Phenotypic associations between depressive symptoms and other variables are provided in Supplementary Table 4 and Supplementary Figs. 12–14.

Statistical models to test for the mediating effect of neuroimaging variables

Following the PheWAS and MR analyses, we tested whether manifestations of depression mediated the effect of depression-PRS on brain imaging phenotypes. We also tested whether neuroimaging variables acted as neural mediators of genetic risk for depressive traits (i.e. whether neuroimaging traits were ‘endophenotypes’). These tests were performed using structural equation modelling (SEM) with the ‘lavaan’ package version in R v3.2.361. Two types of mediation analysis were conducted (Supplementary Fig. 19). The first tested whether the neuroimaging effects were a consequence of depression. It did so by testing whether depression mediated the relationship between polygenic risk and neuroimaging variables (predictor = depression-PRS, mediator = CIDI definition of depression/depressive symptoms, dependent variables = neuroimaging traits). Neuroimaging variables were chosen from the measures that showed a significant causal effect of depression in the MR analyses. The second type tested whether neuroimaging variables mediated the relationship between polygenic risk of depression and depressive phenotypes (predictor = depression-PRS, mediator = neuroimaging traits, dependent variable = CIDI definition of depression/depressive symptoms). The mediators were restricted to the neuroimaging phenotypes that showed significant causal effects on depression in the MR analyses. For both types of mediation analysis, the variables for manifestations of depression were the CIDI definition of depression, severity of depression assessed by the CIDI short form62, and current symptoms at the imaging assessment measured by the PHQ-463. To maximise statistical power, all mediation tests used the full sample including both the discovery and replication data sets (N = 21,888), adjusted for site.
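The logic of each mediation model can be illustrated with the product-of-coefficients approach. Here it is sketched with ordinary least squares rather than lavaan's SEM estimator. Path a (PRS to mediator) multiplied by path b (mediator to outcome, adjusting for PRS) gives the indirect effect. The direct effect c' is the PRS coefficient in the outcome model. The sketch treats all variables as continuous and omits the standard errors and confidence intervals that an SEM fit would provide. The function names are hypothetical.

```python
import numpy as np


def _ols(columns, y):
    """OLS with an intercept; returns the slopes in the order of `columns`."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef[1:]


def simple_mediation(x, m, y, covariates=None):
    """Product-of-coefficients mediation x -> m -> y, adjusting for covariates.

    x: predictor (e.g. depression-PRS); m: mediator; y: outcome.
    """
    n = len(y)
    cov = (np.empty((n, 0)) if covariates is None
           else np.asarray(covariates, dtype=float).reshape(n, -1))
    cov_cols = list(cov.T)
    a = _ols([x] + cov_cols, m)[0]              # path a: x -> m
    direct, b = _ols([x, m] + cov_cols, y)[:2]  # c' and path b: m -> y
    total = _ols([x] + cov_cols, y)[0]          # path c: x -> y
    return {"a": a, "b": b, "indirect": a * b, "direct": direct, "total": total}
```

With OLS and identical covariates in every model, the total effect decomposes exactly as total = direct + indirect, which is a useful sanity check on the fitted paths.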

All covariates were the same as in the PheWAS regression models. P value correction followed the same method as for the MR analysis. Illustrations of the models can be found in Supplementary Fig. 1, Supplementary Data 8 and Supplementary Methods.

Interactions of depression-PRS and early risk factors or sociodemographic variables

We tested interactions between depression-PRS and environmental variables previously associated with depression. The environmental variables were chosen from the early-life risk factors and sociodemographic variables that met two criteria: they had previously been associated with risk for depression, and they showed a depression case–control difference in the present sample (p < 0.05). These were household income, Townsend Index, childhood trauma, adulthood trauma and stressful life events in the 6 months before the imaging assessment64,65. For completeness, results of additional tests of the interaction between depression-PRS and sex are reported in Supplementary Data 14.

The dependent variables were the behavioural and imaging phenotypes that were significantly associated with depression-PRS at a minimum of four thresholds in both the discovery and replication samples. Variables selected as environmental factors were not included as dependent variables. The covariates in these G × E analyses were those included in the PheWAS analyses, plus the PRS × covariate and environmental variable × covariate interaction terms, in accordance with previous studies66. FDR correction was applied in the same manner as in the PheWAS (m dependent variables × 8 p thresholds).
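The design of each G × E model can be sketched as follows, assuming ordinary least squares and hypothetical function names. The model includes the PRS and environment main effects and their product. Every covariate also enters on its own, in interaction with the PRS and in interaction with the environmental variable. This reflects our reading of the covariate-interaction adjustment described above. The coefficient of interest is the PRS × environment term. For a binary outcome, the same design matrix could be passed to a logistic model instead.

```python
import numpy as np


def gxe_design(prs, env, covariates):
    """Design matrix with PRS x E plus PRS x covariate and E x covariate terms."""
    prs = np.asarray(prs, dtype=float)
    env = np.asarray(env, dtype=float)
    n = len(prs)
    cov = np.asarray(covariates, dtype=float).reshape(n, -1)
    cols = [np.ones(n), prs, env, prs * env]  # column 3 is the G x E term
    cols += list(cov.T)                       # covariate main effects
    cols += [prs * c for c in cov.T]          # PRS x covariate terms
    cols += [env * c for c in cov.T]          # environment x covariate terms
    return np.column_stack(cols)


def gxe_estimate(prs, env, covariates, y):
    """OLS estimate of the PRS x environment interaction coefficient."""
    X = gxe_design(prs, env, covariates)
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef[3]
```

Including the covariate interaction terms guards against a covariate that modifies the PRS or environmental effect being mistaken for a G × E interaction.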

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.