Abstract
The difference between chronological age and the apparent age of the brain estimated from brain imaging data, the brain age gap (BAG), is widely considered a general indicator of brain health. Converging evidence supports that BAG is sensitive to an array of genetic and nongenetic traits and diseases, yet few studies have examined its genetic architecture and its causal relationships with common brain disorders. Here, we estimate BAG using state-of-the-art neural networks trained on brain scans from 53,542 individuals (age range 3–95 years). A genome-wide association analysis across 28,104 individuals (40–84 years) from the UK Biobank revealed eight independent genomic regions significantly associated with BAG (p < 5 × 10−8), seven of which are novel, implicating neurological, metabolic, and immunological pathways. No significant genetic correlations or causal relationships with BAG were found for Parkinson’s disease, major depressive disorder, or schizophrenia, but two-sample Mendelian randomization indicated a causal influence of Alzheimer’s disease (AD; p = 7.9 × 10−4) and bipolar disorder (p = 1.35 × 10−2) on BAG. These results emphasize the polygenic architecture of brain age and provide insights into the causal relationship between selected neurological and neuropsychiatric disorders and BAG.
Introduction
Over the last decade, brain age has emerged as a promising measure of overall brain health [1, 2]. To estimate brain age, machine learning models are applied to brain imaging data to learn visual patterns characteristic of different ages [3, 4]. The difference between predicted brain age and chronological age is termed the brain age gap (BAG) and indicates deviation from a normative trajectory, a potential health indicator. Earlier studies have found a large variation in the predicted brain age of individuals with the same chronological age, and that these interindividual variations correlate with neurological and mental disorders [5,6,7], such as dementia [6, 8], schizophrenia (SCZ) [9, 10], major depressive disorder (MDD) [11], and also mortality [7, 12]. In addition, biological, environmental, and lifestyle factors associated with these disorders have been reported to correlate with BAG, for example, infections [13, 14], smoking [5], physical activity [15], and education level [16].
Genetic differences have been shown to explain a sizeable portion of interindividual variation in BAG. Twin-based heritability for BAG has been estimated to be as high as 0.66 [17], and single nucleotide polymorphism (SNP)-based heritability estimates are also relatively high, around 0.2 [6, 18]. Earlier gene-discovery efforts investigating genetic associations with BAG have found and examined two genomic loci in detail: one on chromosome 1 containing the potassium channel gene, KCNK2, and one in the chromosome 17 inversion region (17q21.31) [18, 19]. Genetic variants in these two regions together explain a negligible fraction of estimated SNP-heritability [18]. These results suggest that existing genome-wide association studies (GWAS) were potentially underpowered to fully characterize the genetic architecture, a view supported by studies using conditional false discovery rate-based models, which yielded a larger set of associations [6]. Furthermore, Smith et al. [20] found a rich set of associations when investigating different facets of a multimodal brain age, suggesting that the interplay between genetic variants is complex.
Although BAG has been frequently associated with clinical conditions and health-related phenotypes and behaviors, the underlying genetic basis for the observed associations has seldom been investigated, possibly due to incomplete knowledge of the genetic architecture of BAG. Furthermore, the causal relationships between BAG and brain disorders remain largely unexplored. Mendelian randomization (MR) has become an attractive tool to interrogate cause-effect relationships between risk factors and disorders [21]. Two-sample MR models have been used to infer causal relations between hundreds of traits or diseases [22]. However, MR analyses targeting the causal relations between BAG and brain disorders and associated traits have been lacking [23].
In the present work, we improve the yield of genetic associations for BAG using three strategies: First, we estimate brain age using a state-of-the-art deep neural network architecture (SFCN-reg) trained on one of the largest samples assembled to date [5]. Then we perform a GWAS for BAG on out-of-sample predictions for a portion of the UK Biobank v3 data containing 28,104 unrelated individuals, about eight thousand more than earlier studies. Finally, we use two-sample MR to assess the genetic and causal relations between BAG and SCZ, bipolar disorder (BIP), Alzheimer’s disease (AD), MDD, and Parkinson’s disease (PD).
Methods
Sample
All datasets used in the present study have been obtained from previously published studies that have been approved by their respective institutional review boards, research ethics committees, or other relevant ethics organizations.
We used UK Biobank imaging data (UKB, accession number 27412) released in 2019 in combination with a pre-compiled dataset from various sources (Supplementary Table S1) for brain age model training and estimation. For the downstream genetic analyses, we started with the initial 40,330 UKB participants that had undergone at least one brain scan (using baseline scans where more were available). We excluded those with recorded brain injury or neurological or psychiatric conditions, those failing standard image quality checks [5], and participants who withdrew consent. To quality check the genetic data, we followed the protocol developed by the NealeLab (nealelab.is/uk-biobank). After removing samples that failed the image or genetic quality checks, 28,104 unique participants remained.
Brain age estimation
A minimal preprocessing protocol was applied to all raw T1-weighted brain MRI images before brain age estimation [5]: The auto-recon pipeline from FreeSurfer 5.3 [24] was used to remove nonbrain tissue. The resulting volumes were reoriented to the standard FSL [25] orientation using fslreorient2std, and linearly registered to the 1 mm FSL (version 6.0) MNI152 template using FLIRT [26], with 6 degrees of freedom. For efficiency, during model fitting, we cropped a central cube spanning the voxels 6:173, 2:214, and 0:160 in the sagittal, coronal, and axial dimensions, respectively. Before modeling, all voxel intensities were normalized by a constant factor to produce values in the range [0, 1].
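As a rough illustration of the last two preprocessing steps, the sketch below crops the block of voxels quoted above and rescales intensities by a constant. It is not the pipeline used in the study: the helper name crop_and_scale, the divisor of 255, the half-open reading of the index ranges, and the synthetic input volume are all assumptions made for the example.

```python
import numpy as np

# Crop bounds quoted in the text (sagittal, coronal, axial).
# Assumption: ranges are read as Python-style half-open slices.
CROP = (slice(6, 173), slice(2, 214), slice(0, 160))

def crop_and_scale(volume: np.ndarray, max_intensity: float = 255.0) -> np.ndarray:
    """Crop the central block and divide by a constant so values lie in [0, 1].

    max_intensity is a hypothetical constant; the text only says that a
    constant factor was used.
    """
    cropped = volume[CROP]
    return cropped.astype(np.float32) / max_intensity

# Synthetic stand-in for a volume registered to the 1 mm MNI152 template
# (182 x 218 x 182 voxels).
rng = np.random.default_rng(0)
volume = rng.integers(0, 256, size=(182, 218, 182)).astype(np.float32)
patch = crop_and_scale(volume)
print(patch.shape)
```

Under the half-open reading, the cropped block has 167 × 212 × 160 voxels; an inclusive reading of the upper bounds would add one voxel per axis.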
The data from all sources (Supplementary Table S1 and UKB) were split into five equally-sized and disjoint folds with comparable age ranges and sex distributions. Four of these folds were used for fitting the brain age model, and out-of-sample estimates were computed for the remaining fold. This procedure was repeated five times, resulting in out-of-sample brain age estimates for all participants. Next, BAG was calculated by subtracting chronological age from estimated brain age. The subsequent analyses were performed on the out-of-sample estimates of the UKB data (Supplementary Table S2).
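The cross-fitting scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions: an ordinary least-squares model stands in for the neural network, folds are assigned at random rather than balanced for age and sex, and the function name out_of_sample_bag and the synthetic data are hypothetical.

```python
import numpy as np

def out_of_sample_bag(features, age, n_folds=5, seed=0):
    """Return (BAG, fold labels) using predictions made only on held-out folds."""
    rng = np.random.default_rng(seed)
    # Assumption: random fold assignment; the study also balanced age and sex.
    fold = rng.permutation(len(age)) % n_folds
    predicted = np.empty(len(age), dtype=float)
    for k in range(n_folds):
        train, test = fold != k, fold == k
        # Stand-in model (ordinary least squares), not the SFCN-reg network.
        x_train = np.column_stack([np.ones(train.sum()), features[train]])
        coef, *_ = np.linalg.lstsq(x_train, age[train], rcond=None)
        x_test = np.column_stack([np.ones(test.sum()), features[test]])
        predicted[test] = x_test @ coef
    # BAG = estimated brain age minus chronological age.
    return predicted - age, fold

# Synthetic data for illustration only.
rng = np.random.default_rng(1)
age = rng.uniform(40, 84, size=500)
features = np.column_stack([age, 0.5 * age]) + rng.normal(0.0, 1.0, size=(500, 2))
bag, fold = out_of_sample_bag(features, age)
print(round(float(np.mean(np.abs(bag))), 2))  # mean absolute error of the stand-in model
```

Every participant receives exactly one prediction, made by a model that never saw that participant during fitting.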
Genome-wide association study
Imputed genotypes for the 28,104 participants were obtained from UKB (Category 100314, for further details see [27]). We excluded SNPs based on missing rate (>0.02), the Hardy-Weinberg Equilibrium test (p < 10−6), and minor allele frequency (MAF < 0.01). In total, ≈8.6 million SNPs were analyzed. Since we observed apparent differences in predicted brain age across folds (Supplementary Fig. S1), a GWAS was performed on each hold-out fold separately using PLINK 1.90 beta [28]. An additive genetic model was assumed, and chronological age, sex, and the top ten principal components were included as covariates, the latter to account for population structure. Association results for each hold-out fold of UKB, along with distributions of BAG, are shown in Supplementary Figure S1. These association results were then meta-analyzed using the inverse variance weighted model implemented in PLINK to identify SNPs associated with BAG. Supplementary Figure S2 shows the association QQ plot, which indicated no noticeable genomic inflation.
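For a single SNP, a fixed-effect inverse variance weighted meta-analysis across the five folds can be sketched as below. This is a generic textbook formulation, not the PLINK implementation, and the per-fold effect sizes and standard errors are hypothetical.

```python
import math

def ivw_meta(betas, ses):
    """Fixed-effect inverse variance weighted meta-analysis for one SNP."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    # Two-sided p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p

# Hypothetical per-fold estimates (years of BAG per allele) and standard errors.
fold_betas = [0.25, 0.31, 0.22, 0.28, 0.27]
fold_ses = [0.06, 0.07, 0.06, 0.06, 0.07]
beta, se, z, p = ivw_meta(fold_betas, fold_ses)
print(f"beta={beta:.3f} se={se:.3f} p={p:.2e}")
```

Folds with smaller standard errors receive more weight, and the pooled standard error shrinks as folds are combined.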
Associated regions and genes
Association results were ‘clumped’ by the FUMA [29] web-service using the linkage disequilibrium (LD) structure from the 1000 Genomes Project phase 3 EUR dataset (1KGp3), with parameters --clump-p 5e-8 --clump-2 1e-6 --clump-r2 0.1. The standard 250 kilo-bases (kb) were used as the inter-region distance threshold. Genes whose genomic coordinates were located within the boundaries of each region were assigned to the corresponding region. SNPs with the smallest association p values were taken as the lead SNPs for the corresponding regions. In addition, the gene closest to each lead SNP by genomic position was annotated using the Ensembl tool VEP [30] (Table 1).
Associated regions were fine-mapped using the FINEMAP [31] program, again with the LD structure from 1KGp3. The default settings of FINEMAP were used, which compare causal models assuming one causal variant in each region with those assuming two, based on the estimated posterior probabilities (PP_1 versus PP_2). FINEMAP ranks all possible configurations in each model and presents them as 95% credible sets. The confidence of a variant belonging to a set was evaluated by posterior probabilities of inclusion (PPI). In the case of assuming one causal variant, each single variant was assigned a PPI. In the case of two causal variants, each pair of variants was assigned a PPI.
Post-GWAS annotations
Both FUMA and Garfield [32] were used for annotating associated SNPs. First, SNPs were assigned to genic elements (e.g., exon, intron, 3′ and 5′ untranslated regions, intergenic regions, etc.), and the enrichment of this assignment was tested by hypergeometric test (FUMA) or logistic regression models (Garfield). Expression levels of the genes annotated to the associated SNPs were inspected in each of the 54 tissue types from the GTEx v8 dataset [33]. To further test whether the identified variants affect expression levels of these genes, the GTEx v8 eQTL portal (gtexportal.org) was searched. Data in this portal include the association statistics of SNPs with gene expression in 49 different tissues. We used a conservative significance threshold to claim the existence of evidence: p ≤ 0.05/(49 × 8) ≈ 1.3 × 10−4. Moreover, detailed biological functions for proteins coded by these genes were manually searched in the NCBI Entrez Gene database [34] and the UniProtKB database [35].
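The eQTL significance threshold is a Bonferroni-style correction, which the short calculation below reproduces. Reading the factor of 8 as the number of associated loci is an assumption; the text gives only the arithmetic.

```python
n_tissues = 49  # tissues with eQTL data in the GTEx v8 portal, as stated in the text
n_loci = 8      # assumption: one lead SNP per associated locus
threshold = 0.05 / (n_tissues * n_loci)
print(f"{threshold:.1e}")  # prints 1.3e-04
```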
Genetic correlations between BAG and disorders
GWAS summary data for four disorders (SCZ [36], BIP [37], MDD [38], and AD [39]) were obtained from the Psychiatric Genomics Consortium (PGC, https://med.unc.edu/pgc/download-results). For each GWAS, the association results for European ancestral samples excluding samples from 23andMe were used (SCZ: n case=67,390, n control = 94,015; BIP: n case = 41,917, n control = 371,549; MDD: n case = 59,851, n control = 113,154; AD: n case = 71,880, n control = 383,378). The PD GWAS results were obtained from the fixed-effect meta-analysis performed by the International Parkinson Disease Genomics Consortium (IPDGC, n case = 33,674, n control = 449,056) [40].
Before post-GWAS analysis we processed the results from all GWAS using a standard protocol. Specifically, SNPs having a MAF < 0.05, or imputation INFO < 0.5, or ambiguous allelic coding (A/T, or C/G) were removed from subsequent analyses. The LD score model (ldsc) [41] was applied to estimate SNP-heritability and genetic correlations between BAG and disorders. Only high-quality SNPs published in the HapMap3 dataset were used for estimation. The LD score derived from the 1KGp3 was used as input to ldsc. The Benjamini-Hochberg False Discovery Rate (FDR) procedure was used to correct for multiple testing across disorders (FDR-corrected p < 0.05 was considered statistically significant).
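The Benjamini-Hochberg adjustment applied across disorders can be sketched in a few lines. The function below is a generic implementation of the procedure, and the p values are hypothetical placeholders rather than results from this study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p values for five disorders (not the study's results).
example = {"SCZ": 0.50, "BIP": 0.20, "MDD": 0.06, "AD": 0.005, "PD": 0.01}
adjusted = dict(zip(example, benjamini_hochberg(list(example.values()))))
significant = [name for name, p in adjusted.items() if p < 0.05]
print(significant)
```

With five tests, the smallest raw p value is multiplied by at most five, so a nominal p of 0.02 cannot reach an adjusted value below 0.05 unless other tests are also small.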
To visualize polygenic enrichment, conditional QQ plots [42, 43] were made for BAG versus each disorder. In these plots, the QQ curves for the association statistics (-log10 P values) for BAG were stratified by the corresponding association strength for the conditioned disorder. As the association strength to the conditioned disorder increases, a successive leftward deflection in these curves indicates polygenic enrichment. Similarly, conditional QQ plots for each disorder versus BAG show polygenic enrichment in the reverse direction.
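The construction of the stratified curves can be sketched as follows. The strata cut-offs, the function name conditional_qq, and the simulated null p values are assumptions chosen for illustration; the study's exact strata are not given in this section.

```python
import numpy as np

def conditional_qq(p_primary, p_conditional, thresholds=(1.0, 0.1, 0.01, 0.001)):
    """Return {threshold: (expected, observed)} -log10 p curves per stratum.

    Each stratum keeps SNPs whose p value for the conditioned trait is at or
    below the threshold (hypothetical cut-offs).
    """
    curves = {}
    for t in thresholds:
        subset = np.sort(p_primary[p_conditional <= t])
        n = subset.size
        if n == 0:
            continue
        # Expected quantiles under the null, using mid-point plotting positions.
        expected = -np.log10((np.arange(1, n + 1) - 0.5) / n)
        observed = -np.log10(subset)
        curves[t] = (expected, observed)
    return curves

# Simulated independent null p values: no enrichment is expected here.
rng = np.random.default_rng(0)
p_bag = rng.uniform(1e-12, 1.0, size=20000)
p_disorder = rng.uniform(1e-12, 1.0, size=20000)
curves = conditional_qq(p_bag, p_disorder)
print({t: len(obs) for t, (exp, obs) in curves.items()})
```

Plotting observed against expected for each stratum gives one curve per cut-off; with real data, curves for stricter cut-offs that pull away from the diagonal indicate enrichment.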
Two-sample Mendelian randomization
To study the cause-effect relations between BAG and the five disorders, two sets of MR analyses were performed. The first set, using standard models, included the inverse-variance weighted model (IVW) [44], weighted median (wMed) [45], Egger regression (Egger) [46], and MR-PRESSO (PRESSO) [47]. For these analyses, only genome-wide significant SNPs (p < 5×10−8) to the exposure traits or disorders were used as potential instruments. The PLINK program and the LD structure of 1KGp3 dataset were used to select instruments with the following parameters, --clump-kb 500 kb, --clump-p1 5×10−8, and --clump-r2 0.01. The TwoSampleMR package [19] was used for data harmonization and causal inference for the IVW, wMed, and Egger models. The same harmonized datasets were used as input to the MR-PRESSO software to assess outliers that may artificially affect MR estimates, i.e., SNPs that show horizontal pleiotropy to both BAG and disorders. Harmonized instrumental SNPs are shown in Supplementary Tables S6–S15.
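To make the simplest of these estimators concrete, the sketch below computes a fixed-effect IVW estimate from harmonized per-SNP effects. It is a generic formulation and not the TwoSampleMR implementation; the function name ivw_mr and the instrument effects are hypothetical.

```python
import numpy as np

def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from harmonized SNP effects.

    Equivalent to a weighted regression of outcome effects on exposure
    effects through the origin, with weights 1 / se_outcome**2.
    """
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    w = 1.0 / np.asarray(se_outcome, dtype=float) ** 2
    estimate = np.sum(w * bx * by) / np.sum(w * bx ** 2)
    se = np.sqrt(1.0 / np.sum(w * bx ** 2))
    return float(estimate), float(se)

# Hypothetical instruments: SNP effects on the exposure and on the outcome.
bx = [0.10, 0.20, 0.30, 0.15]
by = [0.04, 0.11, 0.16, 0.07]
se_y = [0.02, 0.02, 0.03, 0.02]
estimate, se = ivw_mr(bx, by, se_y)
print(f"IVW estimate={estimate:.3f} (SE {se:.3f})")
```

The weighted median, Egger, and MR-PRESSO models relax the assumption, built into this estimator, that every instrument is valid.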
The second set of models included the robust adjusted profile score (RAPS) [48] and the CAUSE models [49]. These models can make use of SNPs that show a suggestive level of association (p < 10−3) with the exposure to increase statistical power without incurring weak instrument bias in estimation. Although both models control for horizontal pleiotropy, CAUSE directly tests for a shared (correlated horizontal pleiotropy) versus a causal model for each relation [49]. The same instrument selection procedures used in the first set of models were used here, except that 10−3 was taken as the cut-off for selecting instruments, i.e., --clump-p1 10−3.
As each of the six MR models has different assumptions that are difficult to verify in real data, a majority vote ensemble scheme was used to make conclusions for the existence of cause-effect relations: specifically, only when four or more models indicated a cause-effect relation (FDR adjusted P < 0.05) was such a relation considered causal.
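The voting rule can be written down directly. In the sketch below, the function name majority_vote and the adjusted p values are hypothetical; only the threshold of four out of six models at FDR-adjusted p < 0.05 comes from the text.

```python
def majority_vote(adjusted_p, alpha=0.05, min_votes=4):
    """Return True when at least min_votes models are significant at alpha."""
    votes = sum(p < alpha for p in adjusted_p.values())
    return votes >= min_votes

# Hypothetical FDR-adjusted p values for one exposure-outcome pair.
example = {
    "IVW": 0.010,
    "wMed": 0.030,
    "Egger": 0.200,
    "PRESSO": 0.020,
    "RAPS": 0.040,
    "CAUSE": 0.300,
}
print(majority_vote(example))  # four of six models pass, so the rule is met
```

Note that this rule counts significance only; it does not check that the significant models agree on the direction of effect.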
In addition to applying multiple MR models, GWAS results for height measured in European samples [50] (n = 253,288; https://portals.broadinstitute.org/collaboration/giant/), for AD diagnosed in a Japanese sample [51] (n case = 3962 and n control = 4047) and an African sample [52] (n case = 2784 and n control = 5222), and for BIP diagnosed in a Japanese sample [53] (n case = 2964 and n control = 61,887) were used to corroborate MR findings. As commonly done in genetic studies, height was used as a negative exposure control to test if population stratification could generate spurious causal effects [54]. The non-European GWAS data were used to test if any observed causal effects generalize across ancestry groups, although with substantially smaller sample sizes.
Results
We obtained accurate brain age estimates; mean absolute errors (MAEs) in each of the five disjoint folds were consistently below 2.5 years (Supplementary Table S2). This was consistent when we split the dataset into different subsets based on covariates (MAE = 2.40 in females compared to 2.53 in males; 2.40 in the youngest half compared to 2.52 in the oldest), although we observed a slight age bias (Supplementary Fig. S1). Based on the meta-analyzed GWAS results, we estimated a SNP heritability of 0.27 (standard error (SE) = 0.036) for BAG (Methods section). Our estimate is comparable to or higher than the two previously reported estimates (0.26, SE = 0.044 [6]; 0.19, SE = 0.02 [18]).
We identified eight independent genomic loci significantly associated with BAG (Fig. 1a, b and Table 1). Associations of lead SNPs in these regions with BAG were highly consistent in direction and effect size across the five folds (Supplementary Table S3). Among these loci, the one in the inversion region on chromosome 17 (lead SNP rs2106786), including the MAPT gene, has been previously reported [18, 19], although indexed by a different SNP (rs2435204). This SNP was also highly significant in our analysis (p = 5.4 × 10−21, beta = 0.27 years, effective-allele = G). The locus containing the RUNX2 gene (led by rs2790102: p = 8.92 × 10−9, beta = −0.15 years, effective-allele = A), which showed suggestive significance in Jonsson et al. [18], was genome-wide significant in the present study. The RUNX2 gene codes for a master transcription factor that plays a critical role in skeletal development [55]. Among the remaining six novel loci, the rs79107704-A allele showed the largest effect on BAG; one copy of this allele was associated with an average increase in brain age of 0.63 years (Table 1). This SNP is located 3405 bp downstream of the betaine-homocysteine S-methyltransferase 2 gene (BHMT2, Fig. 1b), whose product is involved in choline metabolism during development [56]. Other protein-coding genes closest to the lead SNPs include those involved in calcium signaling (CAMK2N2 and INPP5A) and in metabolism and transcription regulation (GALC, KLF3, and KLHL38), processes that are both implicated in biological ageing [57]. In Supplementary Table S5, we present detailed annotations of the biological functions of each gene.
a The Manhattan plot of meta-analyzed association results for brain age gap (BAG). Chromosome numbers are shown on x axis, -log10 association p values on y axis and lead SNP rs-numbers in the plot. b Region plots for each of the eight associated regions. Genes located in each region are shown below each figure. Linkage disequilibrium r-squared values are indicated by colors, and recombination frequencies by curves. c Expression levels of the annotated genes across tissues analyzed by the GTEx v8 study. Colors indicate average log2 transformed expression level in each tissue.
We further annotated these identified SNPs to nearby genes and regulatory elements (Methods). Most of the associated SNPs are in noncoding regions such as intergenic, intron or untranslated regions (Supplementary Figs. S3 and S4). Using the default parameters in FUMA [29], 54 unique genes were implicated by these significant associations based on genomic position. The expression levels of these genes in the 54 tissue types from the GTEx v8 project [33] showed three distinct patterns (Fig. 1c). The first set of genes was highly expressed across almost all 54 tissues; the second set showed low expression levels in most tissue types; and the last set, including eight genes, was highly expressed only in brain tissue (Fig. 1c), for example, MAPT, GFAP, and the Homeobox protein gene NKX6-2. These results suggest that BAG reflects coordinated physiological processes implicating both the brain and the peripheral systems.
To nominate causal variants in each locus we performed statistical fine-mapping [31] for regions around each lead SNP in Table 1 (Methods). Except for the locus on chromosome 14, which was not resolvable, the 95% credible sets for all loci favored a causal model with one causal SNP over one with two, i.e., the posterior probabilities for the 1-SNP sets were larger than those for the 2-SNP sets (Supplementary Table S16). Furthermore, four credible sets indicated that the lead SNPs were also the causal ones (posterior inclusion probability of the lead SNP (PPI_1) > 0.05 and larger than the PPI of the second most probable SNP (PPI_2)) (Methods; Supplementary Table S16), but identifying the causal SNP for the remaining loci was difficult. For example, the MAPT locus on chromosome 17 and the RUNX2 locus on chromosome 6 showed two SNPs having almost equal and small PPIs (i.e., <=0.05), indicating that the true causal variants may be untyped rare variants not investigated in this study. The clearest signals come from the regions on chromosomes 3 and 5, where the PPIs for the lead SNPs were much larger than those for the second most probable causal SNPs.
We investigated whether the statistically fine-mapped causal variants affect BAG through transcriptional regulatory mechanisms using the GTEx database (eQTL or sQTL data for 49 tissues; gtexportal.org; accession date 25 February 2023). With the exception of rs79107704, the other seven SNPs significantly affect the transcription levels of one or more nearby genes (p < 1.3 × 10−4; Supplementary Figs. S6–S12 and Tables S17–S22). Of note, rs2790102 and rs17203398 each affect only one gene (RUNX2 and GALC, respectively). The other five affect the expression of two or more genes, particularly rs2106786, which affects 37 unique genes or noncoding RNAs across the 49 tissue types. In addition, this SNP also affects the splicing isoforms of 15 unique genes in the 49 tissue types (Supplementary Tables S21–S22). This complex pattern makes it difficult to pin down the genes through which rs2106786 influences BAG.
We observed a nominally significant genetic correlation between BAG and AD that did not survive FDR-correction (r = 0.23, SE = 0.1, p = 0.02, FDR adjusted p = 0.13) and no significant correlations with any of the other four disorders (Fig. 2a). SNP heritability estimates for the five disorders were all significant but varied greatly; SCZ showed the largest (0.34, SE = 0.01) and AD showed the lowest (0.01, SE = 0.005) estimates. Bidirectional conditional QQ plots (Fig. 2b–d; Supplementary Figs. S13 and S14) showed that there was noticeable genetic enrichment for BIP conditional on BAG but not in the reverse direction. For AD and PD, both directions showed clear enrichment, which was surprising for PD given that it did not show a significant genetic correlation with BAG (r = −0.07, p = 0.42).
a Genetic correlation between brain age gap and disorders computed by ldsc. SNP heritability and its standard error are indicated. b–d Conditional QQ plot between brain age gap and disorders in both directions. Colors are used to indicate different association strength to the conditioned traits, i.e., the ones indicated after the vertical bar in each figure. Dashed diagonal lines indicate expected null distributions. AD Alzheimer’s disease, BIP bipolar disorder, MDD major depression disorder, PD Parkinson’s disease, SCZ schizophrenia.
We then performed extensive MR analyses using six different models to examine the existence of cause-effect relations between BAG and the five disorders (Methods). Figure 3a shows that BAG was only causally associated with PD, i.e., four out of the six MR models showed a negative relation with varying effect sizes (all with adjusted p < 0.05). One year increase in genetically predicted BAG was estimated to reduce the risk of PD by a log odds ratio from 1.4 (by Egger regression) to 0.02 (by MR-RAPS) (Supplementary Table S23). In the reverse direction (i.e., disorders as exposure), increased genetic risk for AD and BIP were causally associated with increased BAG (30 and 55 SNPs used as instruments, respectively); these estimated causal effects on BAG were relatively larger for AD than BIP (Fig. 3b; Supplementary Table S24).
a Causal effect of brain age gap (BAG) on risk of disorders; b Causal effect of genetic risk of disorders on BAG. Colors indicate different models; triangle indicates significant effect after false discovery correction. Estimated standard errors for each effect are also shown. c Scatter plots of SNP effects on AD (x axis) and BAG (y axis). d Scatter plots of SNP effects on BIP (x axis) and BAG (y axis). e Scatter plots of SNP effects on BAG (x axis) and PD (y axis). f Scatter plots of SNP effects on PD (x axis) and BAG (y axis). Causal effects estimated by the five models (except CAUSE) are shown by fitted lines; slopes of these lines indicate causal effect sizes. Exceptional SNPs are marked by boxes that include SNP rs-numbers and genome location in the hg19 coordinates. AD Alzheimer’s disease, BIP bipolar disorder, MDD major depression disorder, PD Parkinson’s disease, SCZ schizophrenia.
A close investigation into the scatter plots of instrumental SNPs showed that the causal effect of AD on BAG was primarily driven by a SNP (rs59007384) in the APOE region, which was not identified as a horizontal pleiotropic instrument by MR-PRESSO (outlier test p > 0.05) (Fig. 3c); there were no extreme instruments identified for the BIP to BAG relation by MR-PRESSO (Fig. 3d), but Egger regression indicated the existence of horizontal pleiotropy (Egger intercept test: p = 0.017). The causal effects of BAG on PD were primarily driven by two SNPs in the inversion region on chromosome 17; the effective alleles of these SNPs were associated with higher BAG and lower risk of PD (Fig. 3e). SNPs in the same region also drove the negative causal relation (not significant) from PD to BAG (Fig. 3f). However, both SNPs were flagged as horizontal pleiotropic instruments by MR-PRESSO (p < 0.05) and Egger regression (Egger intercept test: p = 0.03 and 0.008, respectively). Therefore, the observed negative relations between BAG and PD are unlikely to be causal.
We used the GWAS results for height of European samples and cross-ancestry MR analysis to corroborate the identified causal relations (Methods). We found no causal effect between BAG and height with any of the MR methods employed (all p > 0.05). Therefore, our observed AD and BIP to BAG relations are unlikely to be driven by population stratification, i.e., by both the exposure and outcome data originating from the same ancestry group. There was also no significant cross-ancestral causal effect detected using AD data from Japanese or African samples (IVW p = 0.74 and 0.85, respectively), or BIP data from the Japanese sample (beta = 0.10, p = 0.13).
Discussion
Combining the advantages of large samples and advanced models for brain age prediction, we confirmed that BAG is a heritable and polygenic trait, and estimated the genetic pleiotropy and causal genetic relations with major brain and mental disorders. We identified seven novel loci associated with BAG, in addition to confirming the previously reported MAPT locus [18, 19]. Although MR indicated that increased genetic risk for AD or BIP may be causally associated with higher BAG, our results demonstrate that individual variability and previously reported case-control differences in BAG should be attributed only to a marginal degree to the common genetic architecture previously associated with the respective diseases.
Functional annotation of the genes linked to the identified loci confirms that deviations in BAG are linked to complex processes encompassing multiple biological systems [20]. Although earlier work observed this variety when investigating different multimodal aspects of imaging data linked to brain age, our findings suggest it also exists when looking at a singular BAG computed from only T1-weighted MRI data. Our coarse division of the implied 54 genes into three groups indicates that only eight genes are specifically expressed in brain tissue. The remaining genes were either expressed in abundance across all tissue types tested, including the brain, or expressed at very low levels across all tissues. Nonetheless, the proteins coded by these nonbrain-specific genes have been implicated in brain-related disorders or traits (Supplementary Table S5). For example, among the genes we found to be expressed across all tissue types (group 1), mutations in AP2M1 have been linked to epilepsy, intellectual developmental disorder, and seizures [58]; among the genes expressed in low levels across tissues (group 2), STH has been associated with frontotemporal dementia and 17q21.31 duplication syndrome [59, 60]. More importantly, we show that our fine-mapped causal SNPs affect the expression levels of these genes in multiple tissue types, providing testable molecular mechanisms for these genetic variants. In addition, although our analysis revealed no significant pathway enrichment, these 54 genes contribute to biological functions that include calcium signaling, protein metabolism, DNA damage repair, and general innate immune defense. Thus, our analyses highlight the role of these diverse sets of processes affecting the brain throughout life.
Prior work has shown higher BAG in patients with a multitude of disorders compared to healthy controls [5, 6, 8, 9, 11], and has documented partly overlapping genetic associations between BAG and clinical conditions [6]. However, the causal effects have remained unclear. Our MR approach suggested that genetically predicted risks for AD or BIP were causally associated with increased BAG. However, these relations were only weakly supported by genetic correlation analysis. One possible explanation for this weaker support from genome-wide signals (genetic correlation) in contrast to MR (significant associations only) might be due to heterogeneous genetic correlations across the genome, i.e., some genomic regions show positive correlations while others show negative correlations [61, 62]. In such a scenario, the net genetic correlations between the two traits are expected to be lower than regional correlations.
The causal effect of genetically predicted risk for AD on BAG was small but consistent in directions across the six MR models, four of which were significant after multiple-testing correction. For BIP, although four models showed significant effects, the CAUSE model suggested an opposite direction of effect to the other five models. Thus, we advise careful interpretation of this result. Our attempts of testing causal relations across ancestral groups led to largely null fundings for the AD to BAG relations. We believe these nonsignificant fundings are largely due to the lack of statistical power in the non-European GWAS [51,52,53].
The observed causal relations between genetically predicted risk of brain disorders and BAG are intriguing. One possible interpretation is that overt changes in the brain incurred by the disorders contribute to accelerated aging. Another possibility may be that lifestyle and health-related behaviors of patients with clinical conditions such as AD and BIP, e.g., medication [63], may increase brain age. Yet another is that genetic variation associated with clinical traits may influence the brain early in life. Given the comparable sample sizes to the GWAS of AD and BIP and the widely observed clinical correlations, surprisingly, no genetic nor causal relations of SCZ and MDD with BAG were found. On the one hand, this may suggest that previously reported case-control differences in BAG do not reflect causal relations, but rather a combination of indirect and confounding factors. For example, smoking and physical exercise have been associated both with MDD and SCZ [64,65,66,67] and brain age [5, 66]. Alternatively, it has been shown that both BAG [20] and psychiatric disorders are highly heterogeneous phenotypes [68, 69], and thus further identification and characterization of the causal relations may require even larger, and carefully screened, samples. It is also worth noting that while the sample sizes for the disorders are large, our BAG GWAS sample is relatively small. Thus, our null findings in the direction from BAG to disorders may be due to too weak instruments [70].
Our initial results showed weak evidence of a causal relation between BAG and PD, corroborating two recent studies which reported a weak correlation between BAG and PD [71, 72]. Striking patterns of enrichment between the two were shown in the conditional QQ plots and four out of six MR models indicated that genetically predicted BAG may have protective effect from PD. However, we found that these relations were completely caused by the MAPT gene region on chromosome 17: After removing chromosome 17 from our analyses, no enrichment was observed in either direction (Supplementary Fig. S15). In addition, instrumental SNPs in this region were detected by MR-PRESSO as horizontal pleiotropic SNPs, i.e., affecting BAG and PD through independent biological pathways. Reperforming MR analysis excluding these outlier SNPs confirmed null causal relations. Thus, we conclude that we found no evidence for causal relations between genetically predicted risk for PD and BAG. Our analytic procedures also highlight the importance of triangulation and converging evidence in causal inference analysis [73].
While the present study advances current knowledge regarding the genetic architecture of and causal contributions to BAG, the results should be interpreted with caution. Although we confirm previously reported genetic associations with BAG, e.g., the MAPT gene locus [18, 19], our sample overlaps with previous ones, which were also based on UK Biobank data. We attempted to replicate our findings in three independent but small samples (n ranging from 321 to 702; Supplementary Analysis and Table S4), but no clear replications were achieved. Therefore, independent large-scale samples are needed for replication. We used a simple voting scheme across six different MR models to infer causal relations between genetically predicted BAG and brain disorders. Because only eight independent loci showed significant associations with BAG, other models [74] that require a large number of genome-wide significant instruments were considered not applicable. However, our simple voting approach may not be the most efficient strategy for identifying causal effects. Formal development of ensemble methods, such as bagging [75], may provide better grounds for precise interpretation. Furthermore, our BAG GWAS is still smaller than the GWAS performed for the disorders, which may partly explain the lack of causal effects of BAG on brain disorders. Another limitation is that we were unable to obtain independent data to perform three-sample MR analysis, a design that can account for the winner’s curse bias in two-sample MR models. Therefore, to increase our confidence in the identified relations, large-scale data for BAG and replications in independent datasets are needed. Relatedly, our estimation of brain age was based on cross-sectional samples, which makes its interpretation nontrivial [76], and studies built on longitudinal data could help disentangle its complexities.
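A voting scheme of this kind can be illustrated with a short Python sketch. The significance threshold, the strict-majority rule, the model labels, and the p-values below are all assumptions made for illustration; the text above does not specify the exact decision rule, so this should not be read as the procedure used in the study.

```python
def vote_on_causal_effect(p_values, alpha=0.05, min_votes=None):
    """Count models with p < alpha and compare the count against a vote cutoff.

    By default a strict majority of models is required (an assumption).
    """
    if min_votes is None:
        min_votes = len(p_values) // 2 + 1
    votes = sum(p < alpha for p in p_values.values())
    return votes, votes >= min_votes


# Hypothetical p-values from six MR models (placeholder labels and values).
p_values = {
    "model_1": 0.010,
    "model_2": 0.030,
    "model_3": 0.200,
    "model_4": 0.020,
    "model_5": 0.040,
    "model_6": 0.300,
}

votes, supported = vote_on_causal_effect(p_values)
print(f"{votes} of {len(p_values)} models significant; supported: {supported}")
```

Because each model contributes one equally weighted vote regardless of its assumptions or precision, a rule like this discards information, which is one reason a formally developed ensemble method might allow more precise interpretation.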
Finally, although we refer to our brain age estimation in general terms, it is based on T1-weighted MRI data only. The brain is a complex and heterogeneous organ, and different imaging modalities are known to capture different aspects of the naturally occurring variation. Thus, studies relying on other modalities, either independently or in combination, could reveal a broader set of associations [77].
In conclusion, the present study increases the yield of genetic associations with brain age to eight genomic loci; implicated genes indicate involvement of calcium signaling, DNA damage repair, protein metabolism, and general innate immune defense. Our analysis did not provide evidence of a causal relationship between BAG and the included clinical conditions, and their interactions remain unclear.
References
Franke K, Gaser C. Ten years of BrainAGE as a neuroimaging biomarker of brain aging: what insights have we gained? Front Neurol. 2019;10:789.
Cole JH, Franke K. Predicting age using neuroimaging: innovative brain ageing biomarkers. Trends Neurosci. 2017;40:681–90.
Franke K, Luders E, May A, Wilke M, Gaser C. Brain maturation: predicting individual BrainAGE in children and adolescents using structural MRI. NeuroImage. 2012;63:1305–12.
Smith SM, Vidaurre D, Alfaro-Almagro F, Nichols TE, Miller KL. Estimation of brain age delta from brain imaging. NeuroImage. 2019;200:528–39.
Leonardsen EH, Peng H, Kaufmann T, Agartz I, Andreassen OA, Celius EG, et al. Deep neural networks learn general and clinically relevant representations of the ageing brain. NeuroImage. 2022;256:119210.
Kaufmann T, van der Meer D, Doan NT, Schwarz E, Lund MJ, Agartz I, et al. Common brain disorders are associated with heritable patterns of apparent aging of the brain. Nat Neurosci. 2019;22:1617–23.
Cole JH, Ritchie SJ, Bastin ME, Valdés Hernández MC, Muñoz Maniega S, Royle N, et al. Brain age predicts mortality. Mol Psychiatry. 2018;23:1385–92.
Gaser C, Franke K, Klöppel S, Koutsouleris N, Sauer H, Alzheimer’s Disease Neuroimaging Initiative. BrainAGE in mild cognitive impaired patients: predicting the conversion to Alzheimer’s disease. PLoS One. 2013;8:e67346.
Schnack HG, van Haren NEM, Nieuwenhuis M, Hulshoff Pol HE, Cahn W, Kahn RS. Accelerated Brain Aging in Schizophrenia: A Longitudinal Pattern Recognition Study. Am J Psychiatry. 2016;173:607–16.
Constantinides C, Han LK, Alloza C, Antonucci L, Arango C, Ayesa-Arriola R, et al. Brain ageing in schizophrenia: evidence from 26 international cohorts via the ENIGMA Schizophrenia consortium. Mol Psychiatry. 2023;28:1201–9.
Han LKM, Dinga R, Hahn T, Ching CRK, Eyler LT, Aftanas L, et al. Brain aging in major depressive disorder: results from the ENIGMA major depressive disorder working group. Mol Psychiatry. 2021;26:5124–39.
Elliott ML, Belsky DW, Knodt AR, Ireland D, Melzer TR, Poulton R, et al. Brain-age in midlife is associated with accelerated biological aging and cognitive decline in a longitudinal birth cohort. Mol Psychiatry. 2021;26:3829–38.
Kuhn T, Kaufmann T, Doan NT, Westlye LT, Jones J, Nunez RA, et al. An augmented aging process in brain white matter in HIV. Hum Brain Mapp. 2018;39:2532–40.
Cole JH, Underwood J, Caan MWA, De Francesco D, van Zoest RA, Leech R, et al. Increased brain-predicted aging in treated HIV disease. Neurology. 2017;88:1349–57.
Steffener J, Habeck C, O’Shea D, Razlighi Q, Bherer L, Stern Y. Differences between chronological and brain age are related to education and self-reported physical activity. Neurobiol Aging. 2016;40:138–44.
Wrigglesworth J, Ward P, Harding IH, Nilaweera D, Wu Z, Woods RL, et al. Factors associated with brain ageing - a systematic review. BMC Neurol. 2021;21:312.
Cole JH, Poudel RPK, Tsagkrasoulis D, Caan MWA, Steves C, Spector TD, et al. Predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker. NeuroImage. 2017;163:115–24.
Jonsson BA, Bjornsdottir G, Thorgeirsson TE, Ellingsen LM, Walters GB, Gudbjartsson DF, et al. Brain age prediction using deep learning uncovers associated sequence variants. Nat Commun. 2019;10:5409.
Ning K, Zhao L, Matloff W, Sun F, Toga AW. Association of relative brain age with tobacco smoking, alcohol consumption, and genetic variants. Sci Rep. 2020;10:10.
Smith SM, Elliott LT, Alfaro-Almagro F, McCarthy P, Nichols TE, Douaud G, et al. Brain aging comprises many modes of structural and functional change with distinct genetic and biophysical associations. eLife. 2020;9:e52677.
Smith GD, Ebrahim S. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol. 2003;32:1–22.
Hemani G, Zheng J, Elsworth B, Wade KH, Haberland V, Baird D, et al. The MR-Base platform supports systematic causal inference across the human phenome. eLife. 2018;7:e34408.
Kolbeinsson A, Filippi S, Panagakis Y, Matthews PM, Elliott P, Dehghan A, et al. Accelerated MRI-predicted brain ageing and its associations with cardiometabolic and brain disorders. Sci Rep. 2020;10:19940.
Ségonne F, Dale AM, Busa E, Glessner M, Salat D, Hahn HK, et al. A hybrid approach to the skull stripping problem in MRI. NeuroImage. 2004;22:1060–75.
Jenkinson M, Beckmann CF, Behrens TEJ, Woolrich MW, Smith SM. FSL. NeuroImage. 2012;62:782–90.
Jenkinson M, Smith S. A global optimisation method for robust affine registration of brain images. Med Image Anal. 2001;5:143–56.
Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–9.
Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7.
Watanabe K, Taskesen E, van Bochoven A, Posthuma D. Functional mapping and annotation of genetic associations with FUMA. Nat Commun. 2017;8:1826.
McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GR, Thormann A, et al. The ensembl variant effect predictor. Genome Biol. 2016;17:122.
Benner C, Spencer CC, Havulinna AS, Salomaa V, Ripatti S, Pirinen M. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics. 2016;32:1493–501.
Yap CX, Henders AK, Alvares GA, Wood DLA, Krause L, Tyson GW, et al. Autism-related dietary preferences mediate autism-gut microbiome associations. Cell. 2021;184:5916–31.e5917.
GTEx Consortium. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science. 2015;348:648–60.
Maglott D, Ostell J, Pruitt KD, Tatusova T. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. 2011;39:D52–D57.
Boutet E, Lieberherr D, Tognolli M, Schneider M, Bairoch A. UniProtKB/Swiss-Prot. In: Edwards D (ed). Plant Bioinformatics: Methods and Protocols. Humana Press: Totowa, NJ, 2007, pp 89–112.
Trubetskoy V, Pardiñas AF, Qi T, Panagiotaropoulou G, Awasthi S, Bigdeli TB, et al. Mapping genomic loci implicates genes and synaptic biology in schizophrenia. Nature. 2022;604:502–8.
Mullins N, Forstner AJ, O’Connell KS, Coombes B, Coleman JRI, Qiao Z, et al. Genome-wide association study of more than 40,000 bipolar disorder cases provides new insights into the underlying biology. Nat Genet. 2021;53:817–29.
Wray NR, Ripke S, Mattheisen M, Trzaskowski M, Byrne EM, Abdellaoui A, et al. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat Genet. 2018;50:668–81.
Jansen IE, Savage JE, Watanabe K, Bryois J, Williams DM, Steinberg S, et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nat Genet. 2019;51:404–13.
Nalls MA, Blauwendraat C, Vallerga CL, Heilbron K, Bandres-Ciga S, Chang D, et al. Identification of novel risk loci, causal insights, and heritable risk for Parkinson’s disease: a meta-analysis of genome-wide association studies. Lancet Neurol. 2019;18:1091–102.
Bulik-Sullivan B, Finucane HK, Anttila V, Gusev A, Day FR, Loh PR, et al. An atlas of genetic correlations across human diseases and traits. Nat Genet. 2015;47:1236–41.
Lo MT, Wang Y, Kauppi K, Sanyal N, Fan CC, Smeland OB, et al. Modeling prior information of common genetic variants improves gene discovery for neuroticism. Hum Mol Genet. 2017;26:4530–9.
Chen CH, Wang Y, Lo MT, Schork A, Fan CC, Holland D, et al. Leveraging genome characteristics to improve gene discovery for putamen subcortical brain structure. Sci Rep. 2017;7:15736.
Relton CL, Davey Smith G. Two-step epigenetic Mendelian randomization: a strategy for establishing the causal role of epigenetic processes in pathways to disease. Int J Epidemiol. 2012;41:161–76.
Bowden J, Davey Smith G, Haycock PC, Burgess S. Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genet Epidemiol. 2016;40:304–14.
Bowden J, Davey Smith G, Burgess S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int J Epidemiol. 2015;44:512–25.
Verbanck M, Chen CY, Neale B, Do R. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nat Genet. 2018;50:693–8.
Zhao Q, Wang J, Hemani G, Bowden J, Small DS. Statistical inference in two-sample summary-data Mendelian randomization using robust adjusted profile score. Ann Stat. 2020;48:1742–69.
Morrison J, Knoblauch N, Marcus JH, Stephens M, He X. Mendelian randomization accounting for correlated and uncorrelated pleiotropic effects using genome-wide summary statistics. Nat Genet. 2020;52:740–7.
Wood AR, Esko T, Yang J, Vedantam S, Pers TH, Gustafsson S, et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat Genet. 2014;46:1173–86.
Shigemizu D, Mitsumori R, Akiyama S, Miyashita A, Morizono T, Higaki S, et al. Ethnic and trans-ethnic genome-wide association studies identify new loci influencing Japanese Alzheimer’s disease risk. Transl Psychiatry. 2021;11:151.
Kunkle BW, Schmidt M, Klein H-U, Naj AC, Hamilton-Nelson KL, Larson EB, et al. Novel Alzheimer Disease Risk Loci and Pathways in African American Individuals Using the African Genome Resources Panel: A Meta-analysis. JAMA Neurol. 2021;78:102–13.
Ikeda M, Takahashi A, Kamatani Y, Okahisa Y, Kunugi H, Mori N, et al. A genome-wide association study identifies two novel susceptibility loci and trans population polygenicity associated with bipolar disorder. Mol Psychiatry. 2018;23:639–47.
Sanderson E, Richardson TG, Hemani G, Davey Smith G. The use of negative control outcomes in Mendelian randomization to detect potential population stratification. Int J Epidemiol. 2021;50:1350–61.
Lam K, Zhang DE. RUNX/CBF Transcription Factors. Reference Module in Biomedical Sciences. Elsevier, 2015.
Kotb M, Geller AM. Methionine adenosyltransferase: Structure and function. Pharmacol Therap. 1993;59:125–43.
López-Otín C, Blasco MA, Partridge L, Serrano M, Kroemer G. The hallmarks of aging. Cell. 2013;153:1194–217.
Helbig I, Lopez-Hernandez T, Shor O, Galer P, Ganesan S, Pendziwiat M, et al. A recurrent missense variant in AP2M1 impairs Clathrin-mediated endocytosis and causes developmental and epileptic encephalopathy. Am J Hum Genet. 2019;104:1060–72.
Mc Cormack A, Taylor J, Te Weehi L, Love DR, George AM. A case of 17q21.31 microduplication and 7q31.33 microdeletion, associated with developmental delay, microcephaly, and mild dysmorphic features. Case Rep Genet. 2014;2014:658570.
Arbogast T, Iacono G, Chevalier C, Afinowi NO, Houbaert X, van Eede MC, et al. Mouse models of 17q21.31 microdeletion and microduplication syndromes highlight the importance of Kansl1 for cognition. PLOS Genet. 2017;13:e1006886.
Frei O, Holland D, Smeland OB, Shadrin AA, Fan CC, Maeland S, et al. Bivariate causal mixture model quantifies polygenic overlap between complex traits beyond genetic correlation. Nat Commun. 2019;10:2417.
van Rheenen W, Peyrot WJ, Schork AJ, Lee SH, Wray NR. Genetic correlations of polygenic disease traits: from theory to practice. Nat Rev Genet. 2019;20:567–81.
Van Gestel H, Franke K, Petite J, Slaney C, Garnham J, Helmick C, et al. Brain age in bipolar disorders: Effects of lithium treatment. Aust NZ J Psychiatry. 2019;53:1179–88.
Pasco JA, Williams LJ, Jacka FN, Ng F, Henry MJ, Nicholson GC, et al. Tobacco smoking as a risk factor for major depressive disorder: population-based study. Br J Psychiatry. 2008;193:322–6.
Winterer G. Why do patients with schizophrenia smoke? Curr Opin Psychiatry. 2010;23:112–9.
Sanders A-M, Richard G, Kolskår K, Ulrichsen KM, Kaufmann T, Alnæs D, et al. Linking objective measures of physical activity and capability with brain structure in healthy community dwelling older adults. NeuroImage: Clin. 2021;31:102767.
Brokmeier LL, Firth J, Vancampfort D, Smith L, Deenik J, Rosenbaum S, et al. Does physical activity reduce the risk of psychosis? A systematic review and meta-analysis of prospective studies. Psychiatry Res. 2020;284:112675.
Wolfers T, Doan NT, Kaufmann T, Alnaes D, Moberget T, Agartz I, et al. Mapping the heterogeneous phenotype of schizophrenia and bipolar disorder using normative models. JAMA Psychiatry. 2018;75:1146–55.
Wolfers T, Rokicki J, Alnæs D, Berthet P, Agartz I, Kia SM, et al. Replicating extensive brain structural heterogeneity in individuals with schizophrenia and bipolar disorder. Hum Brain Mapp. 2021;42:2546–55.
Taschler B, Smith SM, Nichols TE. Causal inference on neuroimaging data with Mendelian randomisation. NeuroImage. 2022;258:119385.
Eickhoff CR, Hoffstaedter F, Caspers J, Reetz K, Mathys C, Dogan I, et al. Advanced brain ageing in Parkinson’s disease is related to disease duration and individual impairment. Brain Commun. 2021;3:fcab191.
Charissé D, Erus G, Pomponio R, Gorges M, Schmidt N, Schneider C, et al. Brain age and Alzheimer’s-like atrophy are domain-specific predictors of cognitive impairment in Parkinson’s disease. Neurobiol Aging. 2022;109:31–42.
Lawlor DA, Tilling K, Davey Smith G. Triangulation in aetiological epidemiology. Int J Epidemiol. 2016;45:1866–86.
Sanderson E, Glymour MM, Holmes MV, Kang H, Morrison J, Munafò MR, et al. Mendelian randomization. Nat Rev Methods Prim. 2022;2:6.
Bühlmann P, Yu B. Analyzing bagging. Ann Stat. 2002;30:927–61.
Vidal-Pineiro D, Wang Y, Krogsrud SK, Amlien IK, Baaré WFC, Bartres-Faz D, et al. Individual variations in ‘brain age’ relate to early-life factors more than to longitudinal brain change. eLife. 2021;10:e69995.
Rokicki J, Wolfers T, Nordhøy W, Tesli N, Quintana DS, Alnæs D, et al. Multimodal imaging improves brain age prediction and reveals distinct abnormalities in patients with psychiatric and neurological disorders. Hum Brain Mapp. 2021;42:1714–26.
Acknowledgements
We thank UNINETT Sigma2, the National Infrastructure for High-Performance Computing and Data Storage in Norway (project no. nn9769k/ns9769k), for providing computational resources, and the PGC consortium and the International Parkinson Disease Genomics Consortium (IPDGC) for sharing their GWAS results.
Funding
This study is supported by the Norwegian Research Council (No. 223273, 298646, 300767, 302854), the UiO:Life Science Convergence environment, University of Oslo, Norway (4MENT), the South-Eastern Norway Regional Health Authority (2019101), KG Jebsen Stiftelsen, and the European Research Council under the European Union’s Horizon 2020 Research and Innovation program (ERC StG, Grant 802998).
Author information
Contributions
EL, TW, and YW designed this study. EL, DVP, OF, AAS, OI, AMG, and YW performed data analysis. EL, DVP, JMR, OI, TK, LTW, OOA, TW, BT, SMS, and YW interpreted the results. EL, DVP, JMR, OI, and YW prepared the first draft of the manuscript. All authors contributed to and approved the final draft.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Leonardsen, E.H., Vidal-Piñeiro, D., Roe, J.M. et al. Genetic architecture of brain age and its causal relations with brain and mental disorders. Mol Psychiatry 28, 3111–3120 (2023). https://doi.org/10.1038/s41380-023-02087-y