Genetic architecture of epigenetic and neuronal ageing rates in human brain regions

Identifying genes that regulate the pace of epigenetic ageing represents a new frontier in genome-wide association studies (GWASs). Here, using 1,796 brain samples from 1,163 individuals, we carry out a GWAS of two DNA methylation-based biomarkers of brain age: the epigenetic ageing rate and the estimated proportion of neurons. Locus 17q11.2 is significantly associated (P = 4.5 × 10⁻⁹) with the ageing rate across five brain regions and harbours a cis-expression quantitative trait locus for EFCAB5 (P = 3.4 × 10⁻²⁰). Locus 1p36.12 is significantly associated (P = 2.2 × 10⁻⁸) with epigenetic ageing of the prefrontal cortex, independent of the proportion of neurons. Our GWAS of the proportion of neurons identified two genome-wide significant loci (10q26 and 12p13.31) and yielded a gene set that overlaps significantly with sets found by GWASs of age-related macular degeneration (P = 1.4 × 10⁻¹²), ulcerative colitis (P < 1.0 × 10⁻²⁰), type 2 diabetes (P = 2.8 × 10⁻¹³), hip/waist circumference in men (P = 1.1 × 10⁻⁹), schizophrenia (P = 1.6 × 10⁻⁹), cognitive decline (P = 5.3 × 10⁻⁴) and Parkinson's disease (P = 8.6 × 10⁻³).


Supplementary
Chr = chromosome; Corr. = correlation with respect to the minor allele.
Supplementary Table 5: Evaluating whether published SNPs that affect epigenetic age acceleration in the cerebellum also have a significant effect on age acceleration in independent data sets. The table presents the meta-analysis association results for the two SNPs identified in our earlier GWAS of cerebellar age acceleration 3. The first macro column, "CRBLM", lists the association results from our previous study. The 2nd to 5th macro columns list the results from our current study for the following brain regions: FCTX, PFCTX, PONS and TCTX. The last macro column, "ALL", lists the results using tissues from all tested brain regions.
Supplementary Table 6: Sample size distribution in the GTEx brain atlas used for brain cis-eQTL analysis. The table presents tissue counts for 12 brain regions, taken from the data summary of the Genotype-Tissue Expression (GTEx) project 4. The latest release (V6) contained 1,007 brain samples from up to 449 individuals.
Supplementary Table 7: Cis-expression QTL analysis of SNPs associated with epigenetic age acceleration in human brain. We conducted cis-eQTL analysis based on three categories of brain expression data, as listed in Methods in the main article. We present the results using the expression data from category one (individual-level data), in which we identified genes exhibiting cis-effects, referred to as step (1) in Methods. Category one comprised gene expression data from our study sets 2, 3, 5, 6 and 7, which allowed us to correlate our leading SNPs at susceptibility loci with brain expression levels of adjacent genes (i.e. genes located within ±1 Mb of the test SNPs).
For each study and brain region, we examined cis-effects and highlighted the genes surpassing the Bonferroni-corrected significance level. The highlighted genes (colored in red) were subsequently analyzed using the brain expression data from category 2 (GTEx) and category 3 (the UK database). As a result, four cis-genes were identified at 17q11.2. Each row corresponds to a study/brain region. The columns report the biweight midcorrelation ("Bicor"), its standard error ("SE") and the corresponding nominal P value ("P"). The column "N" lists the number of (SNP, gene expression) pairs tested. The significance level for each brain region, Bonferroni-corrected for the N comparisons, is given in the last column. P values are shown in bold red if they are smaller than this level.
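The two statistics named in this caption can be illustrated with a short sketch of biweight midcorrelation and the per-region Bonferroni threshold. This is an assumption-laden illustration, not the code used in the study. It uses the unscaled median absolute deviation (MAD) with the conventional constant 9 and gives zero weight to observations with |u| ≥ 1. Established implementations, such as the bicor function in the R package WGCNA, also handle missing values and degenerate cases, and may differ in such details. The function names are hypothetical.

```python
import statistics


def bicor(x, y):
    """Biweight midcorrelation of two equal-length numeric sequences.

    Each variable is centered at its median and weighted with
    w = (1 - u^2)^2, where u = (v - median) / (9 * MAD) and MAD is the
    unscaled median absolute deviation; points with |u| >= 1 get weight 0.
    """
    def weighted_centered(v):
        med = statistics.median(v)
        mad = statistics.median(abs(a - med) for a in v)
        if mad == 0:
            raise ValueError("MAD is zero; bicor is undefined in this sketch")
        out = []
        for a in v:
            u = (a - med) / (9 * mad)
            w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
            out.append((a - med) * w)
        norm = sum(t * t for t in out) ** 0.5
        return [t / norm for t in out]

    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    xs, ys = weighted_centered(x), weighted_centered(y)
    return sum(a * b for a, b in zip(xs, ys))


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level after Bonferroni correction for n_tests."""
    return alpha / n_tests
```

For example, a brain region with N = 10 tested (SNP, gene expression) pairs would use a per-test threshold of 0.05 / 10 = 0.005. Because it uses medians, bicor is less sensitive to outlying expression values than Pearson correlation.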
The table presents a total of 18 significant cis-eQTL results for rs2054847 on chromosome 17, as reported by GTEx V6. In V6, GTEx reports only cis-eQTL results that remain significant after correction for multiple testing: for each gene, a permutation test was used to determine a threshold that corrects for multiple comparisons across genes and tissue types. Nominal P values are reported in the column "P". The 18 cis-eQTL results (rows) correspond to 10 distinct gene symbols (and Ensembl gene identifiers) in 10 human tissue types.

Columns: Our study; GTEx; UK.
"--" denotes not available. * No eQTL result had P below the defined threshold of 1.57E-03, so the threshold was set at 0.01 in order to perform the SMR and HEIDI tests. a Analysis performed on the combined sample of studies 6 and 7; b analysis performed on study 3; c analysis performed on study 5; d analysis performed on study 2; e expression averaged over all 10 brain regions in the UK database.
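The SMR test in the footnote combines, for a single SNP, a GWAS z statistic with an eQTL z statistic for a nearby gene. The sketch below shows the approximate SMR statistic as used by the SMR method, T = z_gwas² z_eqtl² / (z_gwas² + z_eqtl²), which under the null hypothesis is approximately χ² with one degree of freedom. It assumes both z statistics refer to the same SNP and allele. The HEIDI heterogeneity test needs several SNPs and linkage-disequilibrium information and is not shown. The function names are hypothetical.

```python
from statistics import NormalDist


def smr_statistic(z_gwas, z_eqtl):
    """Approximate SMR test statistic for one SNP-gene-trait triplet.

    T = z_gwas^2 * z_eqtl^2 / (z_gwas^2 + z_eqtl^2), approximately
    chi-square with 1 degree of freedom under the null.
    """
    a, b = z_gwas ** 2, z_eqtl ** 2
    return a * b / (a + b)


def chi2_1df_pvalue(t):
    """Upper-tail P value of a 1-df chi-square statistic via the normal CDF."""
    # For 1 df, P(chi2 > t) = 2 * Phi(-sqrt(t)); using the lower tail keeps precision.
    return 2.0 * NormalDist().cdf(-(t ** 0.5))


t = smr_statistic(4.0, 4.0)   # equal GWAS and eQTL evidence
p = chi2_1df_pvalue(t)
```

T is never larger than either squared z statistic. For this reason, a weak eQTL signal, such as one that only passes a relaxed threshold like the 0.01 in the footnote, limits how significant the SMR result can be.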

Supplementary  (2) prefrontal cortex (PFCTX). The analysis is based on P values from the GWAS meta-analysis.
The cutoffs for gene set enrichment analysis (GSEA) in the MAGENTA algorithm were set at the 95th and 75th percentiles. We list the most significant results (FDR < 0.25 at either cutoff) from the MAGENTA analysis.
discovery rate and the proportion of trait-related genes that also relate to brain age acceleration, respectively, stratified by the overlap results with ALL and PFCTX.

Supplementary Table 12: SNPs significantly associated with the proportion of neurons in PFCTX.
We present the loci with SNP associations at P < 5.0 × 10⁻⁸ and report the most significant SNP within each locus. Fixed-effects meta-analysis was used to estimate the correlation coefficient and standard error ("Corr. (SE)") between the minor allele and the proportion of neurons in PFCTX. The corresponding meta-analysis P values are given in the column "Meta P". The PFCTX samples include dorsolateral prefrontal cortex.
We retrieved the HRS longitudinal data from 1996 to 2010 (Waves 3 to 10). The characteristics of study individuals are listed, including the distributions of the three cognitive functioning measurements used for GWAS analysis. In particular, the cognitive slope assesses the change in cognitive age relative to the change in chronological age over the fourteen years from 1996 to 2010.
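The caption states that a fixed-effects meta-analysis combined the per-study correlation estimates. The sketch below assumes standard inverse-variance weighting of each study's estimate and standard error. The caption does not say whether the correlations were transformed (for example, to Fisher's z) before pooling, and the function name is hypothetical.

```python
from statistics import NormalDist


def fixed_effects_meta(estimates, std_errors):
    """Inverse-variance weighted fixed-effects meta-analysis.

    Returns (pooled estimate, pooled standard error, two-sided P value).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_w
    pooled_se = (1.0 / total_w) ** 0.5
    z = pooled / pooled_se
    # Two-sided normal P value, computed from the lower tail for precision.
    p = 2.0 * NormalDist().cdf(-abs(z))
    return pooled, pooled_se, p
```

Each study's weight is the reciprocal of its squared standard error, so larger or less noisy studies dominate the pooled "Corr. (SE)" value.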

Supplementary Table 18: Genomic inflation estimates for cognitive traits in the HRS.
Genomic inflation estimates for cognitive functioning traits in different ethnic strata of the Health and Retirement Study. Column 2 reports the group of individuals; "Admixed" denotes all samples combined irrespective of race/ethnicity. The column "No. SNPs" gives the number of SNPs with common variants (MAF ≥ 5%) used in each GWAS. To protect against spurious associations, the analysis was adjusted for principal components (PCs), gender and education (college or above).
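The genomic inflation factor λ is conventionally defined as the median of the 1-df association χ² statistics divided by the median of the χ²₁ distribution (about 0.455). The sketch below assumes that the per-SNP P values come from two-sided 1-df tests. The function name is hypothetical, and the values in the table were not necessarily computed exactly this way.

```python
from statistics import NormalDist, median


def genomic_inflation(pvalues):
    """Genomic control lambda from two-sided 1-df association P values.

    Each P value in (0, 1] is mapped to a chi-square(1) statistic z^2 with
    z = Phi^-1(p / 2); lambda is the median statistic divided by the
    chi-square(1) median, Phi^-1(0.75)^2 (about 0.4549).
    """
    nd = NormalDist()
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in pvalues]
    return median(chi2) / nd.inv_cdf(0.75) ** 2
```

Values of λ close to 1 indicate little residual stratification after the PC, gender and education adjustment. Values well above 1 suggest inflation from population structure or other confounding.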

Supplementary Note 1: Description of datasets
Individuals of European ancestry from each study were included in our GWA meta-analysis.
Genetic ancestry was determined with PLINK or EIGENSTRAT 6 in each study. Here we describe the data resources of each study and the disease status of study individuals.
Study 1: These brain tissues are part of the samples used in a study of Alzheimer's disease 7, archived in the MRC London Brain bank for Neurodegenerative Disease. We obtained genotyping and DNA methylation data for 63 individuals, 38 of whom were diagnosed with Alzheimer's disease.
DNA methylation data are available for free public download (Table 1). Individuals with missing age acceleration estimates were removed from the GWAS, leaving 63 subjects for analysis.

Study 2:
All 148 individuals were neurologically normal 8. The SNP data were archived in dbGaP (http://www.ncbi.nlm.nih.gov/gap) under accession phs000249.v1.p1. The GWAS was conducted in the 142 individuals with age acceleration estimates for brain tissues. Gene expression and DNA methylation data are available for free public download (Table 1).

Age-related macular degeneration (AMD)
A large-scale GWAS meta-analysis included >17,100 advanced AMD cases and >60,000 controls of European and Asian ancestry 16.

Alzheimer's disease
The IGAP consortium performed a GWAS meta-analysis of 74,046 individuals of European ancestry 17. We downloaded the GWAS summary results from http://www.pasteur-lille.fr/en/recherche/u744/igap/igap_download.php. Two sets of association results are available.
The first set contains the stage 1 meta-analysis results, based on 17,008 Alzheimer's disease cases and 37,154 controls. A total of 11,632 SNPs showed moderate evidence of association (P < 1.0 × 10⁻³) at stage 1. The second set contains the P values of these 11,632 SNPs from the final meta-analysis combining the stage 1 and stage 2 results. We used both sets of association results for our overlap analysis.
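The overlap analysis is described in Methods of the main article and is not reproduced in this note. As a rough illustration only, the sketch below uses a one-sided hypergeometric test, one common way to judge whether two gene sets share more genes than expected by chance. This is a swapped-in example and not necessarily the procedure the authors used. It assumes a shared universe of genes from which both sets are drawn, and the function name is hypothetical.

```python
from math import comb


def hypergeom_overlap_pvalue(n_universe, n_set_a, n_set_b, n_overlap):
    """One-sided hypergeometric P value, P(X >= n_overlap).

    X is the number of genes shared with set A (size n_set_a) when a set of
    size n_set_b is drawn without replacement from n_universe genes.
    """
    upper = min(n_set_a, n_set_b)
    total = comb(n_universe, n_set_b)
    tail = sum(
        comb(n_set_a, k) * comb(n_universe - n_set_a, n_set_b - k)
        for k in range(n_overlap, upper + 1)
    )
    # Exact integer arithmetic until the final division.
    return tail / total
```

Python's exact integer arithmetic avoids overflow even for genome-scale universes of about 20,000 genes, though summing the tail term by term becomes slow for very large sets.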

DIAGRAM Type 2 diabetes
The DIAbetes Genetics Replication and Meta-analysis (DIAGRAM) consortium performed a GWAS meta-analysis for type 2 diabetes (T2D) in 18

GIANT BMI and height
The GIANT consortium performed GWAS meta-analyses of BMI and height in ~170,000 individuals of European ancestry 19. Most individuals (N = 133,454) were used in the discovery phase and the remainder in replication. GWAS results are available only for the discovery phase and can be downloaded from https://www.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium_data_files.

IIBDGC Inflammatory bowel disease
The International Inflammatory Bowel Disease Genetics Consortium (IIBDGC) performed a large-scale GWAS meta-analysis of inflammatory bowel disease (IBD) in 86,640 European individuals in the discovery phase and 9,846 individuals of East Asian, Indian or Iranian descent in the replication phase 20. Three traits were analyzed: the two forms of IBD, ulcerative colitis and Crohn's disease, and IBD of either form. Because the replication analysis used a Bayesian trans-ancestry meta-analysis, which reports Bayes factors rather than P values, we conducted the overlap analysis using only the discovery-phase GWAS results for each of the three IBD traits. In the discovery phase, genome-wide genotype data were available for 34,652 individuals, and only Immunochip genotype data were available for the other 51,988 individuals.
The GWAS results based on the genotypes of these 34,652 individuals can be downloaded from http://www.ibdgenetics.org/.

The Health and Retirement Study (HRS)
The Health and Retirement Study (1992-2012) is a longitudinal panel study of a representative sample of Americans over age 50 (and their spouses), collected every two years (Supplementary

GWAS in the HRS
We performed association analysis on common variants among the genotyped and imputed SNPs.
Genotyping was performed on the Illumina HumanOmni2.5-Quad (Omni2.5) platform, and imputation was carried out with IMPUTE2. SNP quality control required a Hardy-Weinberg equilibrium (HWE) P > 1.0 × 10⁻⁶ and an info measure > 0.4 for imputed markers, and imputed genotypes were called using a threshold of 0.9. In addition, we required a minimum of 200 samples per marker.
Because the HRS comprises different racial/ethnic groups, we either restricted the GWAS to a given ethnic group or included principal components (from an identity-by-state analysis) in multivariate regression models. After association analysis, we pruned SNPs with large effect sizes, defined as odds ratios > 3 or < 1/3. More details on the model framework and assessment of the GWAS results can be found in Supplementary Table 9.
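The filters described for the HRS GWAS can be written as a pair of predicates, one applied before association testing and one after. In this sketch, the record type and all field names are hypothetical. The MAF ≥ 5% cutoff comes from the common-variant definition in Supplementary Table 18. Reading the 0.9 genotype threshold as a hard-call rule on the imputed genotype probabilities is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SnpSummary:
    """Hypothetical per-SNP summary record; field names are illustrative."""
    snp_id: str
    maf: float
    hwe_p: float
    n_samples: int
    info: Optional[float] = None        # IMPUTE2 info; None for genotyped SNPs
    odds_ratio: Optional[float] = None  # filled in after association testing


def call_genotype(probs: Sequence[float], threshold: float = 0.9) -> Optional[int]:
    """Hard-call an imputed genotype (0/1/2 allele copies) if its probability >= threshold."""
    best = max(range(len(probs)), key=lambda g: probs[g])
    return best if probs[best] >= threshold else None


def passes_pre_association_qc(s: SnpSummary) -> bool:
    """Common-variant, HWE, sample-size and imputation-quality filters."""
    return (
        s.maf >= 0.05
        and s.hwe_p > 1.0e-6
        and s.n_samples >= 200
        and (s.info is None or s.info > 0.4)
    )


def passes_effect_size_filter(s: SnpSummary) -> bool:
    """Post-association pruning of SNPs with odds ratio > 3 or < 1/3."""
    if s.odds_ratio is None:
        return True
    return 1.0 / 3.0 <= s.odds_ratio <= 3.0
```

Keeping the effect-size pruning as a separate predicate reflects the order described in the text: it can be applied only after association testing has produced odds ratios.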

Cognitive slope for measuring cognitive decline
The cognitive slope measures the change in cognitive age relative to the change in chronological age over fourteen years (1996 to 2010). In the equation for cognitive age, k_j and q_j are the slope and intercept, respectively, of the regression between chronological age and each cognitive measure j; x_ji is the value of cognitive measure j for participant i; s_j is the root mean squared error of chronological age regressed on the j-th cognitive measure; and CA_i is the chronological age of participant i. Additionally, a variance term (the variance of the random variable) accounts for the variability in the first half of the equation, the mean variance of the cognitive measures that is explained by chronological age, and the range of chronological age. Overall, the mean cognitive age of a population should equal its mean chronological age 21.
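The definitions above match the Klemera-Doubal age estimator, which is one plausible reading of the cognitive-age equation absent from this text: CogAge_i = [Σ_j (x_ji − q_j) k_j / s_j² + CA_i / s_BA²] / [Σ_j (k_j / s_j)² + 1 / s_BA²]. Here s_BA² stands for the variance term. This form is an assumption, not a transcription of the authors' equation, and it further assumes that each measure follows x_j ≈ k_j·CA + q_j. The sketch below also assumes that a participant's cognitive slope is the least-squares slope of cognitive age on chronological age across waves. All names and the example values are hypothetical.

```python
from typing import Sequence


def kdm_cognitive_age(x, k, q, s, ca, s2_ba):
    """Klemera-Doubal style age estimate for one participant at one wave.

    x, k, q, s: per-measure value, slope, intercept and RMSE, assuming
    x_j ~ k_j * CA + q_j. ca: chronological age. s2_ba: variance term.
    """
    num = sum((xj - qj) * kj / sj ** 2 for xj, kj, qj, sj in zip(x, k, q, s))
    den = sum((kj / sj) ** 2 for kj, sj in zip(k, s))
    return (num + ca / s2_ba) / (den + 1.0 / s2_ba)


def ls_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return sxy / sxx


# Hypothetical example: four measures whose values lie exactly on their
# age regressions, so cognitive age equals chronological age at every wave.
k = [-0.10, -0.08, -0.05, -0.04]
q = [15.0, 12.0, 6.0, 4.0]
s = [1.5, 1.2, 1.0, 0.8]
ages = [60.0, 62.0, 64.0, 66.0]
cog_ages = [
    kdm_cognitive_age([kj * a + qj for kj, qj in zip(k, q)], k, q, s, a, s2_ba=25.0)
    for a in ages
]
slope = ls_slope(ages, cog_ages)
```

Under this reading, a slope above 1 would mean that cognitive age advances faster than chronological age, which is the sense in which the slope measures cognitive decline.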

Dementia assessment at each wave
HRS participants aged 70 and over (n = 6,412) were classified into normal, CIND (cognitive impairment, no dementia) and dementia groups. Predicted dementia status relied on the same four cognitive functioning variables that were used to estimate cognitive aging: delayed recall, immediate recall, serial 7s and backwards counting. However, dementia status also took into account proxy responses for participants who were unable to complete the cognitive battery. For participants who were able to respond, scores across the four variables were summed. Participants with a total score between 12 and 27 were categorized as having normal cognitive functioning, those with scores between 7 and 11 as having CIND, and those with scores of 6 or less as having dementia.
Participants whose status relied on proxy respondents were categorized according to the method proposed by Langa et al. 23. For these participants, cognitive status was based on the sum of three measures: the proxy's assessment of the participant's memory (0 = excellent, 1 = very good, 2 = good, 3 = fair, 4 = poor), the participant's total number of IADL (instrumental activities of daily living) limitations, and the interviewer's assessment of whether the participant had difficulty completing the cognitive battery because of cognitive limitations (0, 1 or 2, indicating no limitation, some limitation, or a limitation that prevents completion, respectively). Participants with summed scores between 0 and 2 were categorized as having normal cognitive functioning, those with scores between 3 and 5 as having CIND, and those with scores of 6 or more as demented.
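The score bands above translate directly into two small classification functions, one for self-respondents and one for proxy-based assessments. The function names are hypothetical. The 0 to 27 range for the self-response total is taken from the bands given in the text, and input checks are limited to the ranges the text states.

```python
def classify_self_response(total_score: int) -> str:
    """Cognitive status from the summed four-test score (range 0 to 27)."""
    if not 0 <= total_score <= 27:
        raise ValueError("self-response total must lie in 0..27")
    if total_score >= 12:
        return "normal"
    if total_score >= 7:
        return "CIND"
    return "dementia"


def classify_proxy(memory_rating: int, n_iadl_limitations: int,
                   interviewer_rating: int) -> str:
    """Cognitive status from proxy-based measures, as described above.

    memory_rating: 0 (excellent) to 4 (poor); interviewer_rating: 0 to 2.
    """
    if not 0 <= memory_rating <= 4 or not 0 <= interviewer_rating <= 2:
        raise ValueError("rating out of range")
    total = memory_rating + n_iadl_limitations + interviewer_rating
    if total <= 2:
        return "normal"
    if total <= 5:
        return "CIND"
    return "dementia"
```

Note that higher totals indicate better cognition for self-respondents but worse cognition for proxy assessments, so the two scales are classified separately rather than pooled.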

Age at Huntington's Disease (HD) motor onset
The GWAS meta-analysis was performed in 4,082 HD subjects collected by the Massachusetts HD Center Without Walls (MAHDC) and the Genetic Modifiers of HD (GeM-HD) 24. All HD subjects are of European ancestry. Age at motor onset was adjusted for the influence of the HTT CAG repeat, and the residual values were used as phenotypes for quantitative association analysis.
We downloaded the summary data from the Genetic Modifiers of Motor Onset Age (GeM-MOA, http://chgr.partners.org/cgi-bin/gem.moa/gem.moa.py), with access granted by the GeM-MOA consortium.
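The residual phenotype described above can be illustrated with a simple residualization. The sketch below regresses age at motor onset on CAG repeat length with a straight line and keeps the residuals. The linear form is an assumption for illustration only; the consortium's actual adjustment may differ (for example, it may be nonlinear in repeat length). The function name and the example values are hypothetical.

```python
def residualize(phenotype, covariate):
    """Residuals from an ordinary least-squares fit phenotype ~ a + b * covariate."""
    n = len(phenotype)
    mx = sum(covariate) / n
    my = sum(phenotype) / n
    sxx = sum((c - mx) ** 2 for c in covariate)
    b = sum((c - mx) * (y - my) for c, y in zip(covariate, phenotype)) / sxx
    a = my - b * mx
    return [y - (a + b * c) for y, c in zip(phenotype, covariate)]


# Hypothetical data: longer CAG repeats go with earlier motor onset.
cag_repeats = [40, 42, 45, 48]
age_at_onset = [55.0, 47.0, 41.0, 33.0]
residual_phenotype = residualize(age_at_onset, cag_repeats)
```

A positive residual marks onset later than expected for a subject's repeat length. Such residuals are the kind of quantitative phenotype in which genetic modifiers of onset can be sought.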

Longevity study
The GWAS meta-analysis was performed in 98,066 individuals of European ancestry 25 and comprised discovery, replication and joint analyses. The summary results of the discovery-phase analysis can be downloaded from http://hmg.oxfordjournals.org/content/23/16/4420/suppl/DC1.
The results include two types of association P values, comparing (1) individuals older than 85 years with individuals younger than 65 years and (2) individuals older than 90 years with individuals younger than 65 years. We report only the results for comparison (2), i.e. longevity > 90 versus < 65; the results for the first comparison were similar.

MAGIC Glycemic measure
The Meta-Analyses of Glucose and Insulin-related traits (MAGIC) consortium performed GWAS meta-analyses of (a) fasting glucose in 58,074 non-diabetic individuals and (b) fasting insulin in 51,750 non-diabetic individuals. All study individuals are of European descent. Both GWASs accounted for BMI. Data on glycemic traits have been contributed by MAGIC investigators and have been downloaded from www.magicinvestigators.org; the relevant article can be found in 26.

PGC bipolar disorder
The PGC performed a combined GWAS meta-analysis of bipolar disorder 29. The study comprised 7,481 cases and 9,250 controls. Multidimensional scaling (MDS) analysis was used to identify the ancestry of study individuals, and the MDS components were used to correct for population stratification in the association analysis. The GWAS results can be downloaded from http://www.med.unc.edu/pgc/downloads.

PGC major depressive disorder
The PGC performed a combined GWAS meta-analysis of major depressive disorder 30.
Association analysis was performed in the discovery phase (9,240 cases and 9,519 controls of European ancestry), the replication phase (6,783 cases and 50,695 controls), and a cross-disorder mega-analysis (9,238 major depressive disorder cases and 8,039 controls, and 6,998 bipolar disorder cases and 7,775 controls); the last two involved only a small number of SNPs (m ≤ 819).
Only the discovery-phase GWAS results are available for download from http://www.med.unc.edu/pgc/downloads.

PGC Schizophrenia
The PGC performed a multi-stage large-scale GWAS meta-analysis for studying schizophrenia up