The transcriptional landscape of age in human peripheral blood

Disease incidences increase with age, but the molecular characteristics of ageing that lead to increased disease susceptibility remain inadequately understood. Here we perform a whole-blood gene expression meta-analysis in 14,983 individuals of European ancestry (including replication) and identify 1,497 genes that are differentially expressed with chronological age. The age-associated genes do not harbor more age-associated CpG-methylation sites than other genes, but are instead enriched for the presence of potentially functional CpG-methylation sites in enhancer and insulator regions that associate with both chronological age and gene expression levels. We further used the gene expression profiles to calculate the 'transcriptomic age' of an individual, and show that differences between transcriptomic age and chronological age are associated with biological features linked to ageing, such as blood pressure, cholesterol levels, fasting glucose, and body mass index. The transcriptomic prediction model adds biological relevance and complements existing epigenetic prediction models, and can be used by others to calculate transcriptomic age in external cohorts.

Lambda (λ) is the shrinkage parameter that is used to estimate effect sizes in the ridge regression (see methods). We re-ran the meta-analysis excluding the BSGS cohort. We performed the approximate ridge regression analysis with a range of lambda values, and calculated the correlation between the predictor and the actual age in BSGS (shown on y-axis). Shown on x-axis is lambda / n with n being the sample size of the meta-analysis (excluding BSGS). The maximum prediction accuracy is achieved at lambda = 6.07 * n with n being the sample size.
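The lambda sweep described in this legend can be illustrated with a small, self-contained sketch. It assumes ordinary (exact) ridge regression on simulated data rather than the approximate ridge regression on meta-analysis results used in the study; the effect sizes, sample sizes, noise level and function names are all hypothetical. For each ratio lambda / n it fits the model on a training set and reports the correlation between predicted and actual 'age' in a held-out set, which plays the role of BSGS here.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x

def ridge_fit(X, y, lam):
    """Ridge estimate b = (X'X + lam * I)^-1 X'y; X and y are assumed centred."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def pearson(u, v):
    """Pearson correlation between two equally long sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

def lambda_sweep(train, held_out, ratios):
    """Map each ratio lambda / n to the prediction correlation in the held-out set."""
    X, y = train
    X_out, y_out = held_out
    n = len(y)
    scores = {}
    for ratio in ratios:
        b = ridge_fit(X, y, ratio * n)
        predicted = [sum(bi * xi for bi, xi in zip(b, row)) for row in X_out]
        scores[ratio] = pearson(predicted, y_out)
    return scores

def simulate(n, effects, noise_sd, rng):
    """Toy data: zero-mean 'expression' values and a noisy linear 'age'."""
    X = [[rng.gauss(0, 1) for _ in effects] for _ in range(n)]
    y = [sum(e * x for e, x in zip(effects, row)) + rng.gauss(0, noise_sd)
         for row in X]
    return X, y

rng = random.Random(0)
effects = [2.0, -1.5, 1.0, 0.5, -0.5]       # hypothetical per-gene effects
train = simulate(300, effects, 2.0, rng)     # stands in for the meta-analysis
held_out = simulate(200, effects, 2.0, rng)  # stands in for the left-out cohort
ratios = [0.01, 0.1, 1.0, 6.07, 50.0]        # values of lambda / n to try
scores = lambda_sweep(train, held_out, ratios)
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

In this toy setting the best ratio need not be close to 6.07; the optimum reported in the legend depends on the real data and on the approximation used there.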

SUPPLEMENTARY TABLES
Supplementary Table 1. The number of age-associated genes across different ancestries.

Brisbane Systems Genetics Study (BSGS)
Individuals present in this study were recruited as part of the Brisbane Twin Nevus and cognition studies (known as BTN and MAPS respectively). As described in detail elsewhere 1 , adolescent MZ and DZ twins, their siblings, and their parents have been recruited over a 16-year period into an ongoing study of the genetic and environmental factors influencing pigmented nevi and the associated risk of developing skin cancer and cognition. The sample is of northern European origin (mainly Anglo-Celtic). A Principal Component Analysis (PCA) comparing individuals in this study to HapMap3 (International HapMap3 Consortium) and GenomEUtwin 2 populations shows the close similarity of ancestry to northern European populations. All participants gave informed consent, and the appropriate institutional review boards approved the study protocol. Whole blood for expression profiling was collected directly into PAXGene™ tubes (QIAGEN, Valencia, CA). Total RNA was extracted from PAXGene™ tubes using the WB gene RNA purification kit (QIAGEN, Valencia, CA). RNA from all samples was run on an Agilent Bioanalyzer to assess quality and to estimate RNA concentrations. RNA was converted to cDNA, amplified and purified using the Ambion Illumina TotalPrep RNA Amplification Kit (Ambion).
Expression profiles were generated by hybridising 750 ng of cRNA to Illumina HumanHT-12 v4.0 Beadchip according to Illumina whole-genome gene expression direct hybridization assay guide (Illumina Inc, San Diego, USA). Briefly, 500 ng of total RNA were used to generate biotinylated cRNA, which was fragmented and hybridised to an Illumina whole genome expression chip, HumanHT-12 v4.0 for 18 h at 58°C. Beadchips were then washed and stained and subsequently scanned to obtain fluorescence intensities. Samples were scanned using an Illumina Bead Array Reader. Samples were randomized across chips and chip positions, with checks for balance across families, sex and generation. Full details of the BSGS cohort and sample preparation are given elsewhere 3,4 . The expression data is available at GEO (Gene Expression Omnibus) public repository under the accession GSE33321.
DNA methylation was measured on 614 individuals from 117 families of European descent recruited as part of BSGS 5 . Bisulfite conversions were performed in 96 well plates using the EZ-96 DNA Methylation Kit (Zymo Research, Irvine, CA, USA). Prior to conversion, DNA concentrations were determined by NanoDrop quantification (NanoDrop Technologies, Inc., Wilmington, DE, USA) and standardised to include 500ng. Three technical replicates were included in each conversion to assess repeatability. A commercial female human genomic DNA sample (Promega Corporation, Madison, WI, USA) was used on all plates, one sample from each run was duplicated on the plate and one sample duplicated from a different plate. DNA recovery after conversion was quantified using Nanodrop (Thermo Scientific, Wilmington, DE, USA).
Bisulfite-converted DNA samples were hybridised to 12-sample Illumina HumanMethylation450 BeadChips using the Infinium HD Methylation protocol and Tecan robotics (Illumina, San Diego, CA, USA). The HM450 BeadChip interrogates methylation status at 485,577 CpG sites across the genome and provides coverage of 99% of RefSeq genes. Methylation scores for each CpG site are obtained as a ratio of the intensities of fluorescent signals and are represented as β-values. Samples were randomly placed with respect to the chip they were measured on and to the position on that chip in order to avoid any confounding with family. Box-plots of the red and green intensity levels and their ratio were used to ensure that no chip position was under- or over-exposed, with any outlying samples repeated. Similarly, the proportion of probes with detection p-value less than 0.01 was examined to confirm strong binding of the sample to the array.
Methylation probes on the sex chromosomes or annotated as binding to multiple chromosomes were removed from the analysis, as were those with zero CpG sites. The probability of a probe within a sample either being called as missing or having a detection p-value less than 0.001 was estimated from the average rate across all probes and samples. A threshold for probes showing significant deviation from random missingness (or excess poor binding) was determined by testing against a binomial distribution for the number of samples at the 0.05 significance level with a Bonferroni correction for the number of probes. Any probe with more than 11 individuals with missing data or more than five individuals with detection p-values > 0.001 was removed. After cleaning, a total of 417,069 probes remained.
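The binomial threshold described above can be sketched as follows. This is a minimal illustration under stated assumptions: the per-sample failure rates passed in at the bottom are hypothetical (the text does not report them), the Bonferroni denominator is taken to be the 485,577 sites on the array, and the function names are invented for this example. The function returns the largest number of failing samples a probe may have before its failure count deviates significantly from the average rate.

```python
from math import exp, lgamma, log, log1p

def binom_pmf(i, n, p):
    """P(X = i) for X ~ Binomial(n, p), computed on the log scale for stability."""
    log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
               + i * log(p) + (n - i) * log1p(-p))
    return exp(log_pmf)

def upper_tail(k, n, p):
    """P(X > k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1, n + 1))

def failure_threshold(n_samples, rate, n_probes, alpha=0.05):
    """Smallest k with P(X > k) <= alpha / n_probes.

    A probe with more than k failing samples would be flagged as deviating
    from the average failure rate after Bonferroni correction.
    """
    cutoff = alpha / n_probes
    for k in range(n_samples + 1):
        if upper_tail(k, n_samples, rate) <= cutoff:
            return k
    return n_samples

# Hypothetical average failure rates; 614 samples as in the BSGS methylation set.
for rate in (0.0005, 0.002):
    print(rate, failure_threshold(614, rate, 485577))
```

Because the threshold depends strongly on the assumed average rate, the values printed here are not expected to reproduce the cut-offs of 11 and five individuals quoted in the text.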

Dietary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome study (DILGOM)
The Finnish study samples included a total of 513 unrelated individuals aged 25-74 years from the Helsinki area, recruited during 2007 as part of the DILGOM study, an extension of the FINRISK 2007 study described earlier 6 . Study participants were asked to fast overnight (at least 10 hours) prior to giving a blood sample.
To obtain stabilized total RNA, we used the PAXGene Blood RNA System (PreAnalytiX GmbH, Hombrechtikon, Switzerland). It included collection of 2.5 ml peripheral blood into PAXGene Blood RNA Tubes (Becton Dickinson and Co., Franklin Lakes, NJ, USA) and total RNA extraction with the PAXGene Blood RNA Kit (Qiagen GmbH, Hilden, Germany). The protocol recommended by the manufacturer was used. The integrity and quantity of the RNA samples were evaluated with the 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Biotinylated cRNA was produced from 200 ng of total RNA with the Ambion Illumina TotalPrep RNA Amplification Kit (Applied Biosystems, Foster City, CA, USA), using the protocol specified by the manufacturer. 750 ng of biotinylated cRNA were hybridized onto Illumina HumanHT-12v3 Expression BeadChips (Illumina Inc., San Diego, CA, USA), using the standard protocol. After sample mix-up correction, 509 samples were included for further analysis in this cohort. The expression data is available at ArrayExpress public repository under the accession E-TABM-1036.

Estonian Gene Expression Cohort (EGCUT)
The Estonian Gene Expression Cohort is composed of 1,124 randomly selected samples (mean age 37.8 years) from the Estonian Genome Center, University of Tartu (EGCUT, www.biobank.ee). The EGCUT is a population-based database that currently comprises the health, genealogical and genome data of more than 51,530 individuals aged 18 years and older, closely reflecting the age distribution of the adult Estonian population. Participants of EGCUT were recruited by general practitioners (GPs) at GP offices, by physicians at hospitals, or by data collectors at EGCUT's patient recruitment offices. Each participant completed a Computer Assisted Personal Interview lasting 1-2 hours at a doctor's or data collector's office, covering personal data (place of birth, place(s) of living, nationality etc.), genealogical data (family history, three generations), educational and occupational history and lifestyle data (physical activity, dietary habits, smoking, alcohol consumption, quality of life). Anthropometric and physiological measurements were also taken. All diseases are defined according to the ICD10 coding. Women filled out an additional questionnaire relating to women's health.
The collection of blood samples and data is conducted according to the Estonian Gene Research Act and all participants have signed a broad informed consent. For gene expression profiling, whole blood samples were collected in Tempus Blood RNA Tubes (Life Technologies, NY, USA). RNA was extracted using the Tempus Spin RNA Isolation Kit (Life Technologies, NY, USA) and quantified by NanoDrop 1000 Spectrophotometer (Thermo Fisher Scientific, DE, USA). RNA integrity was assessed by Agilent 2100 Bioanalyzer (Agilent Technologies, CA, USA). Samples with RIN<7.0 were excluded. Following amplification and labelling of RNA by the Ambion TotalPrep RNA Amplification Kit (Life Technologies, NY, US), the whole genome gene expression levels were obtained by Illumina HT12v3 arrays (Illumina Inc, San Diego, US) according to the manufacturers' protocols. 1,086 samples passed the general quality control and were included in the subsequent analyses. The expression data is available at GEO (Gene Expression Omnibus) public repository under the accession GSE48348.
Methylation data pre-processing and quality control analyses were performed in R using the Bioconductor package minfi 7 . 'Raw' pre-processing was used to convert the intensities from the red and the green channels into methylated and unmethylated signals. Beta values were computed using Illumina's formula [beta = M/(M+U+100)]. The difference in the distribution of beta values for type I and type II probes was corrected for using the SWAN normalization method 8 . Detection p-values were obtained for every CpG probe in every sample. Failed positions were defined as signal levels lower than background from both the methylated and unmethylated channels. Samples with a high proportion of failed positions were discarded from further analyses.
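As a small worked illustration of the formula above, the sketch below computes beta = M/(M+U+100) for single intensity pairs. The function name and the example intensities are hypothetical. It also shows the practical effect of the offset of 100: for weak signals the beta value is pulled towards zero instead of being driven by noise in a small denominator.

```python
def illumina_beta(meth, unmeth, offset=100.0):
    """Beta value from methylated (M) and unmethylated (U) signal intensities."""
    if meth < 0 or unmeth < 0:
        raise ValueError("intensities must be non-negative")
    return meth / (meth + unmeth + offset)

# Strong signal: the offset changes the value only slightly.
print(illumina_beta(9000, 1000))            # 9000 / 10100
print(illumina_beta(9000, 1000, offset=0))  # 9000 / 10000

# Weak signal: the offset pulls the value towards zero.
print(illumina_beta(50, 0))                 # 50 / 150
print(illumina_beta(50, 0, offset=0))       # 50 / 50
```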
For the purification of CD4+ and CD8+ T-cells, we obtained peripheral blood from healthy donors of the Estonian Genome Center of the University of Tartu (Tserel et al., in press). Peripheral blood mononuclear cells (PBMC) were extracted using Ficoll-Paque (GE Healthcare) gradient centrifugation. CD4+ T cells and CD8+ T cells were extracted from the PBMCs by consecutive positive separation using microbeads (CD4+ #130-045-101; CD8+ #130-045-201) and AutoMACS technology (Miltenyi Biotec) according to the manufacturer's protocol. RNA was extracted using the miRNeasy Mini Kit combined with a recommended RNase-free DNase I treatment (both from Qiagen) according to the manufacturer's protocol. The RNA was labeled and amplified using the TargetAmp Nano Labeling Kit for Illumina Expression BeadChip (Epicentre Biotechnologies) with SuperScript III Reverse Transcriptase (Life Technologies) according to the manufacturer's protocol, followed by purification with the RNeasy MinElute Cleanup Kit (Qiagen). The labelled RNA samples were hybridized to HumanHT-12 v4 Expression BeadChips (Illumina) according to the manufacturer's instructions.

Fehrmann et al. dataset (FEHRMANN)
The Fehrmann dataset consists of whole peripheral blood samples of 1,469 unrelated individuals from the United Kingdom and the Netherlands 9,10 . Some of these individuals are patients, while others are healthy controls. RNA levels were quantified using both the Illumina H8v2 platform (229 samples) and the HT12v3 platform (1,240 samples), as has been described before. The total number of samples having gene expression data and age equals 1,191. The expression data is available at GEO (Gene Expression Omnibus) public repository under the accessions GSE20332 and GSE20142.

Framingham Heart Study (FHS)
The Framingham Heart Study (FHS) is a community-based, prospective, longitudinal study of three generations of participants. The original cohort enrolled 5,209 participants starting in 1948 and the Offspring (Second Generation) cohort enrolled 5,124 children and spouses of children of the original cohort starting in 1971 11 . From 2002 to 2005 the grandchildren of the original cohort were enrolled in the Third Generation cohort. Participants gave informed consent for research and the study is approved by the Boston University IRB. Participants received clinic examinations over time, including detailed medical histories and standard exam measurements, and had fasting blood samples collected.
Fasting whole blood samples (PAXGene Tubes, Becton Dickinson) were collected from the FHS OFFSPRING (at their 8th examination cycle), and the FHS 3rd GENERATION (at their 2nd examination cycle). Details of the cohort designs and ascertainments have previously been described 39 . RNA was extracted from whole blood samples following the manufacturers' protocol (PAXGene Blood RNA kit, Asuragen). 50 ng of total RNA were amplified following the NuGen WG-Ovation Pico RNA amplification system and labeling conducted with the FL-Ovation cDNA Biotin Module V2 (NuGen, San Carlos, CA). Fragmented, biotinylated cDNA (5.5 µg) was prepared for each sample and added to a hybridization cocktail and loaded onto an Affymetrix Human Exon 1.0 ST GeneChip (1.4 million probesets), and hybridized at 45°C and 60 rpm for 16 h. The manufacturer's protocol was followed for washing and staining, and stained arrays were scanned at 532 nm in an Affymetrix GeneChip Scanner 3000 to generate CEL files for each array. The expression data is available at dbGAP public repository under the accession phs000363.v7.p8.

Genetics, osteoARthritis and Progression (GARP)
The GARP study is a study among Dutch Caucasian sib pairs with symptomatic osteoarthritis at multiple joint locations: the GARP study consists of 191 sib pairs (N=382) of white, Dutch, ancestry. All participants (age-range: 40-70 years; mean age: 60 years) are clinically and radiographically diagnosed with primary, symptomatic osteoarthritis at multiple joint sites in the hand, or in at least two joints of the following locations: hand, spine, knee, or hip 12 .
For the current study, whole genome expression profiles from 108 participants (68 unrelated families) of the GARP study and 26 age-matched healthy controls were analyzed. Details on the generation of expression profiles and analysis are described earlier 13 . In short, blood of participants was collected and mononuclear blood cells were separated with a Histopaque® gradient prior to RNA isolation. After RNA isolation the quality of 36 random samples was analyzed. Samples had an RNA integrity number (RIN) of 8.3 or higher, and an average 28S/18S ratio of 2.2 (range: 1.8-2.7). For cDNA synthesis, amplification, biotin labeling, and hybridization with 500 ng of total RNA, the Ambion® TotalPrep™-96 RNA amplification kit (Life Technologies) was used according to the manufacturer's protocol. Samples were hybridized onto the microarrays (Illumina Human HT-12_v3_BeadChips; Illumina). Quality control was performed in the R statistical programming language using the lumi package. The total number of samples included in the current work is 134.
For the methylation analysis, genomic DNA was extracted with phenol-chloroform, and isolated DNA was treated with sodium bisulphite using the ZymoResearch EZ DNA Methylation kit. DNA methylation was assayed at over 450,000 sites on the Illumina Infinium HumanMethylation 450K BeadChips. Quality control was performed in the R statistical programming language using the minfi and methylumi packages. The total number of samples with data on expression and methylation included in the current work is 120. The expression data is available at GEO (Gene Expression Omnibus) public repository under the accession GSE48556.

Genetic Epidemiology Network of Arteriopathy (GENOA)
The Genetic Epidemiology Network of Arteriopathy (GENOA) study consists of hypertensive sibships that were recruited for linkage and association studies in order to identify genes that influence blood pressure and its target organ damage 14 . In the initial phase of the GENOA study (Phase I: 1996-2001), all members of sibships containing ≥ 2 individuals with essential hypertension clinically diagnosed before age 60 were invited to participate, including both hypertensive and normotensive siblings.
The GENOA gene expression cohort is composed of 883 European-American individuals participating in the "Genetics of Microangiopathic Brain Injury" substudy, which investigates the genetic basis of alteration in brain structure detectable by magnetic resonance imaging 15 .
RNA samples from cell lines established from peripheral blood samples by Epstein-Barr virus transformation were extracted using standard protocols. RNA quality was assessed using the Agilent 2100 Bioanalyzer (Agilent Technologies Inc., Foster City, CA) and quantified by spectrophotometry using the Nanodrop ND-1000 (Nanodrop Inc., Wilmington, DE). All RNA samples used in the present study yielded both an A260/A280 absorbance ratio greater than 2.0 and an RNA Integrity Number (RIN) ≥ 8. One μg of RNA was labeled using the WT Expression labeling assay (Applied Biosystems/Ambion, Foster City, CA) including the labeling controls from the GeneChip Eukaryotic Poly-A RNA Control Kit (Affymetrix, Santa Clara, CA). Each step of the labeling protocol was monitored using the Agilent 2100 Bioanalyzer or the Nanodrop spectrophotometer, as recommended by the manufacturer. Labeled cRNAs were hybridized onto the Affymetrix Human Exon 1.0 ST Array.
Array quality control was performed using Affymetrix Expression Console™ (v 1.1) at the transcript level using core-level probe sets. All array images passed visual inspection. Hybridization controls were all present with signal increases following concentration. Labeling control signal strengths followed the order Lys < Phe < Thr < Dap. Signal intensity plots were examined for raw and processed data to identify outliers. Only core probe sets were used to assess exon-level expression. Probe sets that are known to cross-hybridize and those with undetectable expression were also excluded. Transcript-level expression was assessed by averaging all the core probe sets for that gene. QC metrics were again examined to identify possible outliers or non-performing samples using the Partek Genomic Suite software (V.6.6). These included the mean raw intensity for all probes, the distribution of RMA normalized intensities, and principal component analysis. We also verified sex assignment of the samples by examining expression levels of genes on the Y chromosome.
RNA quality control and microarray expression profiling experiments were conducted at the laboratory of Dr. Myriam Fornage at the University of Texas Health Science Center at Houston. The expression data is available at GEO (Gene Expression Omnibus) public repository under the accession GSE49531.

Grady Trauma Project (GTP)
The Grady Trauma Project (GTP) is a population-based, prospective study of demographic characteristics, trauma exposure, and prevalence of post-traumatic stress disorder and major depressive disorder in an urban, predominantly African-American population 16 . Subjects were recruited prospectively from the waiting rooms of primary care and obstetrics-gynecology clinics of Grady Memorial Hospital in Atlanta, GA. Exclusion criteria included mental retardation, active psychosis, or the inability to give informed consent. Written and verbal informed consent was obtained for all participating subjects. All procedures in this study were approved by the Institutional Review Boards of Emory University School of Medicine and Grady Memorial Hospital. Since its inception in 2005, over 5000 subjects have been interviewed for the study.
For the expression analysis, whole blood was collected between 8-9 a.m. in Tempus RNA tubes for 398 subjects. All subjects were instructed to fast before blood collection. Whole genome expression profiles were generated for 398 subjects at the Max-Planck Institute as follows: RNA was isolated using the Versagene kit (Gentra Systems, Minneapolis, U.S.A.), quantified using the Nanophotometer, and quality checks were performed on the Agilent Bioanalyzer. 250 nanograms of total RNA were reverse transcribed to cDNA, converted to cRNA and biotin-labeled using the Ambion kit (AMIL1791, Applied Biosystems). 750 nanograms of cRNA were hybridized to Illumina HT-12 v3.0 or v4.0 arrays (Illumina, San Diego, California, U.S.A) and incubated overnight for 16 hours at 55ºC. Arrays were washed, stained with Cy3 labeled streptavidin, dried and scanned on the Illumina BeadScan confocal laser scanner. 21,394 transcripts were on both the v3.0 and v4.0 arrays and were significantly expressed above background levels (detection P<0.05) in at least 10% of subjects, and were thus eligible for further analysis. Of the initial 398 subjects, 359 subjects with self-reported African-American ancestry were selected for this analysis. Age of subjects at blood draw ranged from 16-78 years. The expression data is available at GEO (Gene Expression Omnibus) public repository under the accession GSE58137.
For the methylation analysis, we extracted DNA from whole blood at the Max Planck Institute in Munich using the Gentra Puregene Kit (Qiagen). Genomic DNA was then bisulfite converted using the Zymo EZ-96 DNA Methylation Kit (Zymo Research). We assessed DNA methylation at >480,000 CpG sites using Illumina HumanMethylation450 BeadChip arrays, with hybridization and processing performed according to the instructions of the manufacturer. For each CpG site and individual, we collected two data points: M (the total methylated signal), and U (the total unmethylated signal). We set to missing data points with 1) a detection p-value greater than 0.001 or 2) a combined signal less than 25% of the total median signal and less than both the median unmethylated and median methylated signal. We removed individual samples from analysis if they were outliers in a hierarchical clustering analysis or had 1) a mean total signal less than half of the median of the overall mean signal or 2000 arbitrary units, or 2) a missingness rate above 5%. Similarly, we removed from analysis CpG sites with a missingness rate above 10%. We then computed β-values for each individual at each CpG site as the total methylated signal divided by the total signal.
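The masking and filtering rules above can be sketched on a small samples-by-sites matrix. This is an illustration under several assumptions, not the pipeline used in the study: the medians are taken over all data points, samples are filtered before CpG sites, the clustering-based and mean-signal sample exclusions are omitted, and the function name and toy data are hypothetical.

```python
from statistics import median

def filter_and_compute_betas(meth, unmeth, det_p,
                             max_sample_missing=0.05, max_site_missing=0.10):
    """Return (kept sample indices, kept site indices, beta matrix).

    meth, unmeth and det_p are lists of rows (samples) by columns (CpG sites).
    Masked data points are returned as None.
    """
    n_samples, n_sites = len(meth), len(meth[0])
    flat_m = [v for row in meth for v in row]
    flat_u = [v for row in unmeth for v in row]
    med_m, med_u = median(flat_m), median(flat_u)
    med_total = median(m + u for m, u in zip(flat_m, flat_u))

    def is_missing(i, j):
        total = meth[i][j] + unmeth[i][j]
        low_signal = (total < 0.25 * med_total
                      and total < med_u and total < med_m)
        # total <= 0 is an extra guard (not in the text) against division by zero
        return det_p[i][j] > 0.001 or low_signal or total <= 0

    missing = [[is_missing(i, j) for j in range(n_sites)]
               for i in range(n_samples)]
    keep_samples = [i for i in range(n_samples)
                    if sum(missing[i]) / n_sites <= max_sample_missing]
    if not keep_samples:
        return [], [], []
    keep_sites = [j for j in range(n_sites)
                  if sum(missing[i][j] for i in keep_samples) / len(keep_samples)
                  <= max_site_missing]
    betas = [[None if missing[i][j]
              else meth[i][j] / (meth[i][j] + unmeth[i][j])
              for j in keep_sites] for i in keep_samples]
    return keep_samples, keep_sites, betas

# Toy data: 4 samples x 40 sites with constant signal.
n_samples, n_sites = 4, 40
meth = [[3000.0] * n_sites for _ in range(n_samples)]
unmeth = [[1000.0] * n_sites for _ in range(n_samples)]
det_p = [[0.0] * n_sites for _ in range(n_samples)]
det_p[3] = [0.5] * n_sites   # sample 3 fails at every site
det_p[0][0] = 0.01           # one failed data point at site 0
samples, sites, betas = filter_and_compute_betas(meth, unmeth, det_p)
print(samples, len(sites))
```

In the toy data, sample 3 is dropped for excess missingness, and site 0 is then dropped because one of the three remaining samples failed there.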

Heart and Vascular Health Study (HVH)
The Heart and Vascular Health (HVH) study [17][18][19] constitutes a group of population-based case-control studies of myocardial infarction (MI), stroke, venous thromboembolism (VTE), and atrial fibrillation 7 . Study participants were 30-79 year old members of Group Health, a large integrated health care organization in Washington State. Cases were identified from hospital discharge diagnosis codes and subsequently validated by medical record review. Cases shared a common control group that was a random sample of Group Health members, frequency-matched to MI cases on age (within decade), sex, treated hypertension, and calendar year of identification. The HVH study started in 1987 and blood specimens have been collected since 1995. Study eligibility, participant characteristics, and risk factor information were collected by medical record review and telephone interview. In addition, surviving cases and controls who agreed to participate had a blood draw.
Since 2003, whole blood has been collected in PAXGene tubes for mRNA expression studies. Participants of the current study (N=348) are those for whom expression profiling was done as part of several gene expression pilot studies conducted among HVH controls to investigate incident cardiovascular disease, hormone therapy, medications, diabetes, and atrial fibrillation. The Group Health human subjects review committee approved the study and all participants provided written informed consent.
Total RNA was extracted using PAXGene Blood RNA Kit and RNase-Free DNase Set (QIAGEN Inc., Valencia, CA) at the Fred Hutchinson Cancer Research Center, Seattle, WA. RNA integrity and quality was assessed using Agilent 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, CA). Illumina® TotalPrep™-96 RNA Amplification Kit (Life Technologies Corp., Carlsbad, CA) was used for RNA amplification and labeling using manufacturer's instructions. Labeled cRNAs were hybridized onto Illumina HumanHT-12v3 and Illumina HumanHT-12v4 Expression Beadchip (Illumina, San Diego, CA) arrays, according to manufacturer's protocols. The images of the array chips were captured using an Illumina Beadarray scanner and scanned array images were imported into Illumina's GenomeStudio Gene Expression Module. RNA quality control and microarray expression profiling experiments were conducted at the laboratory of Dr. Jerome Rotter at the Medical Genetics Institute at Cedars-Sinai Medical Center, Los Angeles, CA. The expression data is available at GEO (Gene Expression Omnibus) public repository under the accession GSE47729.

Boston Cohort (CD4+ T cells and CD14+ monocytes)
Healthy European-American individuals were sampled from the population of Boston, Massachusetts (n=211). Blood samples from each individual underwent flow cytometric purification to independently isolate CD4+CD8-CD3+ T cells and CD14+CD16- monocytes. Resulting cDNA was profiled using Affymetrix ST1.0 HuGene arrays. The sampled individuals ranged in age from 18 to 54 years. The expression data is available at GEO (Gene Expression Omnibus) public repository under the accession GSE56035.

Invecchiare in Chianti, ageing in the Chianti area (InCHIANTI)
The InCHIANTI study (http://www.inchiantistudy.net) 20 is a population-based, prospective study of human ageing in the Tuscany area of Italy. 1,455 participants were enrolled at baseline (1998-2000), with follow-up waves every 3 years. Extensive interviews, questionnaires, medical examinations, physical tests and blood samples were taken at every wave. Ethical approval was granted by the Istituto Nazionale Riposo e Cura Anziani institutional review board in Italy, and participants gave informed consent to participate.
Peripheral blood specimens were collected at wave 4 (year 9, 2008/9) from 712 individuals, using the PAXGene technology to preserve levels of mRNA transcripts as they were at the point of collection 21 . RNA was extracted from peripheral blood samples using the PAXGene Blood mRNA kit (Qiagen, Crawley, UK) according to the manufacturer's instructions. RNA was biotinylated and amplified using the Illumina® TotalPrep 22 -96 RNA Amplification Kit and directly hybridized with HumanHT-12_v3 Expression BeadChips that include 48,803 probes. Image data was collected on an Illumina iScan and analyzed using the Illumina and Beadstudio software (Illumina, San Diego, California, USA) as previously described 23 . All microarray experiments and analyses complied with MIAME guidelines. The total number of InCHIANTI samples with good quality whole-genome expression data equals 698, 695 of which also have cell-count data.
CpG methylation data was generated for a subset of the samples seen during follow-up wave 3 (the 'gene expression' wave) using the Illumina Infinium HumanMethylation450 BeadChip. Briefly, genomic DNA was bisulfite converted using the Zymo EZ-96 DNA Methylation Kit, followed by CpG analysis using the aforementioned Illumina 450k array. Quality control of the samples included exclusion based on sex discrepancy and call-rate thresholds. Normalization of the arrays was performed using the 'wateRmelon' 24 R package (specifically the DASEN method), which includes quantile normalization between probe-types and arrays. Samples having 5% of sites with a detection P-value>0.01 were removed. After exclusions, 506 samples had robust data available for analysis 25 . The expression data is available at GEO (Gene Expression Omnibus) public repository under the accession GSE48152.

KOoperative gesundheitsforschung in der Region Augsburg (KORA)
KORA F4 (Cooperative Health Research in the Region of Augsburg) is a population-based survey in the region of Augsburg in Southern Germany. KORA has existed since 1996 as a regional research platform for population-based surveys and follow-up studies in the region of Augsburg in the southwest of Germany. Four cross-sectional health surveys have been performed in five-year intervals, each containing independent random samples of individuals with German nationality resident in Augsburg city or one of sixteen communities from the adjacent counties. 3,080 samples were collected for KORA F4 between 2006 and 2008 26 . The study followed the recommendations of the Declaration of Helsinki and was approved by the local ethical committees.
For the expression analysis, the subset from the survey F4 (2004-2006) of 1002 elderly individuals aged 61 to 82 years was used 27 . Gene expression profiling was performed using the Illumina HumanHT12v3 BeadChip as described elsewhere 28 . RNA was isolated from whole blood under fasting conditions using the PAXGene Blood miRNA Kit (Qiagen, Hilden, Germany). Purity and integrity of the RNA were analysed using the Agilent Bioanalyzer with the 6000 Nano LabChip reagent set (Agilent Technologies, Germany). Samples with low quality were excluded after manual inspection. Using the Illumina TotalPrep-96 RNA Amp Kit (Ambion), 500 ng of RNA was reverse transcribed into cRNA and biotin-UTP-labelled. 3000 ng of cRNA was hybridized to the Illumina HumanHT12v3 Expression BeadChips, followed by washing steps as described in the Illumina protocol. After quality control, 993 samples were available for analysis. The expression data is available at ArrayExpress public repository under the accession E-MTAB-1708.
Genome-wide DNA methylation patterns were analyzed using the Infinium HumanMethylation450 BeadChip array (Illumina) as described elsewhere 29 . The percentage of methylation at a given cytosine is reported as a β-value, which is a continuous variable between 0 and 1, corresponding to the ratio of the methylated signal over the sum of the methylated and unmethylated signals. The M-value is calculated as the log2 ratio of the intensities of methylated probe vs. unmethylated probe 30 .
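The two quantities defined above, and the relation between them, can be written out as a short sketch. It assumes strictly positive intensities and no offset term in either formula (some implementations add a small constant to stabilise weak signals); the function names and example intensities are hypothetical. Under these assumptions the M-value equals log2(β / (1 - β)).

```python
from math import log2

def beta_value(meth, unmeth):
    """Methylated signal over the sum of methylated and unmethylated signals."""
    return meth / (meth + unmeth)

def m_value(meth, unmeth):
    """log2 ratio of methylated to unmethylated intensity."""
    return log2(meth / unmeth)

def beta_to_m(beta):
    """Convert a beta value in (0, 1) to the corresponding M-value."""
    return log2(beta / (1.0 - beta))

meth, unmeth = 3000.0, 1000.0                    # hypothetical intensities
print(beta_value(meth, unmeth))                  # 0.75
print(m_value(meth, unmeth))                     # log2(3)
print(beta_to_m(beta_value(meth, unmeth)))       # same M-value via beta
```

A beta value of 0.5 corresponds to an M-value of 0, and M-values are unbounded while beta values stay between 0 and 1.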
Raw methylation data were extracted with Illumina Genome Studio (version 2010.3) with methylation module (version 1.8.5) and preprocessed using R (version 3.0.1). For data pre-processing of the Infinium Human Methylation 450K BeadChip we used the pipeline described earlier with default parameter settings to avoid bias in the analysis since the assay combines two different chemistries 31 . In brief, CpG sites in close proximity (50bp) to common SNPs and on allosomal positions were removed and color bias adjustment based on a smooth quantile normalization method as well as background level correction based on negative-control probes was performed for each chip using the R lumi package 32 . Finally, the pipeline performs subset quantile normalization, including only probes with detection p-values < 0.01, in order to correct for the InfI/InfII shift and normalize between samples. Therefore, the target CpGs were separated into different categories using CpG island annotation. For analysis we considered only beta-values associated with detection p-value < 0.01. Three samples were excluded due to chip wise detection rate < 80%.

Multi-Ethnic Study of Atherosclerosis (MESA)
The MESA study was designed to investigate the prevalence, correlates, and progression of subclinical cardiovascular disease in a population cohort of 6,814 participants. Since its inception in 2000 there have been five clinic visits during which extensive clinical, socio-demographic, lifestyle and behavior, laboratory, nutrition, and medication data were collected 33 . Centralized training of technicians, standardized protocols, and numerous quality control (QC) measures were implemented for collection, on-site processing, and shipment of MESA specimens and routine calibration of equipment. Blood was initially collected in sodium heparin-containing Vacutainer CPT™ cell separation tubes (Becton Dickinson, Rutherford, NJ) to separate peripheral blood mononuclear cells from other elements within 2 hours from blood draw. Subsequently, monocytes were isolated with anti-CD14 coated magnetic beads using an AutoMACs automated magnetic separation unit (Miltenyi Biotec, Bergisch Gladbach, Germany). Based on flow cytometry analysis of 18 specimens, monocyte samples were consistently greater than 90% pure.
DNA and RNA were isolated from samples simultaneously using the AllPrep DNA/RNA Mini Kit (Qiagen, Inc., Hilden, Germany). QC metrics were examined on the DNA and RNA samples, including optical density measurements using a NanoDrop spectrophotometer and evaluation of the integrity of the 18S and 28S ribosomal RNA. RNA QC testing was performed using the Agilent 2100 Bioanalyzer with RNA 6000 Nano chips (Agilent Technologies, Inc., Santa Clara, CA) according to the manufacturer's instructions. RNA with RIN (RNA Integrity Number) scores > 9.0 was used for global gene expression microarrays.
The Illumina HumanHT-12 v4 Expression BeadChip and the Illumina Bead Array Reader were used to perform the genome-wide expression analysis, following the Illumina expression protocol. The Illumina TotalPrep-96 RNA Amplification Kit (Ambion/Applied Biosystems, Darmstadt, Germany) was used for reverse transcription and amplification with 500 ng of input total RNA (at 11 µl). 700 ng of biotinylated cRNA was hybridized to a BeadChip at 58°C for 16-17 hours. The expression data is available at GEO (Gene Expression Omnibus) public repository under the accession GSE56045.

North American Brain Expression Consortium (NABEC) and United Kingdom Brain Expression Consortium (UKBEC)
Frozen frontal cortex (FCTX) and cerebellum (CRBL) samples were obtained from > 399 self-reported European ancestry samples without determinable neuropathological evidence of disease 23,34-36 . mRNA expression levels were assayed using Illumina HumanHT-12 v3 Expression Beadchips. In brief, individual probes were excluded from analyses if the p-value for detection was > 0.01 or there was less than 95% completeness of data per probe, and samples were excluded if <95% of probes were detected. Probes were also removed if an analyzed SNP was within the probe or the probe mapped ambiguously to multiple locations in the genome.
Expression data was quantile normalized and log2-transformed prior to analyses. Analyses were carried out separately for each brain region. The total number of NABEC and UKBEC samples with good quality whole-genome expression data was 394. The expression data is available at GEO (Gene Expression Omnibus) public repository under the accession GSE36192.
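As an illustration of the normalization step named above, the following is a minimal sketch of quantile normalization followed by a log2 transformation, applied in the order the text states. The function names and the list-of-lists layout are assumptions, ties are broken by position rather than averaged, and the consortium's actual implementation is not specified here.

```python
import math

def quantile_normalize(samples):
    """Force every sample onto the same empirical distribution.

    `samples` is a list of equal-length lists: one list per sample,
    one value per probe (a layout assumed for this sketch).
    """
    n_probes = len(samples[0])
    if any(len(s) != n_probes for s in samples):
        raise ValueError("all samples must have the same number of probes")
    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_samples = [sorted(s) for s in samples]
    reference = [
        sum(s[k] for s in sorted_samples) / len(samples) for k in range(n_probes)
    ]
    normalized = []
    for s in samples:
        # Probe indices ordered from lowest to highest value in this sample.
        order = sorted(range(n_probes), key=lambda i: s[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized

def log2_transform(samples):
    """Apply log2 to every (strictly positive) intensity."""
    return [[math.log2(v) for v in s] for s in samples]

# Hypothetical raw intensities for two samples and three probes
raw = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
processed = log2_transform(quantile_normalize(raw))
```

After normalization each sample contains the same set of values, differing only in which probe carries which value, so between-sample differences in overall intensity are removed.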
For the methylation analysis, genomic DNA was extracted with phenol-chloroform. DNA was bisulfite converted and assayed at > 27,000 sites on the Illumina Infinium HumanMethylation 27K BeadChips.

NIDDK-Phoenix Study (NIDDK/PHOENIX)
Subjects for the NIDDK-Phoenix study were selected from among participants in the Phoenix extension of the Family Investigation of Nephropathy and Diabetes 37 . In this study, American Indians from urban Phoenix, Arizona, are invited to have a screening examination for diabetes and nephropathy; beginning in 2008 a PAXGene RNA tube (BD) was collected at each examination. Individuals for the present study were selected among 1961 participants who had been fasting for ≥8 hours, who had been examined between 06:30 and 12:30 hours, and who did not have end-stage renal disease. The self-reported heritage of each individual was ≥½ American Indian, and all participants were ≥ 18 years old. Information on family membership was collected, and participants with diabetes or kidney disease were encouraged to refer family members. A total of 1461 individuals were selected for expression studies: 768 from pedigrees with ≥2 individuals available, and an additional 693, selected at random from the remaining potentially eligible individuals. Informed consent was obtained from each participant, and the Institutional Review Board of the NIDDK approved the study.
Total RNA was isolated using PAXGene Blood miRNA kits (Qiagen) according to the manufacturer's instructions. Samples were amplified and labelled (Ambion MessageAmp II Biotin-Enhanced aRNA amplification kit), and hybridized to the Illumina HumanHT-12 v4 Expression Beadchips as described by the manufacturer's protocol. Samples were scanned on the Illumina BeadArray 500GX Reader and the Illumina GenomeStudio software was used to perform a standard background normalization prior to exporting the data for statistical analysis. Four samples (2 from the family sample and 2 from the random sample) showed little evidence of overall expression above background levels and were excluded from analyses. Expression studies for the NIDDK-Phoenix samples were performed at the Department of Genetics Laboratories at Texas Biomedical Research Institute (San Antonio, Texas). The total number of NIDDK-Phoenix samples with good quality whole-genome expression data was 1457. Data are not deposited in public databases, but may be made available to potential collaborators, subject to IRB approval.

PBMC-MS
The PBMC-MS subjects were selected from the Comprehensive Longitudinal Investigation of Multiple Sclerosis at the Brigham and Women's Hospital (CLIMB), which has been described earlier 38 . PBMCs were collected between 2002 and 2007, and they were frozen and stored in liquid nitrogen. Total RNA was isolated from RLT lysate using the RNeasy Mini kit (Qiagen) according to the manufacturer's instructions. After quality control (RIN > 7), total RNA was amplified and labelled using the IVT RNA Amplification and Labeling system (Affymetrix, Santa Clara, CA). The dataset was preprocessed with SimpleAffy and quantile normalized following the GeneChip robust multiarray average (GCRMA) procedure in Bioconductor 2.8 (www.bioconductor.org). The limma R package was used for differential expression analysis 39 . Normalized data files and the GenePattern-compatible .gct files were used in subsequent analyses. The expression data is available at GEO (Gene Expression Omnibus) public repository under the accession GSE16214.

Rotterdam Study (RS)
The Rotterdam Study (RS) (www.epib.nl/rotterdamstudy) is a prospective, population-based cohort study in the district of Rotterdam, the Netherlands, and has been described in detail 40 . The initial design of the study is straight-forward: a prospective cohort study among 7,983 persons living in the well-defined Ommoord district in the city of Rotterdam (78% of 10,215 invitees), called Rotterdam Study I (or RS-I). They were all 55 years of age or older and the oldest participant at the start was 106 years. The study started in the second half of 1989. In 1999, 3,011 participants (out of 4,472 invitees) who had become 55 years of age or moved into the study district since the start of the study were added to the cohort, called Rotterdam Study II (or RS-II). In 2006, a further extension of the cohort was initiated in which 3,932 subjects were included, aged 45 years and older (out of 6,057 invited), called Rotterdam Study III (RS-III).
The participants were all examined in some detail at baseline. They were interviewed at home and then had an extensive set of examinations in a specially built research facility in the centre of their district. These examinations were repeated every 3-4 years for characteristics that could change over time. The participants in the Rotterdam Study are followed for a variety of diseases that are frequent in the elderly. Informed consent was obtained from each participant, and the medical ethics committee of the Erasmus Medical Center Rotterdam approved the study.
For the expression analyses, the RS-III cohort was used: whole blood was collected (PAXGene Tubes, Becton Dickinson) and total RNA was isolated (PAXGene Blood RNA kits, Qiagen). To ensure a constant high quality of the RNA preparations, all RNA samples were analysed using the LabChip GX (Caliper) according to the manufacturer's instructions. Samples with an RNA Quality Score > 7 were amplified and labelled (Ambion TotalPrep RNA), and hybridized to the Illumina HumanHT12v4 Expression Beadchips as described by the manufacturer's protocol. Processing of the Rotterdam Study RNA samples was performed at the Genetic Laboratory of Internal Medicine, Erasmus University Medical Center Rotterdam. The RS-III expression dataset is available at GEO (Gene Expression Omnibus) public repository under the accession GSE33828; 881 samples are available for analysis.
For the methylation analysis, genomic DNA was extracted from peripheral venous blood using the salting-out method 41 . DNA methylation was assayed at over 450,000 sites on the Illumina Infinium HumanMethylation 450K BeadChips. Quality control was performed in the R statistical programming language using the minfi and methylumi packages 7,42 . The total number of samples having both gene expression and methylation data is 726.

San Antonio Family Heart Study (SAFHS)
The longitudinal San Antonio Family Heart Study (SAFHS, funded by the long-running NHLBI program project 5P01HL04552, PI Blangero) started in 1991 and was designed primarily to investigate the genetics of cardiovascular disease and its risk factors in Mexican Americans. The SAFHS included 1,431 individuals in 42 extended families at baseline. Ascertainment occurred by way of random selection of an adult Mexican American proband, without regard to presence or absence of disease and almost exclusively from Mexican American census tracts in San Antonio. These individuals have been followed in a mixed longitudinal fashion, with the average participant having been seen ~2.5 times (ranging from 1 to 4 times). A wide range of phenotypic information is available. Blood samples were obtained in the morning after an overnight fast, and mRNA was isolated from lymphocytes only.
Expression profiles were generated for 1,371 lymphocyte samples in total (taken from the first clinic visit of study participants and subsequently stored at -80°C without interruption before processing), including various controls and duplicates, using Illumina's WG6v1 microarrays (47,293 probes in total). This is an early generation Illumina platform, which did not include all 2,238 age-associated genes for replication. We identified "good" expression data samples based on the number of detected probes (at detection p-values <= 0.05), mean average raw signal across all transcripts, and mean correlation against all other samples across all probes. Samples falling within the bottom 5% in each of these 3 quality measures were deleted, as were controls. Among duplicate samples, we kept the one giving the "best" quality results based on these measures. This resulted in 1,244 samples with high quality expression data 43 . The gene expression data was then processed by background noise subtraction, log2 transformation, and quantile normalization.
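The sample-level quality filter described above can be sketched as follows. The text does not state whether a sample was removed when it fell in the bottom 5% of any one measure or of all three, so this sketch assumes "any" by default and exposes the alternative through a hypothetical `require_all` flag. All function names and the data layout are likewise assumptions for illustration.

```python
def bottom_fraction(values, fraction=0.05):
    """IDs of the samples with the lowest values for one quality measure.

    `values` maps sample ID -> measure (higher is assumed to be better).
    """
    # Small epsilon guards against floating-point error in len * fraction.
    n_flagged = int(len(values) * fraction + 1e-9)
    ranked = sorted(values, key=lambda sample_id: values[sample_id])
    return set(ranked[:n_flagged])

def flag_low_quality(measures, fraction=0.05, require_all=False):
    """Combine per-measure flags across several quality measures.

    `measures` is a list of {sample_id: value} dicts, one per measure.
    """
    flagged = [bottom_fraction(m, fraction) for m in measures]
    if require_all:
        return set.intersection(*flagged)
    return set.union(*flagged)

# Hypothetical toy data for 20 samples and three quality measures
ids = [f"s{i}" for i in range(20)]
n_detected = {s: 100 + i for i, s in enumerate(ids)}      # s0 is lowest
mean_signal = {s: 200 - i for i, s in enumerate(ids)}     # s19 is lowest
mean_corr = {s: (0.1 if s == "s5" else 0.9) for s in ids}  # s5 is lowest
to_remove = flag_low_quality([n_detected, mean_signal, mean_corr])
```

With 20 samples and a 5% cutoff, one sample is flagged per measure, so the "any" rule removes three samples here while the "all" rule removes none.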
We used SOLAR 44 to analyze which transcripts were associated with age. A random effects "polygenic" model, based on the kinship matrix of study participants, was used to model the genetic non-independence of family members according to their expected average autosomal sharing, derived from stated (and also statistically verified) pedigree relationships; age (at time of blood draw) was included as an additional covariate in the analysis. The expression data is available at the ArrayExpress public repository under the accession E-TABM-305.

Study of Health In Pomerania (SHIP-TREND)
SHIP (Study of Health in Pomerania) is a longitudinal population-based project consisting of two independent cohorts, SHIP and SHIP-TREND, assessing the prevalence and incidence of common, population-relevant diseases and their risk factors. Baseline examinations for SHIP-TREND were conducted between 2008 and 2012. The study region of SHIP is West Pomerania, a region in the north-east of Germany. Stratification variables for sampling from population registries were age and sex. Study design and sampling methods were previously described 45,46 . A total of 4420 Caucasian adults aged 20 to 79 years (response 50.1%) participated in the baseline SHIP-TREND. The medical ethics committee of the University of Greifswald approved the study protocol, and oral and written informed consent was obtained from all study participants.
For this analysis, a subset of the SHIP-TREND cohort with available phenotype and gene expression data (n=991) was used. RNA was prepared from whole blood collected under fasting conditions in PAXGene tubes (BD) using the PAXGene Blood miRNA Kit (Qiagen, Hilden, Germany) on a QIAcube according to protocols provided by the manufacturer (Qiagen). To ensure a constant high quality of the RNA preparations, all RNA samples were analysed using RNA 6000 Nano LabChips (Agilent Technologies) on a 2100 Bioanalyzer (Agilent Technologies) according to the manufacturer's instructions. Using the Illumina TotalPrep-96 RNA Amp Kit (Ambion), 500 ng of RNA was reverse transcribed into cRNA and biotin-UTP-labelled. 3000 ng of cRNA was hybridized to the Illumina Whole-Genome Human HT12v3 Expression BeadChips, followed by washing steps as described in the Illumina protocol. Processing of the SHIP-TREND RNA samples was performed at the Helmholtz Zentrum München. The expression data is available at GEO (Gene Expression Omnibus) public repository under the accession GSE36382.