Edinburgh Research Explorer Identification, replication and characterization of epigenetic remodelling in the aging genome

Aging is a complex biological process regulated by multiple cellular pathways and molecular mechanisms including epigenetics. Using genome-wide DNA methylation data measured in a large collection of older Scottish individuals, we performed discovery association analysis to identify age-associated CpGs and replicated them in two independent Danish cohorts. The double-replicated CpGs were characterized by their distribution over gene regions and their location in relation to CpG islands. The replicated CpGs were further characterized by their involvement in biological pathways to study their functional implications in aging. We identified 67,604 age-associated CpG sites reaching genome-wide significance of FWER < 0.05, 86% of which were demethylated with increasing age. Double replication resulted in 5,168 CpGs (39% age-methylated and 61% age-demethylated), which were characterized by a high concentration of age-methylated CpGs at 1stExon and TSS200 and a dominant pattern of age-demethylated CpGs at other gene regions, and by overwhelming age-related methylation in CpG islands and demethylation at shore/shelf and open sea. The differential distribution patterns over gene regions for methylated and demethylated CpGs both relate to reduced gene activity during aging. Pathway analysis showed that age-dependent methylations were especially involved in cellular signalling activities while demethylations were particularly linked to functions of the extracellular matrix, all implicated in the aging process and age-related disease risk.
[...] aging epigenetic profiles have been detected by epigenome-wide association studies (EWAS) using cross-sectional and longitudinal designs, respectively 1–6 . These studies have reported a considerable number of genomic sites differentially methylated across different ages and during aging within the same individuals.
Based on the significant CpG sites from EWAS, biological pathway analysis revealed important functional pathways concerning cell-cell signalling, synaptic transmission and multiple signalling pathways that overlap across studies 3, 5 . Although these were mainly obtained from the DNA methylome of whole blood corrected for cell composition, the identified pathways could reflect generic aging-related epigenetic changes that are not tissue specific 5 . Aging is a complex biological process that involves numerous changes at various levels and in different organ systems, from molecular modification to the functional regulation of systems, through multiple biological mechanisms including epigenetics 7, 8 . As a reflection of this, most of the epigenetic association analyses of human aging have reported relatively large numbers of differentially regulated sites 4–6 . While the reported findings could reflect extensive involvement of epigenetic modification during the aging process, careful validation of the findings is needed.

Distribution of age-related CpG sites over gene region. The double-replicated 5,168 CpGs were first divided over gene regions separately for CpGs that displayed increased and decreased methylation patterns with age. The proportions of CpGs by gene region in the age-methylated or demethylated groups (columns 3 and 6 in Table 1) were then compared to the corresponding proportions for all CpGs in the whole array (column 2 in Table 1) to see if some gene regions were over-represented in each of the two groups. Figure 2a displays the proportion of CpGs by gene region in the groups of increased (red curve) and decreased (blue curve) methylation with the dashed curve representing proportion in the overall array. It is clear that the age-dependent DNA methylation is more frequent at 1stExon (18.44% vs 8.09%) but less frequent in the gene body (27.42% vs 36.09%). In contrast, the age-associated demethylation happens more frequently at the region covering −200 to −1500 nt upstream of the transcription start site (TSS) (TSS1500) (24.65% vs 17.34%) but less frequently at the region up to −200 nt upstream of TSS (TSS200) (5.73% vs 12.87%) and 1stExon (3.63% vs 8.09%). The differential patterns are supported by the very high statistical significance in Table 1.
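As a quick arithmetic check on the over- and under-representation described above, the following minimal sketch computes the ratio of each group's proportion to the whole-array proportion, using only the percentages quoted in the text. The ratio is an illustrative summary and not the test statistic behind Table 1; the dictionary and function names are hypothetical.

```python
# Percentages quoted in the text: (proportion in group, proportion on whole array).
# The fold ratio is an illustrative summary, not the test reported in Table 1.
AGE_METHYLATED = {"1stExon": (18.44, 8.09), "Body": (27.42, 36.09)}
AGE_DEMETHYLATED = {
    "TSS1500": (24.65, 17.34),
    "TSS200": (5.73, 12.87),
    "1stExon": (3.63, 8.09),
}


def fold_ratio(group_pct, array_pct):
    """Ratio > 1 means over-represented relative to the array; < 1 means under-represented."""
    return group_pct / array_pct


def summarize(table):
    """Return the rounded fold ratio for every gene region in the table."""
    return {region: round(fold_ratio(g, a), 2) for region, (g, a) in table.items()}


gain = summarize(AGE_METHYLATED)
loss = summarize(AGE_DEMETHYLATED)
print("age-methylated:", gain)
print("age-demethylated:", loss)
```

A ratio above 1 corresponds to the regions described as "more frequent" in the text, and a ratio below 1 to those described as "less frequent".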
In addition to comparing the relative proportions of CpGs by gene region, we also compared the absolute proportions of age-dependent methylation and demethylation in the double-replicated CpGs at each gene region (Table 2). Most of the gene regions had more demethylated than methylated CpGs except for TSS200 and 1stExon. This is easily seen in Fig. 2b where most of the red curve for age-methylated CpGs is inside the blue curve for demethylated CpGs excluding TSS200 and 1stExon. The differential pattern is again extremely significant as indicated by p values in Table 2.
Distribution of age-related CpG sites in relation to CpG island (CGI). Similar to the characterization by gene region, we next grouped the age-associated CpGs by their relation to CGI. We started with comparing the relative proportions of CpGs at their locations to CGI (island, shore, shelf, non-CGI) with the corresponding proportions for all CpGs in the whole array (Table 3, Fig. 3a). As shown by Fig. 3a, the age-methylated CpGs are predominantly distributed to the island (76.63% vs 30.89%) and rarely to the non-CGI region (6.11% vs 36.38%). On the contrary, the age-demethylated CpGs are more frequently distributed over all locations except the island. Comparison of the absolute proportions of age-dependent methylation and demethylation in significant CpGs at each location is presented in Table 4 and Fig. 3b. The significant CpGs at the island are mostly age-methylated (92.39%) while the rest of locations are mainly occupied by age-associated demethylation.
After separately examining the differential methylation patterns by gene region and by relative location to CGI, we further divided the 5,168 CpGs by both gene region and CGI location and compared the proportions of age-associated methylation and demethylation at each combination to examine the methylation patterns across gene regions for a given CGI location or across CGI locations for a given gene region (Table 5). Table 5 shows that increased methylation is consistently observed in very high proportion at CGIs in all gene regions and in low proportion (thus high proportion of demethylation) at shore/shelf and non-CGI. This is visualized in Fig. 4 where it is shown that the proportion of age-methylated CpGs is mainly determined by the relative location of the CpG sites to CGI, with only minor variation across gene regions. Inspired by Fig. 4, we plotted the proportion of age-methylated CpGs at each gene region together with the proportion of CpGs located at the island from each gene region in Supplementary Figure S1. It is shown that the proportion of age-methylated CpGs by gene region is closely correlated with the proportion of CpGs residing at CGI in the region (correlation coefficient 0.995), again indicating that the differential methylation pattern across gene regions is highly related to the region-specific composition of CpGs at CGI, shore/shelf or open sea.
Table 3. Proportionality of locations to CGI for all CpGs of the array and in age-methylated (gain) and demethylated (loss) CpGs.
Table 5. Proportion of age-methylated CpGs in double-replicated CpGs by combination of gene region and relation to CGI.
Pathway analysis. The

Discussion
Based on a relatively large collection of samples from older people and using a double-replication strategy, we have identified 5,168 CpGs that changed their white blood cell methylation profiles in the aging cohorts, with the majority of them demethylated with increasing age. The significantly higher proportion of demethylation (60.76%) than methylation (39.24%) (p < 2.2e-16) among the double-replicated CpGs is not surprising as global decline in DNA methylation level of the methylome is the predominant event in aging 9 . The excess of demethylation over methylation with aging has also been reported as 64% by Johansson et al. 10 using 421 individuals and as 54% by Marttila et al. 4 using 143 individuals, with the former very close to our estimate. Characterization of the double-replicated CpGs by their distribution over gene regions was done first by examining the relative and absolute proportions of methylated and demethylated CpGs at each gene region. From Tables 1 and 2 and Fig. 2, it is clear that there are excessive age-related gains in methylation at TSS200 and 1stExon, while the rest of the gene regions are characterized by loss of methylation. In the literature, it has been shown that increased methylation surrounding the TSS is associated with gene inactivation 11 and that methylation of the 1stExon is linked to gene silencing 12 . On the other hand, demethylation in the gene body is associated with suppression of gene expression 13 . Taken together, the differential distributions of both methylated and demethylated CpGs over gene regions can all be related to reduced gene activity during aging.
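To illustrate how an imbalance of 60.76% versus 39.24% among 5,168 CpGs yields a p value below 2.2e-16, the following minimal sketch applies a normal approximation to a two-sided binomial test against an even split. The choice of test is an assumption, as the text does not name the test used; the counts are recovered by rounding the reported percentages, and the function name is hypothetical.

```python
import math

TOTAL = 5168                           # double-replicated CpGs, as reported
DEMETHYLATED = round(TOTAL * 0.6076)   # assumption: count recovered from the reported 60.76%
METHYLATED = TOTAL - DEMETHYLATED


def two_sided_p_vs_half(successes, n):
    """Normal approximation to a two-sided binomial test of H0: proportion = 0.5."""
    z = (successes - n * 0.5) / math.sqrt(n * 0.25)
    # Two-sided tail area of the standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p = two_sided_p_vs_half(DEMETHYLATED, TOTAL)
print(DEMETHYLATED, METHYLATED, round(z, 2), p < 2.2e-16)
```

The threshold 2.2e-16 is the smallest value that R prints by default, so any test statistic of this magnitude would be reported in the same way.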
Figure caption (fragment): [...] island (from open sea or non-CGI to shore/shelf and to CGI). It is clearly shown that the methylation level at each gene region is mainly driven by age-associated methylation (gain) at CGI.
Our estimates of relative proportions (Tables 1 and 3, Figs 2a and 3a) revealed patterns of distribution of differentially methylated sites across gene regions similar to those described by Marttila et al. 4 . The consistent findings suggest that age-associated DNA methylation changes are differentially distributed over gene regions as compared with the total CpGs carried by the microarray used in the studies. The differential distribution of age-associated methylation and demethylation across gene regions (Tables 1 and 2, Fig. 2) and over locations relative to CpG islands (Tables 3 and 4, Fig. 3) could imply different biological functions. Indeed, our analysis revealed different functional clusters or gene sets over-represented by genes linked to age-methylated and demethylated CpGs (see Supplementary Tables S2 and S3). In Supplementary Table S2, the top significant canonical pathways are nearly all related to signalling pathways such as GPCR (G-protein-coupled receptor) signalling, which is widely involved in overall physiological functions and pathological processes as well as in the progression of Alzheimer's disease 14 . Cell-cell signalling was also reported by Florath et al. 3 , Marttila et al. 4 and Tan et al. 5 among their top significant pathways regulated by age-associated increase in DNA methylation. Interestingly, in an early microarray study, Tan et al. 15 compared genome-wide gene expression between grandparents and grandchildren in the CEPH Utah families and reported cell-cell signalling and cell communication pathways as the top-most significant pathways enriched by differentially expressed genes dominated by a pattern of decreased expression with aging.
Given the high concentration of age-methylated CpGs at gene promoter regions, which silences gene activity, the finding in this study could provide epigenetic evidence of significant age-associated impairment in cellular signal transduction during the aging process.
Although demethylation is found more frequently than methylation in the aging methylome, functional annotation of age-associated demethylation sites identified by Johansson et al. 10 and Marttila et al. 4 failed to show any significant enrichment for specific functional pathways. Based on the double replicated CpGs, our pathway analysis on genes linked to the significantly demethylated CpG sites identified a large number of gene-sets with the topmost ones unanimously involving genes encoding extracellular matrix (ECM) and ECM-associated proteins (see Supplementary Table S3). The ECM is a collection of molecules (sugars, proteins, etc.) that function not only as a scaffold but also as a place where signals affecting cell migration and differentiation originate. The ECM has been a central topic of discussion in aging studies 16 . It is associated with numerous age-related diseases and dysfunctions, most of them involving elasticity and strength of connective tissue, cartilage, bone, blood vessels, and skin 17 . In fact, the role of ECM goes beyond structural properties of tissue. For example, degradation of ECM in heart tissues could impact the heart's electrical conduction system, a probable contributor to the increased prevalence of arrhythmias and similar issues with advancing age 18 . In this sense, the top significant pathways from both age-methylated and demethylated CpGs overlap in regard to cell signalling. This is further evidenced by the gene set NABA_MATRISOME which appears as top-most significant gene set in both Supplementary Tables S2 and S3.
Based on genome-wide DNA methylation data measured over ages, Horvath 19 and Hannum et al. 20 [...] 19 , only 32 (9.07%) overlapped. This large difference in the overlapping rates may be due to the fact that the Horvath list was developed from relatively younger samples (mean age 43 years) while the mean age of the samples from Hannum et al. 20 was more than 20 years older, which is closer to the age ranges covered in our discovery and replication samples. This evidence further shows that our current study is distinguished by the discovery, validation and characterization of DNA methylation patterns associated with the aging process in the older population.
Finally, we want to point out that, although high overlapping rates were found in the replication analysis, the number of double-replicated CpGs could have been considerably reduced by the relatively small sizes of the replication samples. Some of the truly significant CpGs from the discovery stage could have failed in the replication stage purely because the latter did not have enough statistical power to confirm their significance. In this situation, the double-replicated CpGs could have been biased towards CpGs with large effect sizes, as indicated by the red spots in Fig. 1.
In conclusion, profiling of the aging methylome in combination with a double-replication strategy identified 5,168 significantly age-associated CpGs, with the majority of them being demethylated with increasing age. The detected CpGs are characterized by a high concentration of gain in methylation at 1stExon and TSS200 and a dominant pattern of loss of methylation at other gene regions, and by overwhelming age-associated methylation in CpG islands and demethylation at shore/shelf and open sea. Biological pathway analysis showed that age-related increase in DNA methylation is especially involved in cellular signalling activities while age-dependent demethylation is particularly related to functions of the extracellular matrix, both of which provide clues to the epigenetic remodelling behind the aging process and age-related diseases.

Methods
The discovery samples. The discovery analysis was based on samples collected from the Lothian Birth [...]
The validation samples.
The cross-sectional Danish twins. The cross-sectional sample for validation consisted of 150 pairs of identical twins aged 30 to 74 years 22 . The sample can be divided into two age groups, i.e. a young group aged 30-37 years (154 samples) and an old group aged 57-74 years (146 samples), with a gap of 20 years between the two groups. DNA methylation was measured using the same array platform as in the discovery samples. Both raw and processed DNA methylation data for the cross-sectional samples have been deposited in the NCBI GEO database (http://www.ncbi.nlm.nih.gov/geo) under accession number GSE61496.
The study was approved by The Regional Scientific Ethical Committees for Southern Denmark (S-20090033) and conducted in accordance with the Helsinki II declaration. Informed consent was obtained from all subjects.
The longitudinal Danish twins. The longitudinal sample for validation included 86 elderly Danish twins (18 monozygotic or MZ pairs; 25 dizygotic or DZ pairs) collected by the Longitudinal Study of Aging Danish Twins (LSADT) initiated in 1995 5,23 . The project collected like-sex twin pairs born in Denmark for longitudinal assessment of aging-related phenotypes. The participants were born before 1923, with ages ranging from 73 to 82 years in 1997 when the first blood samples were taken. The second blood samples were drawn in 2007 after a ten-year follow-up. Genome-wide DNA methylation was measured using the same platform as in the discovery samples (i.e. Illumina Infinium Human DNA methylation 450 K beadchip). The raw and processed DNA methylation data have been deposited in the NCBI GEO database (http://www.ncbi.nlm.nih.gov/geo) under accession number GSE73115.
The study was approved by the Danish Scientific Ethics Committees and conducted in accordance with the Helsinki II declaration. Informed consent was obtained from all subjects.
Estimating and adjusting cell composition. It has been shown that cell composition in whole blood can change as a result of aging 2 and it is thus critical to account for cellular heterogeneity in epigenetic studies of aging 24 . Based on the downloaded β-values for DNA methylation percentage for LBC, we estimated blood cell composition in each individual sample for 6 blood cell types: CD8T, CD4T, natural killer cell, B cell, monocyte, and granulocyte using the R package celltypes450 (https://github.com/brentp/celltypes450). The package estimates cell-type composition in 450 K data using Houseman's method 25 . Cell compositions for the Danish validation samples were estimated by applying the R package minfi (http://bioconductor.org/packages/release/bioc/html/minfi.html) to measured signals derived from the green and red channels 26 .
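The idea behind reference-based estimation of cell composition can be sketched as follows: a sample's methylation profile is modelled as a non-negative mixture of cell-type reference profiles, and the mixing weights are recovered by constrained least squares. The sketch below is a simplified stand-in (non-negative least squares solved by projected gradient descent), not the implementation used by celltypes450 or minfi. The reference matrix, the three cell types and all function names are hypothetical and the values are made up for illustration.

```python
# Simplified stand-in for reference-based cell-composition estimation:
# non-negative least squares solved by projected gradient descent.
# All numbers below are made up for illustration.

REFERENCE = [  # rows: CpGs; columns: hypothetical cell types A, B, C
    [0.9, 0.1, 0.1],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.1],
    [0.2, 0.8, 0.2],
    [0.1, 0.1, 0.9],
    [0.1, 0.2, 0.8],
]


def mix(reference, fractions):
    """Methylation profile of a mixture with the given cell-type fractions."""
    return [sum(r * f for r, f in zip(row, fractions)) for row in reference]


def estimate_fractions(reference, sample, iterations=500):
    """Minimise ||reference @ w - sample||^2 subject to w >= 0."""
    n_types = len(reference[0])
    # Step 1 / ||A||_F^2 never exceeds 1 / lambda_max(A^T A), so each update is stable.
    step = 1.0 / sum(v * v for row in reference for v in row)
    w = [1.0 / n_types] * n_types
    for _ in range(iterations):
        residual = [p - s for p, s in zip(mix(reference, w), sample)]
        grad = [sum(row[j] * r for row, r in zip(reference, residual))
                for j in range(n_types)]
        # Gradient step followed by projection onto the non-negative orthant.
        w = [max(0.0, wj - step * gj) for wj, gj in zip(w, grad)]
    return w


TRUE_FRACTIONS = [0.5, 0.3, 0.2]
sample = mix(REFERENCE, TRUE_FRACTIONS)
estimated = estimate_fractions(REFERENCE, sample)
print([round(x, 3) for x in estimated])
```

With noise-free synthetic input the estimated weights converge to the fractions used to build the sample. Real data would additionally require a selection of cell-type-informative CpGs and a validated reference panel.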
Data analysis. The age-dependent DNA methylation patterns were analysed by fitting a regression model to each CpG site, regressing the methylation measurement on the individual's age at sampling. Before fitting the model, the DNA methylation percentage, i.e. the β-value, was transformed into an M-value using the logit transformation to ensure a normal or approximately normal distribution. Besides age, the regression model also included the estimated blood cell composition for adjustment. Considering that the LBC samples contained repeated measurements over follow-ups, the statistical significance of the age-dependent association of DNA methylation was assessed empirically using computer permutation that randomized age at observation to generate a null distribution of random p values. In our analysis, the computer permutation was combined with correction for multiple testing by estimating the family-wise error rate (FWER). To do that, for every random EWAS performed on age-permuted data, the minimum p value was recorded. A null distribution of minimum p values was created based on K (we used K = 300) random EWAS. The FWER for each CpG was estimated as
$$\mathrm{FWER} = \frac{\#\{j : p^{\min}_{j} \le p\}}{K},$$
where p is the observed p value of the CpG and $p^{\min}_{j}$ is the minimum p value from the j-th random EWAS. The analyses of validation data were done by fitting mixed effect models as described in Tan et al. 5 . All statistical analyses were performed using the free R software (https://www.r-project.org) and related packages.
Pathway analysis. The identified significant CpGs (FWER < 0.05) were annotated to their nearest genes and evaluated for over-representation of gene-sets or pathways in the Molecular Signatures Database (MSigDB). The over-representation analysis compares a reference set of genes to a test gene-set using the hypergeometric test. The probability of finding X > k significant genes in a particular gene-set or pathway can be calculated using the hypergeometric distribution, i.e.
$$P(X > k) = 1 - \sum_{i=0}^{k} \frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}},$$
where N is the number of genes annotated to CpGs on the 450 K array, m is the number of genes annotated to the significant CpGs, n is the number of genes in the particular gene-set or pathway, and k is the number of genes that are both linked to significant CpGs and present in the particular gene-set or pathway. The analysis was performed using the analytical tool provided by Gene-Set Enrichment Analysis (GSEA) (http://www.broadinstitute.org/gsea/index.jsp).
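The three calculations described in the Data analysis and Pathway analysis paragraphs can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis code of the study (which was run in R and with the GSEA tool): the M-value is taken to be the base-2 logit of the β-value, the FWER is taken as the share of permutation minimum p values at or below the observed p value, and the function names and toy inputs are hypothetical.

```python
import math


def m_value(beta):
    """Base-2 logit transform of a methylation beta-value in (0, 1) (assumed definition)."""
    return math.log2(beta / (1.0 - beta))


def fwer(observed_p, null_min_p):
    """Share of permuted-EWAS minimum p values at or below the observed p value."""
    return sum(1 for p in null_min_p if p <= observed_p) / len(null_min_p)


def hypergeom_upper_tail(N, m, n, k):
    """P(X > k) when n genes (the gene-set) are drawn from N genes, m of them significant."""
    total = math.comb(N, n)
    # Sum the upper tail directly; math.comb returns 0 when n - i exceeds N - m.
    upper = sum(math.comb(m, i) * math.comb(N - m, n - i)
                for i in range(k + 1, min(m, n) + 1))
    return upper / total


# Toy inputs, made up for illustration (K = 4 permutations instead of 300).
toy_null = [1e-8, 1e-7, 1e-6, 1e-5]
print(m_value(0.5), fwer(5e-7, toy_null), hypergeom_upper_tail(10, 4, 3, 1))
```

Note that with K permutations the smallest non-zero FWER this estimator can return is 1/K, and that P(X ≥ k) can be obtained by calling `hypergeom_upper_tail` with k − 1.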