Integrative analyses of reference epigenomes reveal complex context-specific relationships between chromatin state, accessibility, DNA methylation and gene expression
Distinct chromatin states exhibit different distributions of chromatin accessibility, DNA methylation and gene expression
Integrative analysis of 111 reference human epigenomes.
Roadmap Epigenomics Consortium et al. Nature 10.1038/nature14248
We used chromatin states to study the relationship between histone modification patterns, RNA expression levels, DNA methylation, and DNA accessibility. Consistent with previous studies 19,23,43,44, we found low DNA methylation and high accessibility in promoter states, high DNA methylation and low accessibility in transcribed states, and intermediate DNA methylation and accessibility in enhancer states (Fig. 4d-e, Extended Data 3a,b). These differences in methylation level were stronger for higher-expression genes than for lower-expression genes, leading to a more pronounced DNA methylation profile (Extended Data 3c, Fig. S5, Table S4f). Genes proximal to H3K27ac-marked enhancers showed significantly higher expression levels (Extended Data 3d), and conversely, higher-expression genes were significantly more likely to neighbor H3K27ac-containing enhancers (Extended Data 3e).
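To make the per-state comparison concrete, the sketch below summarizes methylation and accessibility by chromatin state. It assumes a hypothetical input of one (state, methylation fraction, accessibility signal) tuple per genomic bin; the toy values are invented to mimic the qualitative pattern described above and are not Roadmap data.

```python
from collections import defaultdict
from statistics import median


def summarize_by_state(records):
    """Median methylation and median accessibility per chromatin state.

    records: iterable of (state, methylation_fraction, accessibility_signal),
    e.g. one tuple per CpG or genomic bin (hypothetical input format).
    """
    meth = defaultdict(list)
    acc = defaultdict(list)
    for state, m, a in records:
        meth[state].append(m)
        acc[state].append(a)
    return {s: (median(meth[s]), median(acc[s])) for s in meth}


# Invented toy values: promoter-like, transcribed and enhancer-like bins.
toy = [
    ("TssA", 0.05, 9.0), ("TssA", 0.10, 7.0),
    ("Tx", 0.85, 1.0), ("Tx", 0.90, 0.5),
    ("Enh", 0.45, 4.0), ("Enh", 0.55, 3.0),
]
summary = summarize_by_state(toy)
for state, (m, a) in sorted(summary.items()):
    print(f"{state}: median methylation {m:.2f}, median accessibility {a:.2f}")
```

Medians are used rather than means so that a few extreme bins do not dominate a state's summary; plotting full distributions, as the figures do, carries more information.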
Chromatin states sometimes captured differences in RNA expression that are missed by DNA methylation or accessibility. For example, TxFlnk, Enh, TssBiv, and BivFlnk states show similar distributions of DNA accessibility but widely differing enrichments for expressed genes (Fig. 4c,d). Enh and ReprPC states show intermediate DNA methylation, but very different distributions of DNA accessibility and different enrichments for expressed genes (Fig. 4c-e). Lack of DNA methylation, typically associated with de-repression, is found in both the active TssA promoter state and the bivalent TssBiv and BivFlnk states. The bivalent states TssBiv and BivFlnk also showed overall lower DNA methylation and higher DNA accessibility than the enhancer states Enh and EnhG, as well as binding by both activating and repressive regulatory factors (Extended Data 2b). These results also held for alternative methylation measurement platforms (Extended Data 4a-c) and for the 18-state chromatin state model (Extended Data 4d-e). Overall, these results highlight the complex relationship between DNA methylation, DNA accessibility, and RNA transcription, and the value of interpreting DNA methylation and DNA accessibility in the context of integrated chromatin states that better distinguish active and repressed regions.
Given the intermediate methylation levels of tissue-specific enhancer regions, we directly annotated intermediate methylation (IM) regions, based on 25 complementary DNA methylation assays of MeDIP 31,45 and MRE-Seq 22,39 from 9 reference epigenomes 46. This yielded more than 18,000 IM regions, with 57% CpG methylation on average, which are strongly enriched in genes, enhancer chromatin states (EnhBiv, EnhG, Enh), and evolutionarily conserved regions. IM was associated with intermediate levels of active histone modifications and DNase I hypersensitivity. Near TSSs, IM correlated with intermediate gene expression, and in exons it was associated with an intermediate level of exon inclusion 46. IM signatures were equally strong within tissue samples, peripheral blood, and purified cell types, suggesting that IM does not simply reflect differential methylation between cell types, but likely reflects a stable state of cell-to-cell variability within a population of cells of the same type.
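As a simplified illustration of what an IM annotation involves, the sketch below merges runs of CpGs with intermediate methylation fractions into candidate regions. This is not the MeDIP/MRE-Seq integration used for the Roadmap IM calls: it assumes per-CpG methylation fractions (for example from bisulfite sequencing), and the `low`/`high` thresholds, `max_gap` and `min_cpgs` values are hypothetical.

```python
def call_im_regions(cpgs, low=0.2, high=0.8, max_gap=200, min_cpgs=3):
    """Merge runs of intermediately methylated CpGs into candidate IM regions.

    cpgs: list of (position, methylation_fraction), sorted by position.
    low/high, max_gap (bp) and min_cpgs are illustrative assumptions.
    Returns a list of (start, end, mean_methylation) tuples.
    """
    regions = []
    run = []

    def flush():
        # Emit the current run as a region if it has enough CpGs.
        if len(run) >= min_cpgs:
            mean = sum(m for _, m in run) / len(run)
            regions.append((run[0][0], run[-1][0], mean))

    for pos, m in cpgs:
        intermediate = low <= m <= high
        if intermediate and run and pos - run[-1][0] <= max_gap:
            run.append((pos, m))  # extend the current run
        else:
            flush()  # close the previous run (if any)
            run = [(pos, m)] if intermediate else []
    flush()
    return regions


# Hypothetical CpG positions and methylation fractions on one chromosome.
toy_cpgs = [(100, 0.5), (150, 0.6), (220, 0.55), (300, 0.95),
            (400, 0.5), (450, 0.4), (500, 0.6), (2000, 0.5)]
regions = call_im_regions(toy_cpgs)
```

Here the fully methylated CpG at 300 breaks the first run and the 1.5 kb gap before 2000 breaks the second, leaving two candidate regions.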
Global similarity and differences between epigenomes
Integrative analysis of 111 reference human epigenomes.
Roadmap Epigenomics Consortium et al. Nature 10.1038/nature14248
To understand the relationship among different tissue/cell samples beyond the constraints of a tree representation, we also studied the full similarity matrix of each mark in relevant chromatin states (Fig. S9) and also visualized the principal dimensions of epigenomic variation using multidimensional scaling (MDS) analysis (Fig. S10). The pairwise similarity matrices of different marks were most effective in distinguishing different subsets of the samples, with H3K4me1 in Enh primarily capturing immune cell similarities, and H3K27me3 in ReprPC capturing pluripotent cell similarities (Fig. S9). In the MDS analysis, the first four dimensions of variation for most marks separated several major sample groups (Extended Data 7a-i), with some subtle differences between marks. For example, pluripotent cells and immune cells were two strong outliers in the first two dimensions of H3K4me1 variation in Enh (Fig. 6b), but H3K27me3 in ReprPC showed more uniform spreading of reference epigenomes (Fig. 6c), consistent with the coverage distributions of immune and pluripotent cells for the corresponding chromatin states (Fig. 5b). For most marks, the first five dimensions captured most of the variance, with additional dimensions capturing at most 4-6% for each mark (Extended Data 7).
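Multidimensional scaling can be illustrated with classical (Torgerson) MDS: square the pairwise distances, double-center them, and take the leading eigenvectors scaled by the square roots of their eigenvalues. The text does not state which MDS variant or distance was used, so this is a sketch under assumptions: it takes a distance matrix (for a similarity matrix, one common choice is d = 1 - s) and uses power iteration with deflation instead of a linear algebra library. The per-axis eigenvalue share it returns is the analogue of the variance captured by each dimension.

```python
import math
import random


def classical_mds(dist, k=2, iters=500, seed=0):
    """Classical (Torgerson) MDS from a symmetric n x n distance matrix.

    Returns (coords, explained): coords[i] is a k-dimensional embedding of
    item i and explained[j] is eigenvalue_j / trace(B), the share of total
    variance on axis j. Assumes the leading eigenvalues of B are positive
    (true for Euclidean distances); power iteration finds the eigenvalue of
    largest magnitude, so strongly non-Euclidean input may misbehave.
    """
    n = len(dist)
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    mean = [sum(row) / n for row in d2]  # row means = column means (symmetric)
    grand = sum(mean) / n
    # Double centering: B = -1/2 * J D^2 J, with J the centering matrix.
    b = [[-0.5 * (d2[i][j] - mean[i] - mean[j] + grand) for j in range(n)]
         for i in range(n)]
    trace = sum(b[i][i] for i in range(n))
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    explained = []
    for axis in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left for this axis
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][axis] = v[i] * scale
        explained.append(lam / trace if trace > 0 else 0.0)
        # Deflate B so the next pass converges to the next eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords, explained


# Points on a line at 0, 1 and 3: one axis should recover their spacing.
line_dist = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
coords, explained = classical_mds(line_dist, k=1)
```

For real epigenome similarity matrices, a library eigensolver would be the practical choice; the pure-Python version only keeps the example dependency-free.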
Relationship between allelic enhancer activity and allelic gene expression
Integrative analysis of haplotype-resolved epigenomes across human tissues.
Leung, D. et al. Nature 10.1038/nature14217
We found that allelic enhancers resided in significantly closer proximity to genes with allelically biased expression than non-allelic enhancers did (Fig. 4a,b). We also observed examples where distinct tissues from the same donor showed similar allelic biases of gene expression and of H3K27ac at enhancers (left ventricle and right ventricle from donor3), whereas the same tissue type derived from a different donor (left ventricle from donor1) yielded no consistent pattern (Fig. 4b), supporting the hypothesis that allelically biased gene expression is driven by individual-specific genetic variation in enhancers.
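Allelic bias at a heterozygous site is commonly assessed with an exact binomial test of reference versus alternative read counts against a 50:50 expectation. The sketch below assumes that test; the section does not say which statistic Leung et al. used, and the function name and read counts are hypothetical.

```python
from math import comb


def allelic_imbalance_p(ref_reads, alt_reads):
    """Two-sided exact binomial test of ref vs alt read counts against 50:50."""
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    k = min(ref_reads, alt_reads)
    # P(X <= k) under Binomial(n, 0.5); the distribution is symmetric,
    # so doubling the smaller tail gives the two-sided p-value.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts at one heterozygous SNP: H3K27ac reads in an enhancer
# and RNA-seq reads in a nearby gene.
p_enhancer = allelic_imbalance_p(ref_reads=18, alt_reads=4)
p_gene = allelic_imbalance_p(ref_reads=40, alt_reads=38)
```

In practice, p-values from many SNPs would also need multiple-testing correction before calling a site allelic.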
Regulatory region dynamics correlate with gene expression changes in Alzheimer’s disease
Conserved epigenomic signals in mice and humans reveal immune basis of Alzheimer’s disease.
Gjoneska, E. et al. Nature 10.1038/nature14252
Genes flanking increased- and decreased-level regulatory regions (see Methods) showed consistent gene expression changes for both promoter and enhancer regions (Extended Data Fig. 5). They were consistently enriched in immune and stimulus-response functions for increased-level enhancers and promoters, and in synapse- and learning-associated functions for decreased-level enhancers and promoters (Fig. 1d,e), consistent with our gene ontology results for changing gene expression levels.
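Enrichment of a functional category among genes flanking a set of regions is often scored with a one-sided hypergeometric test. A minimal sketch follows; the enrichment tool and background gene set used in the study are not given here, so both the test choice and the example counts are assumptions.

```python
from math import comb


def hypergeom_enrichment_p(n_genome, n_term, n_selected, n_overlap):
    """One-sided P(X >= n_overlap) for X ~ Hypergeometric.

    n_genome:   genes in the background set
    n_term:     background genes annotated to the term (e.g. an immune GO term)
    n_selected: genes flanking, e.g., increased-level enhancers
    n_overlap:  selected genes that carry the term
    """
    total = comb(n_genome, n_selected)
    upper = min(n_term, n_selected)
    hits = sum(comb(n_term, k) * comb(n_genome - n_term, n_selected - k)
               for k in range(n_overlap, upper + 1))
    return hits / total


# Hypothetical counts: 25 of 300 flanking genes carry a term that
# annotates 400 of 20,000 background genes (about 6 expected by chance).
p_term = hypergeom_enrichment_p(20_000, 400, 300, 25)
```

Because many terms are usually tested, these p-values would normally be corrected for multiple testing before reporting enriched categories.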
Developmental origins influence epigenomes
Regulatory network decoded from epigenomes of surface ectoderm-derived cell types.
Lowdon, R. F. et al. Nature Communications 10.1038/ncomms6442
Developmental history shapes the epigenome and biological function of differentiated cells. Epigenomic patterns have been broadly attributed to the three embryonic germ layers. Here we investigate how developmental origin influences epigenomes. We compare key epigenomes of cell types derived from surface ectoderm (SE), including keratinocytes and breast luminal and myoepithelial cells, against neural crest-derived melanocytes and mesoderm-derived dermal fibroblasts, to identify SE differentially methylated regions (SE-DMRs). DNA methylomes of neonatal keratinocytes share many more DMRs with adult breast luminal and myoepithelial cells than with melanocytes and fibroblasts from the same neonatal skin. This suggests that SE origin contributes to DNA methylation patterning, while shared skin tissue environment has limited effect on epidermal keratinocytes. Hypomethylated SE-DMRs are in proximity to genes with SE relevant functions. They are also enriched for enhancer- and promoter-associated histone modifications in SE-derived cells, and for binding motifs of transcription factors important in keratinocyte and mammary gland biology. Thus, epigenomic analysis of cell types with common developmental origin reveals an epigenetic signature that underlies a shared gene regulatory network.
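One simple way to express the SE-DMR idea is to require that every surface ectoderm-derived sample be less methylated than every comparison sample by some margin. The sketch below assumes that criterion with a hypothetical `min_diff` of 0.3; it is not the DMR-calling method of Lowdon et al., and the sample names and values are invented.

```python
def se_hypo_dmrs(regions, se_samples, other_samples, min_diff=0.3):
    """Regions hypomethylated in every SE-derived sample vs every other sample.

    regions: dict mapping region id -> dict of sample -> methylation fraction.
    min_diff (hypothetical): required gap between the most-methylated SE
    sample and the least-methylated non-SE sample.
    """
    hits = []
    for rid, meth in regions.items():
        se_max = max(meth[s] for s in se_samples)
        other_min = min(meth[s] for s in other_samples)
        if other_min - se_max >= min_diff:
            hits.append(rid)
    return hits


SE = ["keratinocyte", "breast_luminal", "breast_myoepithelial"]
NON_SE = ["melanocyte", "fibroblast"]
toy_regions = {
    # Low in all SE-derived cells, high in melanocytes and fibroblasts.
    "r1": {"keratinocyte": 0.10, "breast_luminal": 0.15, "breast_myoepithelial": 0.12,
           "melanocyte": 0.80, "fibroblast": 0.85},
    # Low only in keratinocytes: not shared across SE-derived cells.
    "r2": {"keratinocyte": 0.10, "breast_luminal": 0.80, "breast_myoepithelial": 0.75,
           "melanocyte": 0.85, "fibroblast": 0.90},
    "r3": {s: 0.5 for s in SE + NON_SE},
}
hits = se_hypo_dmrs(toy_regions, SE, NON_SE)
```

Region r2 illustrates a keratinocyte-specific change that a skin-environment effect could produce; the shared-origin criterion excludes it.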
Intermediate DNA methylation is a conserved signature of genome regulation.
Elliott, G. et al. Nature Communications 10.1038/ncomms7363
The bimodal pattern of DNA methylation implies a binary control over gene expression, yet a significant number of loci throughout the genome have an intermediate level of DNA methylation. To comprehensively identify regions of intermediate methylation (IM) and their quantitative relationship with gene activity, integrative and comparative epigenomics were applied to 25 human primary cell and tissue samples. These analyses identified 18,452 IM regions located near 36% of genes. CpGs in IM regions had a mean methylation of 57% and 58%. IM regions were enriched at enhancers and exons and exhibited a quantitative relationship with enhancer signals and exon inclusion, respectively (Figure 2c,d,e). These associations were equally strong in tissue, unsorted peripheral blood and 6 highly purified cell types. Significant conservation of IM regions across species and among different individuals further suggests an important function, and potentially a shared mechanism for their establishment and maintenance. The data are consistent with the hypothesis that IM is a novel epigenetic signature of evolutionarily conserved, gene context-dependent function.
Characterization of age- and gene-expression-related methylation
Age-related variations in the methylome associated with gene expression in human monocytes and T cells.
Reynolds, L. M. et al. Nature Communications 10.1038/ncomms6366
Potentially functional age-dMS (age-associated differentially methylated sites) were defined as CpG sites whose percent methylation was associated with age (FDR<0.001) and with mRNA expression of any gene within one megabase of the CpG site in question (FDR<0.001).
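FDR thresholds like these are typically applied after adjusting per-CpG p-values for multiple testing. The sketch below assumes the Benjamini-Hochberg procedure (the section does not name the method) and shows one way the two criteria could be combined. `p_age` and `p_expr` are hypothetical per-CpG p-values; reducing the expression test to one p-value per CpG (for example the best cis gene) is a simplification.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping q-values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def age_dms_mask(p_age, p_expr, fdr=0.001):
    """True where a CpG passes both FDR thresholds (hypothetical helper)."""
    q_age = benjamini_hochberg(p_age)
    q_expr = benjamini_hochberg(p_expr)
    return [qa < fdr and qe < fdr for qa, qe in zip(q_age, q_expr)]


# Hypothetical p-values for two CpGs.
mask = age_dms_mask(p_age=[1e-6, 0.5], p_expr=[1e-7, 1e-7])
```

The adjustment is done separately within each family of tests, which is why the helper computes two sets of q-values before combining them.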
[...]
We detected 1,794 age- and expression-associated methylation sites (age-eMS) among the 1,264 monocyte samples (4.7% of 37,911 monocyte age-dMS; reported in Supplementary Data 2), with methylation correlated with age (prho ranging from -0.46 to 0.44; Fig. 3a) and with cis-gene expression (prho ranging from -0.69 to 0.62; Fig. 3b).
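Rank correlations such as those reported above can be computed as the Pearson correlation of average ranks (Spearman's rho). The sketch below assumes a plain Spearman correlation; the reported "prho" may be a partial correlation adjusted for covariates, which this sketch does not do.

```python
def rank_average(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = rank_average(x), rank_average(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical: ages and percent methylation at one CpG in five donors.
ages = [45, 52, 60, 68, 77]
methylation = [0.40, 0.42, 0.47, 0.46, 0.55]
rho_age = spearman_rho(ages, methylation)
```

Rank-based correlation is robust to non-linear but monotone relationships, which suits methylation fractions bounded between 0 and 1.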
[...]
Age-dMS (age-associated differentially methylated sites) exhibiting increased methylation with age (hyper age-dMS) were located in distinctly different functional domains than age-dMS exhibiting decreased methylation with age (hypo age-dMS), consistent with previous reports 6,10,20. Compared with all CpG sites tested, hyper age-dMS were significantly enriched for inactive/repressive histone modifications 18 (H3K27me3, bivalent H3K27me3/H3K4me3) and depleted for active chromatin marks 3,18,21 (H3K4me3, H3K27ac) (Fig. 2a). In contrast, hypo age-dMS showed no clear preference between inactive and active histone modifications (fold enrichments ranging from 0.9 to 1.1). We also replicated previous findings 10,14,22 that hyper age-dMS are enriched among CpG islands (Fig. 2b) and first exons (Fig. 2c), while hypo age-dMS are enriched among CpG island “shores” and the 3’ untranslated regions (3’ UTRs) of genes.
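The fold enrichments quoted above compare how often selected CpGs fall in an annotation with how often all tested CpGs do, so values near 1 (such as 0.9 to 1.1) indicate no preference. A minimal sketch with set-based inputs follows; representing CpGs and annotations as sets of identifiers is a hypothetical simplification of intersecting genomic intervals, and the counts are invented.

```python
def fold_enrichment(selected, background, feature):
    """Fold enrichment of `selected` CpGs in `feature` relative to `background`.

    All arguments are sets of CpG identifiers; `selected` should be a subset
    of `background` (all CpG sites tested).
    """
    frac_selected = len(selected & feature) / len(selected)
    frac_background = len(background & feature) / len(background)
    return frac_selected / frac_background


# Hypothetical: 10% of tested CpGs lie in H3K27me3 regions,
# but 4 of 10 hyper age-dMS do, giving a fold enrichment of about 4.
background = set(range(100))
h3k27me3 = set(range(10))
hyper = {0, 1, 2, 3, 50, 60, 70, 80, 90, 95}
fe = fold_enrichment(hyper, background, h3k27me3)
```

A significance test (for example Fisher's exact test on the underlying 2x2 counts) would normally accompany the ratio.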
The most prominent features of age-eMS (age- and cis-gene expression-associated differentially methylated sites) were their enrichment for histone modifications indicative of open/active chromatin (H3K4me1 and H3K27ac, Fig. 2e) and for predicted enhancer regions (Fig. 2h), and their depletion among repressed genomic regions (H3K27me3), for both hypo and hyper age-eMS.
Combinatorial patterns of chromatin marks at lncRNA loci
Epigenomic footprints across 111 reference epigenomes reveal tissue-specific epigenetic regulation of lincRNAs.
Amin, V. et al. Nature Communications 10.1038/ncomms7370
To establish baseline patterns of chromatin marks in the H1 embryonic stem cell (ESC) line, 3-kbp regions centered on the lincRNA TSSs that showed ESC-specific marks were clustered with Spark 25 based on the signal of all histone marks over those regions in H1. As illustrated in Figure 3a, Spark analysis revealed five clusters corresponding to regions with different combinations of chromatin marks. To further characterize the Spark clusters, we examined their enrichment for distinct chromatin states identified by the ChromHMM program 26. As indicated in Figure 3d, Spark and ChromHMM independently discovered highly correlated chromatin mark profiles.
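The window construction and clustering step can be sketched as follows. Spark's own clustering procedure is not reproduced here; a plain Lloyd's k-means on per-window mark signals serves as a stand-in, and the TSS coordinates and signal vectors in the example are invented.

```python
def tss_windows(tss_positions, width=3000):
    """Half-open (start, end) windows of `width` bp centered on each TSS.

    Windows are clipped at coordinate 0, so a TSS near a chromosome start
    gets a shorter window.
    """
    half = width // 2
    return [(max(0, t - half), t + half) for t in tss_positions]


def kmeans(vectors, k, iters=100):
    """Plain Lloyd's k-means, used here as a stand-in for Spark's clustering.

    Centroids are seeded with the first k vectors, a simplification; real
    use would try several random or k-means++ initializations.
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(iters):
        # Assign each window to its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Move each centroid to the mean of its members; keep it if the cluster empties.
        for c in range(k):
            members = [v for v, lab in zip(vectors, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
    return labels, centroids


windows = tss_windows([5_000, 120_000, 250_000, 400_000])
# Hypothetical mean signal per window for (H3K4me3, H3K27me3, H3K9me3).
signals = [(10.0, 0.5, 0.2), (0.4, 9.0, 0.3), (9.0, 1.0, 0.1), (1.0, 8.5, 0.4)]
labels, centroids = kmeans(signals, k=2)
```

In this toy input the windows split into an H3K4me3-dominated group and an H3K27me3-dominated group; real mark signals would usually be normalized per mark before clustering.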
The distance between each lincRNA TSS and the closest protein-coding gene varied across Spark clusters (Fig. 3b). A majority of lincRNA TSSs belonging to the Quiescent, Active Enhancer and Heterochromatin clusters were located more than 50 kbp away from protein-coding genes (Fig. 3b), in contrast to the much more proximal location of those belonging to the Active Promoter and Bivalent clusters.
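The distance comparison can be sketched with a binary search over sorted gene coordinates. This assumes a single chromosome and represents each protein-coding gene by one hypothetical coordinate (for example its TSS); a full analysis would handle each chromosome separately and use gene bodies.

```python
from bisect import bisect_left


def nearest_gene_distance(tss, gene_positions):
    """Distance (bp) from `tss` to the closest position in sorted `gene_positions`.

    `gene_positions` must be non-empty and sorted ascending.
    """
    i = bisect_left(gene_positions, tss)
    candidates = []
    if i < len(gene_positions):
        candidates.append(gene_positions[i] - tss)  # nearest gene at or after the TSS
    if i > 0:
        candidates.append(tss - gene_positions[i - 1])  # nearest gene before the TSS
    return min(candidates)


def fraction_distal(tss_list, gene_positions, cutoff=50_000):
    """Fraction of lincRNA TSSs more than `cutoff` bp from any protein-coding gene."""
    far = sum(1 for t in tss_list if nearest_gene_distance(t, gene_positions) > cutoff)
    return far / len(tss_list)


# Hypothetical coordinates on one chromosome.
genes = [10_000, 200_000]
lincrna_tss = [12_000, 120_000, 205_000]
distal_share = fraction_distal(lincrna_tss, genes)
```

Computing this fraction per Spark cluster gives the kind of per-cluster comparison summarized in the text.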
We then asked whether lincRNAs with TSSs belonging to different Spark clusters differed in their transcription levels. As expected, lincRNAs with TSSs in the Heterochromatin cluster were transcribed at lower levels than those in other distal lincRNA clusters, and those in the Bivalent cluster were transcribed at lower levels than those in the Active Promoter cluster (Fig. 3e). These patterns suggest an association between chromatin state and lincRNA transcription.
Using evolutionary dating information for lncRNAs 27, we asked whether lincRNAs associated with different Spark-defined clusters share a similar evolutionary history. As indicated in Figure 3c, the Heterochromatin cluster contained the largest fraction of human-specific lincRNAs. The rapid evolution of members of this cluster is consistent with reduced negative selection pressure due to an apparent lack of specific function in early embryonic development, as suggested by the lower transcription levels of lncRNAs belonging to the Heterochromatin cluster in the H1 cell line (Fig. 3e).
The Bivalent cluster was the most evolutionarily stable (Fig. 3c and Supplementary Fig. 8a), consistent with previous findings 27. As expected, the Bivalent cluster showed lower transcription of both lincRNAs and their associated protein-coding genes than the Active Promoter cluster (Fig. 3e). Enrichment analysis of biological processes and mouse phenotypes associated with the Bivalent cluster using the GREAT tool 24 revealed strong enrichment for genes involved in developmental regulation (Fig. 3f and Supplementary Fig. 8c). Furthermore, enrichment analysis of ENCODE transcription factor binding 28 within Bivalent cluster C4 in H1 embryonic stem cells revealed the strongest enrichment for SUZ12, a key member of PRC2 (Polycomb repressive complex 2) 29, and CTBP2, a protein known to interact with Polycomb complex members 30 (Fig. 3f and Supplementary Fig. 8b). These enrichments are consistent with regulation of the bivalent state by the Polycomb complex.
2. Relationship between different epigenomic marks: DNA accessibility and methylation, histone marks, and RNA. Nature (2015). https://doi.org/10.1038/nature14310
DOI: https://doi.org/10.1038/nature14310