Distinct chromatin states exhibit different distributions of chromatin accessibility, DNA methylation and gene expression

Integrative analysis of 111 reference human epigenomes.

Roadmap Epigenomics Consortium et al. Nature 10.1038/nature14248

We used chromatin states to study the relationship between histone modification patterns, RNA expression levels, DNA methylation, and DNA accessibility. Consistent with previous studies 19,23,43,44, we found low DNA methylation and high accessibility in promoter states, high DNA methylation and low accessibility in transcribed states, and intermediate DNA methylation and accessibility in enhancer states (Fig. 4d-e, Extended Data 3a,b). These differences in methylation level were stronger for higher-expression genes than for lower-expression genes, leading to a more pronounced DNA methylation profile (Extended Data 3c, Fig. S5, Table S4f). Genes proximal to H3K27ac-marked enhancers showed significantly higher expression levels (Extended Data 3d), and conversely, higher-expression genes were significantly more likely to neighbor H3K27ac-containing enhancers (Extended Data 3e).

Chromatin states sometimes captured differences in RNA expression that are missed by DNA methylation or accessibility. For example, TxFlnk, Enh, TssBiv, and BivFlnk states show similar distributions of DNA accessibility but widely differing enrichments for expressed genes (Fig. 4c,d). Enh and ReprPC states show intermediate DNA methylation, but very different distributions of DNA accessibility and different enrichments for expressed genes (Fig. 4c-e). Lack of DNA methylation, typically associated with de-repression, is associated with both the active TssA promoter state and the bivalent TssBiv and BivFlnk states. Bivalent states TssBiv and BivFlnk also showed overall lower DNA methylation and higher DNA accessibility than enhancer states Enh and EnhG, as well as binding by both activating and repressive regulatory factors (Extended Data 2b). These results also held for alternate methylation measurement platforms (Extended Data 4a-c), and for the 18-state chromatin state model (Extended Data 4d-e). Overall, these results highlight the complex relationship between DNA methylation, DNA accessibility, and RNA transcription, and the value of interpreting DNA methylation and DNA accessibility in the context of integrated chromatin states that better distinguish active and repressed regions.

Given the intermediate methylation levels of tissue-specific enhancer regions, we directly annotated intermediate methylation (IM) regions, based on 25 complementary DNA methylation assays of MeDIP 31,45 and MRE-Seq 22,39 from 9 reference epigenomes 46. This resulted in more than 18,000 IM regions, showing 57% CpG methylation on average, that are strongly enriched in genes, enhancer chromatin states (EnhBiv, EnhG, Enh), and evolutionarily-conserved regions. IM was associated with intermediate levels of active histone modification and DNaseI hypersensitivity. Near TSSs, IM correlated with intermediate gene expression, and in exons it was associated with an intermediate level of exon inclusion 46. IM signatures were equally strong within tissue samples, peripheral blood, and purified cell types, suggesting that IM is not simply reflecting differential methylation between cell types, but likely reflects a stable state of cell-to-cell variability within a population of cells of the same type.

Global similarity and differences between epigenomes

Integrative analysis of 111 reference human epigenomes.

Roadmap Epigenomics Consortium et al. Nature 10.1038/nature14248

To understand the relationship among different tissue/cell samples beyond the constraints of a tree representation, we also studied the full similarity matrix of each mark in relevant chromatin states (Fig. S9) and visualized the principal dimensions of epigenomic variation using multidimensional scaling (MDS) analysis (Fig. S10). The pairwise similarity matrices of different marks were most effective in distinguishing different subsets of the samples, with H3K4me1 in Enh primarily capturing immune cell similarities, and H3K27me3 in ReprPC capturing pluripotent cell similarities (Fig. S9). In the MDS analysis, the first four dimensions of variation for most marks separated several major sample groups (Extended Data 7a-i), with some subtle differences between marks. For example, pluripotent cells and immune cells were two strong outliers in the first two dimensions of H3K4me1 variation in Enh (Fig. 6b), but H3K27me3 in ReprPC showed more uniform spreading of reference epigenomes (Fig. 6c), consistent with the coverage distributions of immune and pluripotent cells for the corresponding chromatin states (Fig. 5b). For most marks, the first five dimensions captured most of the variance, with additional dimensions capturing at most 4-6% for each mark (Extended Data 7).
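As a rough illustration of the MDS step described above, the sketch below converts pairwise Pearson correlations between epigenomes into distances and embeds them with classical (Torgerson) MDS, reporting the fraction of variance captured by each dimension. The text does not state which MDS variant or dissimilarity transform was used, so the sqrt(2(1 - r)) distance, the power-iteration eigen-solver, and all function names here are illustrative assumptions, not the consortium's pipeline.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant signal vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_distances(profiles):
    """Assumed transform: distance = sqrt(2 * (1 - r)) for every pair of profiles."""
    return [[math.sqrt(max(0.0, 2.0 * (1.0 - pearson(p, q)))) for q in profiles]
            for p in profiles]


def classical_mds(dist, k, iters=1000, seed=0):
    """Classical MDS of a symmetric distance matrix into k dimensions.

    Returns (coords, explained); explained[i] is the share of the total
    variance (trace of the double-centred matrix) captured by dimension i.
    Eigenvectors are found by power iteration with deflation, which assumes
    the distances are (close to) Euclidean.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand = sum(row_mean) / n
    # Double centring: B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]
    total = sum(b[i][i] for i in range(n))
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    explained = []
    for dim in range(k):
        vec = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            nxt = [sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(v * v for v in nxt))
            if norm < 1e-12:  # nothing left to extract
                break
            vec = [v / norm for v in nxt]
        length = math.sqrt(sum(v * v for v in vec))
        vec = [v / length for v in vec]
        bv = [sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)]
        lam = sum(vec[i] * bv[i] for i in range(n))  # Rayleigh quotient
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][dim] = vec[i] * scale
        explained.append(lam / total if total > 0 else 0.0)
        # Deflate so the next pass finds the next-largest component
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * vec[i] * vec[j]
    return coords, explained
```

With one signal vector per epigenome (for example, H3K4me1 signal over Enh regions), `classical_mds(correlation_distances(profiles), 4)` would return four coordinates per epigenome plus the per-dimension variance shares, which is the kind of summary the text reports for the first few MDS dimensions.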

Relationship between allelic enhancer activity and allelic gene expression

Integrative analysis of haplotype-resolved epigenomes across human tissues.

Leung, D. et al. Nature 10.1038/nature14217

We discovered that allelic enhancers resided in significantly closer proximity to genes with allelically biased expression than non-allelic enhancers did (Fig. 4a and 4b). We also observed examples where distinct tissues from the same donor showed similar allelic biases of gene expression and H3K27ac at enhancers (left ventricle and right ventricle from donor 3); however, the same tissue type derived from a different donor (left ventricle from donor 1) yielded no consistent patterns (Fig. 4b), supporting the hypothesis that allelically biased gene expression is driven by individual-specific genetic variation in enhancers.

Regulatory region dynamics correlate with gene expression changes in Alzheimer’s disease

Conserved epigenomic signals in mice and humans reveal immune basis of Alzheimer’s disease.

Gjoneska, E. et al. Nature 10.1038/nature14252

Genes flanking increased- and decreased-level regulatory regions (see Methods) showed consistent gene expression changes for both promoter and enhancer regions (Extended Data Fig. 5), and were consistently enriched in immune and stimulus-response functions for increased-level enhancers and promoters, and in synapse and learning-associated functions for decreased-level enhancers and promoters (Fig. 1d,e), consistent with our gene ontology results of changing gene expression levels.

Developmental origins influence epigenomes

Regulatory network decoded from epigenomes of surface ectoderm-derived cell types.

Lowdon, R. F. et al. Nature Communications 10.1038/ncomms6442

Developmental history shapes the epigenome and biological function of differentiated cells. Epigenomic patterns have been broadly attributed to the three embryonic germ layers. Here we investigate how developmental origin influences epigenomes. We compare key epigenomes of cell types derived from surface ectoderm (SE), including keratinocytes and breast luminal and myoepithelial cells, against neural crest-derived melanocytes and mesoderm-derived dermal fibroblasts, to identify SE differentially methylated regions (SE-DMRs). DNA methylomes of neonatal keratinocytes share many more DMRs with adult breast luminal and myoepithelial cells than with melanocytes and fibroblasts from the same neonatal skin. This suggests that SE origin contributes to DNA methylation patterning, while shared skin tissue environment has limited effect on epidermal keratinocytes. Hypomethylated SE-DMRs are in proximity to genes with SE relevant functions. They are also enriched for enhancer- and promoter-associated histone modifications in SE-derived cells, and for binding motifs of transcription factors important in keratinocyte and mammary gland biology. Thus, epigenomic analysis of cell types with common developmental origin reveals an epigenetic signature that underlies a shared gene regulatory network.

Intermediate DNA methylation is a conserved signature of genome regulation.

Elliott, G. et al. Nature Communications 10.1038/ncomms6442

The bimodal pattern of DNA methylation implies a binary control over gene expression, yet a significant number of loci throughout the genome have an intermediate level of DNA methylation. To comprehensively identify regions of intermediate methylation (IM) and their quantitative relationship with gene activity, integrative and comparative epigenomics were applied to 25 human primary cell and tissue samples. These analyses identified 18,452 IM regions located near 36% of genes. CpGs in IM regions had a mean methylation of 57% and 58%. IM regions were enriched at enhancers and exons and exhibited a quantitative relationship with enhancer signals and exon inclusion, respectively (Figure 2c,d,e). These associations were equally strong in tissue, unsorted peripheral blood and 6 highly purified cell types. Significant interspecies conservation, and conservation among different individuals, at IM regions further suggest an important function, and potentially a shared mechanism for their establishment and maintenance. The data are consistent with the hypothesis that IM is a novel epigenetic signature of evolutionarily conserved, gene context-dependent function.

Characterization of age- and gene-expression-related methylation

Age-related variations in the methylome associated with gene expression in human monocytes and T cells.

Reynolds, L. M. et al. Nature Communications 10.1038/ncomms6366

Potentially functional age-dMS (age-associated differentially methylated sites) were defined as CpG sites whose % methylation was associated with age (FDR<0.001) and with mRNA expression of any gene within one megabase of the CpG site in question (FDR<0.001).

[...]

We detected 1,794 age- and expression-associated methylation sites (age-eMS) among the 1,264 monocyte samples (4.7% of 37,911 monocyte age-dMS; reported in Supplementary Data 2), with methylation correlated with age (partial correlation, prho, ranging from −0.46 to 0.44; Fig. 3a) and with cis-gene expression (prho ranging from −0.69 to 0.62; Fig. 3b).
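The prho values above are partial correlations, that is, the association between methylation and age (or expression) after adjusting for covariates. The sketch below shows the idea for a single covariate using the standard first-order partial correlation formula. The study adjusted for several covariates through linear regression, so this is a simplified illustration, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of one covariate z.

    Assumes z is not perfectly correlated with x or y (otherwise the
    denominator is zero).
    """
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
```

Here `x` could hold methylation values at one CpG site, `y` the donors' ages, and `z` a single adjustment variable; when `z` is unrelated to both, the result reduces to the ordinary Pearson correlation.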

[...]

Age-dMS (age-associated differentially methylated sites) exhibiting increased methylation with age (hyper age-dMS) were located in distinctly different functional domains than age-dMS exhibiting decreased methylation with age (hypo age-dMS), consistent with previous reports 6,10,20. Compared to all CpG sites tested, hyper age-dMS were significantly enriched for inactive/repressive histone modifications 18 (H3K27me3, bivalent H3K27me3/H3K4me3), while being depleted for active chromatin marks 3,18,21 (H3K4me3, H3K27ac) (Fig. 2a). However, there was no clear preference for hypo age-dMS among inactive vs. active histone modifications (fold enrichments ranging from 0.9 to 1.1). We also replicated previous findings 10,14,22 that hyper age-dMS are enriched among CpG islands (Fig. 2b) and 1st exons (Fig. 2c), while hypo age-dMS are enriched among CpG island “shores” and the 3’ untranslated regions (3’ UTR) of genes.

The most prominent features of age-eMS (age- and cis-gene expression-associated differentially methylated sites) were their enrichment for histone modifications indicative of open/active chromatin (H3K4me1 and H3K27ac, Fig. 2e) and predicted enhancer regions (Fig. 2h), while being depleted among repressed genomic regions (H3K27me3), for both hypo and hyper age-eMS.
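The fold enrichments discussed above compare how often a set of sites (for example, hyper age-dMS) overlaps a feature against how often all tested CpG sites do. The sketch below computes such a fold enrichment and a Pearson chi-square test on a 2x2 table. The table layout (subset versus remaining sites), the absence of a continuity correction, and the function names are assumptions made for illustration; the text only states that a χ2-test was used.

```python
import math


def fold_enrichment(subset_hits, subset_total, all_hits, all_total):
    """Ratio of the feature's frequency in the subset to its frequency overall."""
    return (subset_hits / subset_total) / (all_hits / all_total)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity correction).

    Assumed table layout:
        a = subset sites in the feature      b = subset sites outside it
        c = remaining sites in the feature   d = remaining sites outside it
    Returns (statistic, p_value).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column of the table must be non-empty")
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

A fold enrichment above 1 indicates that the feature is over-represented among the selected sites, and a value between 0.9 and 1.1, as reported for hypo age-dMS, indicates essentially no preference.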

Combinatorial patterns of chromatin marks at lncRNA loci

Epigenomic footprints across 111 reference epigenomes reveal tissue-specific epigenetic regulation of lincRNAs.

Amin, V. et al. Nature Communications 10.1038/ncomms7370

To establish baseline patterns of chromatin marks in the H1 Embryonic Stem Cell (ESC) line, regions 3Kbp in size centering on the lincRNA TSSs that show ESC-specific marks were clustered by Spark 25 based on the signal of all histone marks over those regions in H1. As illustrated in Figure 3a, Spark analysis revealed five clusters corresponding to regions with different combinations of chromatin marks. To further characterize Spark clusters, we examined their enrichment for distinct chromatin states identified by the ChromHMM program 26. As indicated in Figure 3d, Spark and ChromHMM independently discovered highly correlated chromatin mark profiles.

The distances between lincRNA TSSs and the closest protein coding gene varied across Spark clusters (Fig. 3b). A majority of lincRNA TSSs belonging to the Quiescent, Active Enhancer and Heterochromatin clusters were located more than 50Kbp away from protein coding genes (Fig. 3b), in contrast to the much more proximal location of those belonging to the Active Promoter and Bivalent clusters.

We then asked whether the lincRNAs with TSSs belonging to different Spark clusters showed differences in their transcription levels. As expected, lincRNAs with TSSs within the Heterochromatin cluster were transcribed at lower levels than other distal lincRNA clusters, and those within the Bivalent cluster were transcribed at a lower level than those within the Active Promoter cluster (Fig. 3e). These patterns suggest an association between chromatin states and lincRNA transcription.

Using evolutionary dating information for lncRNAs 27, we asked whether lincRNAs associated with different Spark-defined clusters share similar evolutionary history. As indicated in Figure 3c, the Heterochromatin cluster contained the largest fraction of human-specific lincRNAs. The rapid evolution of members of this cluster is consistent with reduced negative selection pressure due to apparent lack of specific function in early embryonic development, as suggested by lower levels of transcription of lncRNAs belonging to the Heterochromatin cluster in the H1 cell line (Fig. 3e).

The Bivalent cluster was the most stable evolutionarily (Fig. 3c and Supplementary Fig. 8a), consistent with previous findings 27. As expected, the Bivalent cluster showed lower transcription of both lincRNAs and their associated protein coding genes than the Active Promoter cluster (Fig. 3e). Enrichment analysis of biological processes and mouse phenotypes associated with the Bivalent cluster using the GREAT tool 24 revealed strong enrichment for genes involved in developmental regulation (Fig. 3f and Supplementary Fig. 8c). Furthermore, enrichment analysis of ENCODE transcription factor binding 28 within Bivalent cluster C4 in H1 embryonic stem cells revealed the strongest enrichment for SUZ12, a key member of PRC2 (Polycomb repressive complex 2) 29, and CTBP2, a protein known to interact with Polycomb complex members 30 (Fig. 3f and Supplementary Fig. 8b). These enrichments are consistent with the regulation of the bivalent state by the Polycomb complex.

Figure 1: Chromatin states and DNA methylation dynamics.

a. Chromatin state definitions, abbreviations, and histone mark probabilities. b. Average genome coverage. Genomic annotation enrichments in H1-ESC. c. Active and inactive gene enrichments in H1-ESC (see Extended Data 2b for GM12878). d. DNA methylation. e. DNA accessibility. d-e. Whiskers show 1.5 interquartile range. Circles are individual outliers. f. Average overlap fold enrichment for GERP evolutionarily conserved non-coding regions. Bars denote standard deviation. g. DNA methylation (WGBS) density (color, ln scale) across cell types. red=max ln(density+1). Left column indicates tissue groupings, full list shown in Extended Data 4f. h. DNA methylation levels (left) and TF enrichment (right) during ESC differentiation. i. Chromatin mark changes during cardiac muscle differentiation. Heatmap=average normalized mark signal in Enh. C5 cluster enrichment54.

Figure 2: Relationship between histone marks, DNA methylation, DNA accessibility, and gene expression.

a. H3K27ac-marked ‘active’ enhancers show higher levels of DNA accessibility, based on enrichment of DNase-seq signal confidence scores (-log10(Poisson p-value)) for elements in each chromatin state in our extended 18-state model that includes the core five histone modification marks and H3K27ac, similar to Fig. 4e. b. Level of whole-genome bisulfite methylation for all chromatin states in the 18-state model shows that ‘active’ enhancers marked by H3K27ac in addition to H3K4me1 show lower methylation levels, consistent with higher regulatory activity. The whiskers in a. and b. show 1.5 x IQR (interquartile range) and the filled circles are individual outliers. c. DNA methylation levels for genes showing different expression levels. The depletion of DNA methylation in promoter regions, and the enrichment of DNA methylation in transcribed regions, are both more pronounced for highly expressed genes. The enrichment for high DNA methylation is more pronounced in the 3′ ends of the most highly expressed genes. d. Genes associated with active enhancer states have consistently significantly higher expression. ‘Active enhancer’-associated genes have at least one EnhA1 and/or EnhA2 within +/-20Kb of the TSS (18-state model). ‘Weak-enhancer’ genes are associated with EnhG1, EnhG2, EnhWk, EnhBiv. Genes that are not associated with any enhancer have the lowest expression. Plots with red markers show median expression of genes associated with ‘active’ enhancers, yellow markers ‘weak’ enhancers, and white markers no association with any enhancer state. e. Higher-expression genes show greater association with H3K27ac-marked ‘active’ enhancers. Highly expressed genes are consistently more frequently associated with H3K27ac-marked active enhancers (EnhA1 and EnhA2) across all cell types. Fraction of genes associated with H3K27ac-marked ‘active’ enhancers (red), H3K27ac-lacking ‘weak’ enhancers only (yellow), or no enhancers (white) for genes of varying expression levels in each cell type with RNA-seq data.

Figure 3

Heatmaps showing tissue/cell type similarity measured using different epigenomic marks. a-c. Pearson correlation values calculated between epigenomes for a variety of marks, assessed within relevant chromatin states of the 15-state core model (see Methods). a. Five core marks in their corresponding relevant chromatin states. b. H3K27ac in Enh and TssA state regions. c. H3K9ac in Enh and TssA state regions. d-e. Similarly, for DNase (d) in DNase regions, RNA-seq (e) using RPKM values across genes.

Figure 4

Multidimensional scaling (MDS) plots showing tissue/cell type similarity using different epigenomic marks. a-i. Multi-Dimensional Scaling (MDS) analysis results, showing reference epigenomes using their group coloring defined in Fig. 2. Thin lines connect same-group reference epigenomes. The first 5 axes of variation are shown in pairs. Marks are assessed in the same regions as used for Figures S1 and S9. a. H3K4me1, b. H3K4me3, c. H3K27me3, d. H3K27ac, e. H3K9ac, f. DNase, g. H3K36me3, h. RNA-seq RPKM, i. H3K9me3.

Figure 5: a-i. Multidimensional scaling (MDS) plots showing tissue/cell type similarity using different epigenomic marks.

Multi-Dimensional Scaling (MDS) analysis results, showing reference epigenomes using their group coloring defined in Fig. 2. Thin lines connect same-group reference epigenomes. The first 4 axes of variation are shown in pairs. Marks are assessed in regions with relevant chromatin states (see Methods). j. Variance explained by each MDS dimension. The first 5 dimensions shown in Fig. S10 (Fig. 6b,c) explain between 45% and 80% of the total epigenome-to-epigenome variance for all histone modification mark correlations, and additional dimensions explain less than 10%. Only for H3K4me3 in TssA chromatin states do a few components explain a much larger fraction of the variance than for other marks, possibly due to its stability across cell types.

Figure 6: Epigenome relationships

a. Hierarchical epigenome clustering using H3K4me1 signal in Enh states. Numbers indicate bootstrap support scores over 1,000 samplings. b-c. Multidimensional scaling (MDS) plot of cell type relationships based on similarity in H3K4me1 signal in Enh states (b) and H3K27me3 signal in ReprPC states (c). First four dimensions shown as dim1 vs. dim2 and dim3 vs. dim4.

Figure 7: Allelic histone acetylation at enhancers is associated with allelically biased gene expression

a) Average distance of allelic (5% FDR) and non-allelic enhancers to the closest allelically expressed gene is significantly different (n=3,829, *** p-value<2.2e-16, KS-test). b) Genome browser snapshots show an allelic enhancer within the intron of the allelically expressed A4GALT gene (P1 - red, P2 - blue) on chromosome 22 across 3 samples. c) Density plot presents the fraction of concordant allelic bias between allelically expressed genes and allelic enhancers in terms of distance. The allelic enhancer-gene pairs were defined with FDR cutoff values of 5% (n=14,082) (black), 1% (n=6,057) (blue) and 0.1% (n=2,362) (yellow). A permuted control set of enhancer-gene pairs was included (n=14,082) (grey). Distance between allelically biased enhancer-gene pairs and fraction of concordant allelic bias are denoted by x- and y-axes, respectively (p-value<2.2e-16, KS-test).

Figure 8: Relationship between changes in gene expression and regulatory regions in CK-p25 mice.

For each class of gene expression change in the CK-p25 model (x axis), enrichment for overlap with different histone modifications is shown (y axis) for a, H3K4me3 at promoters; b, H3K27ac at enhancers; c, H3K27me3 at Polycomb repressed regions. Histone modifications were mapped to the nearest transcription start site (Supplementary Table S3) to show the enrichment of the changing regulatory regions relative to those that are stable in CK-p25. The significance is calculated based on the hypergeometric p-value of the overlap.
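A hypergeometric p-value for an overlap, as mentioned in this legend, asks how likely it is to see at least the observed number of shared genes when one set is drawn at random from the background. The sketch below is a minimal version using exact integer arithmetic; the argument names and the one-sided (enrichment) tail are assumptions, since the legend does not spell out the test's configuration.

```python
from math import comb


def hypergeom_overlap_pvalue(population, annotated, selected, overlap):
    """P(X >= overlap) when `selected` items are drawn without replacement
    from `population` items, of which `annotated` carry the feature.

    Example reading (hypothetical): population = all genes, annotated = genes
    near a changing regulatory region, selected = genes in one expression-change
    class, overlap = genes in both sets.
    """
    max_overlap = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total
```

For instance, `hypergeom_overlap_pvalue(10, 5, 5, 5)` is the chance that all five selected items fall within the five annotated ones out of ten, which is 1 in 252.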

Figure 9: Correlations between age-methylation and methylation-gene expression for Age-eMS in 1,264 monocyte samples.

CpG sites with methylation associated with age (FDR<0.001) and cis-gene expression (1,794 age-eMS; FDR<0.001; Supplementary Data 2). (a) The partial correlation of age-eMS methylation with age (y-axis), compared with age-eMS genomic location (by chromosome, x-axis); the strongest correlations (prho=−0.46) were between age and methylation of cg10628205 and cg12079303 (red circle), which were also correlated with the expression of NFIA. (b) The partial correlation between age-eMS methylation and cis-gene expression (y-axis), compared with age-eMS distance to the associated gene transcription start site (TSS, x-axis); the strongest correlation (prho=−0.69) was between methylation of cg11805027 and expression of VASH1. Linear regression analysis also included the following covariates: race, sex, site of data collection, microarray chip and residual sample contamination with non-targeted cells (see Methods).

Figure 10: Enrichment of Age-dMS and Age-eMS in monocytes for regulatory features.

Fold enrichment of age-associated CpG sites (37,911 age-dMS, FDR<0.001, left), and fold enrichment of age and cis-gene expression-associated methylation sites (FDR<0.001, 1,794 age-eMS, right), stratified by methylation positively associated with age (hyper age-dMS, shaded) and negatively associated with age (hypo age-dMS, white) for (a,e) histone modifications reported by ENCODE in a monocyte sample, (b,f) CpG islands and ‘shores’, (c,g) gene regions, including: within 1.5 kb upstream of the transcription start site (TSS1500), the 5′ untranslated region (5′ UTR), the 1st gene exon, the gene body, the 3′ UTR or intergenic, and (d,h) predicted gene expression regulatory regions based on histone modifications (enhancer and promoter based on H3K4me1/3, H3K27ac), CTCF binding and DNase peaks reported in a monocyte sample (ENCODE/UCSC browser), and transcription factor binding sites (TFBS) reported in any cell type available from the UCSC Genome Browser. Fold enrichments presented are from 1,264 monocyte samples, and are relative to all 448,523 CpG sites tested (y-axis); *1 × 10⁻⁶ ≤ P < 0.01, **P < 1 × 10⁻⁶, χ² test.

Figure 11

Epigenomic footprints of lincRNA transcription start sites (lincRNA TSSs) in H1 embryonic stem cells. LincRNA TSSs belong to 5 distinct chromatin state classes. (a) LincRNA TSSs that had differential histone modification signals (in at least one histone mark) in stem cells were used to perform Spark analysis. Spark performs k-means clustering (k=5, bin size = 100 bp) to group regions that have similar epigenomic footprints. Clustering analysis reveals five distinct classes of lincRNA TSSs for H1 stem cells: quiescent (C1: Quies), enhancer (C2: Enh), transcription start site active (C3: TssA), bivalent (C4: Biv) and quiescent/heterochromatin (C5: Quies/Het). Each Spark cluster was subjected to further analyses (b-f). (b) Absolute distance of lincRNA TSSs to the nearest protein coding TSS determined using the GREAT basal + extension rule (1kb downstream + 5kb upstream + up to 500 kb distal). The absolute distances are binned into <5kb, 5-50kb, and >50-500 kb windows. (c) Evolutionary age estimates of lincRNAs based on sequence conservation. (d) ChromHMM state enrichments of the lincRNA TSS clusters. (e) Density function showing expression of lincRNA (left) and neighboring protein coding genes in RPKM (reads per kilobase per million) units. (f) Enrichment of ENCODE transcription factor binding sites for bivalent lincRNA TSS clusters (hypergeometric tests, p < 0.0005). Gene ontology term enrichment (blue - biological process, red - mouse phenotype, and green - Mouse Genome Informatics (MGI) expression) of neighboring protein coding genes for the bivalent lincRNA TSS cluster. Terms identified using GREAT are significant by both hypergeometric and binomial tests (p < 0.05). We have developed an on-line tutorial (link: http://genboree.org/theCommons/projects/aminv-natcomm-2015/wiki ) on how to use on-line tools integrated within the Genboree Workbench to carry out the types of analyses reported in this Figure.