DNA methylation and chromatin state dynamics during differentiation

Integrative analysis of 111 reference human epigenomes.

Roadmap Epigenomics Consortium et al. Nature 10.1038/nature14248

We next studied the relationship between DNA methylation dynamics and histone modifications across 95 epigenomes with methylation data, extending previous studies that focused on individual lineages 19,47-49. We found that the distribution of methylation levels for CpGs in some chromatin states varied significantly across tissue and cell types (Fig. 4g, Extended Data 4f, Table S4a). For example: TssAFlnk states are largely unmethylated in terminally-differentiated cells and tissues, but frequently methylated for several pluripotent and ESC-derived cells (Bonferroni-corrected F-test p<.01); Enh and EnhG states are highly methylated in pluripotent cells, but show a broader distribution of intermediate methylation in differentiated cells and tissues (p<.01); EnhBiv states are unmethylated in most primary cells and tissues, but show a broader distribution of methylation levels in pluripotent cells, possibly reflecting cell-to-cell heterogeneity (p<.01); the repressed state ReprPC shows varying methylation levels among epigenomes; the Het state showed high levels of methylation in almost all epigenomes.

We also studied DNA methylation changes in three different systems. First, we studied DNA methylation changes during Embryonic Stem Cell (ESC) differentiation 49,50. We identified regions that lost methylation (Differentially Methylated Regions, DMRs, Table S4c) upon differentiation of ESCs (E003) to mesodermal (E013), endodermal (E011), and ectodermal (E012) lineages (Fig. 4h). Each lineage showed a largely distinct set of ~2200-4400 DMRs that are enriched for distinct transcription factor binding events (Fig. 4h, right column)51, consistent with their distinct developmental regulation. Upon further differentiation, ectodermal DMRs remained hypomethylated in three neural progenitor populations52, despite the usage of distinct hESC lines, and mesodermal and endodermal DMRs remained highly methylated (Fig. 4h), highlighting the lineage-specific nature of changes in DNA methylation during early differentiation49,53.

Second, we studied DNA methylation changes associated with breast epithelia differentiation44. Ectoderm to breast epithelia differentiation was dominated by DNA methylation loss (1.3M CpGs lost methylation vs. 0.2M gained), consistent with other primary somatic cell types50. Distinguishing luminal vs. myoepithelial cells by flow sorting, and comparing a set of DMRs (Table S4d) defined specifically in epithelial lineages44, we found differences in nearest-gene enrichments54 (mammary gland epithelium development vs. actin filament bundle, respectively), and differences in motif density (luminal DMRs show greater motif density for 51 TFs and lower for 0 TFs). Proximal DMRs were highly associated with increased transcription, consistent with regulatory element de-repression associated with DNA methylation loss.

Third, we asked whether tissue environment or developmental origin is the primary driving factor in DNA methylation differences observed in more differentiated cell types55, using epigenomes from skin cell types (keratinocytes E057/058, melanocytes E059/E061, and fibroblasts E055/056) that share a common tissue environment but possess distinct embryonic origins (surface ectoderm, neural crest, and mesoderm, respectively). We found that despite the shared tissue environment, these three cell types displayed lower overlap in their DNA methylation and histone modification signatures, and instead were more similar to other cell types with a shared developmental origin. Using a set of DMRs (Table S4e) defined specifically in the skin cell types55, keratinocytes shared 1392 (18%) of DMRs with surface ectoderm-derived breast cell types (Hypergeometric P-value < 10^-6), and 97% of these were hypomethylated. These shared DMRs were enriched for regulatory elements and cell-type relevant genes, suggesting a common gene regulatory network and shared signaling pathways and structural components55. These results suggest that common developmental origin can be a primary determinant of global DNA methylation patterns, and sometimes supersedes the immediate tissue environment in which the cells are found.
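
The hypergeometric P-value quoted above asks how likely an overlap at least this large would be if the two DMR sets were drawn independently from a common universe of regions. The source does not give the universe size or the set sizes used in the test, so the following is only a minimal sketch with hypothetical toy counts; the function name `hypergeom_sf` and all numbers are illustrative assumptions, not values from the paper.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """Upper-tail probability P(X >= k) for a hypergeometric variable.

    N: size of the universe of candidate regions
    K: regions in the universe that belong to set B
    n: regions drawn, i.e. the size of set A
    k: observed overlap between A and B
    """
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Hypothetical toy counts (NOT taken from the paper).
N = 50   # candidate regions in the universe
K = 10   # DMRs in cell type B
n = 12   # DMRs in cell type A
k = 6    # DMRs shared by A and B (expected by chance: n * K / N = 2.4)

p_value = hypergeom_sf(k, N, K, n)
print(f"P(overlap >= {k}) = {p_value:.4f}")
```

With real data, the choice of universe (for example, all regions testable in both comparisons) strongly affects the resulting P-value, so it should be stated alongside the result.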

In addition, we examined coordinated changes in chromatin marks associated with cellular differentiation56. We found that enhancers showing coordinated differences in multiple marks are enriched near genes showing common tissue-specific expression, and common knockout phenotypes based on their mouse orthologs. For example, enhancers that showed higher H3K27ac and H3K4me3 (Fig. 4i, Cluster C2) in left ventricle (E095) relative to their ESCs (E003) and mesendodermal (E004) precursor lineages were enriched for heart ventricle expression and cardiac and muscle phenotypes in their mouse orthologs.

Relationships between epigenomic landscapes of cell types, tissues and lineages

Integrative analysis of 111 reference human epigenomes.

Roadmap Epigenomics Consortium et al. Nature 10.1038/nature14248

We next used epigenome similarity to study the relationship between tissues and cell types, based on the similarity of diverse histone modification marks evaluated in their relevant chromatin states. Hierarchical clustering of our 111 reference epigenomes using H3K4me1 signal in Enh (Fig. 6a) showed consistent grouping of biologically-similar cell and tissue types, including ESCs, iPSCs, T-cells, B-cells, adult brain, fetal brain, digestive, smooth muscle, and heart. We also found several initially surprising but biologically-meaningful groupings: fetal brain and germinal matrix samples clustered with neural stem cells rather than adult brain, consistent with fetal neural stem cell proliferation; many ES-derived cells clustered with ESCs and iPS cells rather than the corresponding tissues, suggesting that those are still closer to pluripotent states than corresponding somatic states; adult and fetal thymus samples clustered with T-cells rather than other tissues, consistent with roles in T-cell maturation and immunity. Several marks successfully recovered biologically-meaningful groups when evaluated in their relevant chromatin states (Fig. S8), including H3K4me3 in TssA, H3K27me3 in ReprPC state, and H3K36me3 in Tx, suggesting that the signal of each mark in relevant chromatin states is highly indicative of cell type and tissue identity. These alternative clusterings also showed some differences; for example, H3K4me3 in TssA states grouped several fetal samples together with each other, in a cluster sister to ESCs and iPSCs, rather than in separate tissue groups.

We applied this approach to compare the Roadmap Epigenomics reference epigenomes with the 16 ENCODE 2012 samples with broad mark coverage (Extended Data 6). We found that H3K4me1 signal in enhancer chromatin states correctly groups primary cells from similar tissues across the two projects, emphasizing the robustness of our annotations and signal tracks across projects (Extended Data 6a). For example, epidermal keratinocytes NHEK group with other keratinocytes, mammary epithelial cells HMEC with other skin cells, and skeletal muscle myoblasts HSMM and osteoblasts with bone marrow. Some cancer cell lines also grouped with corresponding primary tissues, including hepatocellular carcinoma HepG2 with liver tissue, primary lung fibroblasts NHLF with the IMR90 lung fibroblast cell line, and T cell leukemia Dnd41 with Thymus, while in other cases cancerous cell lines grouped together, e.g. HeLa-S3 cervical carcinoma with A549 lung carcinoma. Similarly, H3K27me3 signal in Polycomb-repressed states grouped five immortalized cell lines together (Extended Data 6c), despite their T-cell, Lung, Cervical, Leukemia, and Hepatocellular origins12,64. These larger trees spanning ENCODE 2012 and Roadmap Epigenomics also highlighted the large number of lineages not previously covered by reference epigenomes, including brain, muscle, smooth muscle, heart, mucosa, digestive tract, and fetal tissues.

Causal autoimmunity variants map to immune-cell-specific enhancers

Genetic and epigenetic fine mapping of causal autoimmune disease variants.

Farh, K. K.-H. et al. Nature 10.1038/nature13835

To investigate the functions of predicted causal non-coding variants, we generated a resource of epigenomic maps for specialized immune subsets (Extended Data Fig. 6). We examined primary human CD4+ T-cell populations from pooled healthy donor blood, including FOXP3+CD25hiCD127lo/− regulatory (Tregs), CD25−CD45RA+CD45RO− naive (Tnaive) and CD25−CD45RA−CD45RO+ memory (Tmem) T cells, and ex vivo phorbol myristate acetate (PMA)/ionomycin stimulated CD4+ T cells separated into IL-17-positive (CD25−IL17A+; TH17) and IL-17-negative (CD25−IL17A−; THStim) subsets. We also examined naive and memory CD8+ T cells, B cell centroblasts from paediatric tonsils (CD20+CD10+CXCR4+CD44−), and peripheral blood B cells (CD20+) and monocytes (CD14+). We mapped six histone modifications by chromatin immunoprecipitation followed by sequencing (ChIP-seq) for all ten populations, and performed RNA sequencing (RNA-seq) for each CD4+ T-cell population. We also incorporated data for B lymphoblastoid cells17, TH0, TH1 and TH2 stimulated T cells10, and non-immune cells from the NIH Epigenomics Project25 and ENCODE26, for a total of 56 cell types.

For each cell type, we computed a genome-wide map of cis-regulatory elements based on H3 lysine 27 acetylation (H3K27ac), a marker of active promoters and enhancers12. We then clustered cell types based on these cis-regulatory element patterns (Extended Data Fig. 7). Fine distinctions could be drawn between CD4+ T-cell subsets based on quantitative differences in H3K27ac at thousands of putative enhancers (Fig. 2a). These cell type-specific H3K27ac patterns correlate with the expression of proximal genes. In contrast, H3 lysine 4 mono-methylation (H3K4me1) was more uniform across subsets, consistent with its association to open or ‘poised’ sites shared between related cell types12.

Mapping of autoimmune disease PICS SNPs to these regulatory annotations revealed enrichment in B-cell and T-cell enhancers (Fig. 2a). A disproportionate correspondence to enhancers activated upon T-cell stimulation prompted us to examine such elements more closely. Substantial subsets of immune-specific enhancers markedly increase their H3K27ac signals upon ex vivo stimulation, often in conjunction with non-coding eRNA transcription, and induction of proximal genes (Fig. 2a, b). Compared to naive T cells, enhancers in stimulated T cells are strongly enriched for consensus motifs recognized by AP-1 transcription factors, master regulators of cellular responses to stimuli. PICS SNPs are strongly enriched within stimulus-dependent enhancers (P < 10^−20 for combined PMA/ionomycin; P < 10^−11 for combined CD3/CD28), whereas enhancers preferentially marked in unstimulated T cells show no enrichment for causal variants. Candidate causal SNPs were further enriched in T-cell enhancers that produce non-coding RNAs upon stimulation (1.6-fold; P < 0.01).

Delineation of tissue-restricted cis-regulatory elements across developmental lineages

Integrative analysis of haplotype-resolved epigenomes across human tissues.

Leung, D. et al. Nature 10.1038/nature14217

We performed ChIP-seq experiments to generate extensive datasets profiling 6 histone modifications across 16 human tissue-types from four individual donors (181 datasets). Combining with previously published datasets2,3, we conducted in-depth analyses across 28 cell/tissue-types, covering a wide spectrum of developmental states, including embryonic stem cells, early embryonic lineages and somatic primary tissue-types representing all three germ layers (Fig. 1a). The modifications demarcate active promoters (histone H3 lysine 4 trimethylation (H3K4me3) and H3 lysine 27 acetylation (H3K27ac)), active enhancers (H3 lysine 4 monomethylation (H3K4me1) and H3K27ac), transcribed gene bodies (H3 lysine 36 trimethylation (H3K36me3)) and silenced regions (H3K27 or H3K9 trimethylation (H3K27me3 and H3K9me3, respectively))4,5. We systematically identified cis-regulatory elements by employing a random-forest based algorithm (RFECS)2,6, predicting a total of 292,495 enhancers (consisting of 175,912 strong enhancers with high H3K27ac enrichment) across representative samples of all 28 tissue-types (Supplementary Table 1). We additionally identified 24,462 highly active promoters with strong H3K4me3 enrichment (see Supplementary Information and Supplementary Table 2). Subsequently, we defined tissue-restricted promoters (n=10,396) and enhancers (n=115,222) (Extended Data Fig. 1a). Consistent with previous studies7-9, enhancers appear more tissue-restricted than promoters and cluster along developmental lineages (Extended Data Fig. 1b). Moreover, tissue-restricted enhancers were enriched for putative binding motifs of particular transcription factors (TFs) known to be important in maintaining the cell/tissue-type’s identity and function10-15 (Extended Data Fig. 2).

Developmental origin influences epigenomes

Regulatory network decoded from epigenomes of surface ectoderm-derived cell types.

Lowdon, R. F. et al. Nature Communications 10.1038/nature14248

In the absence of a strong skin tissue-specific epigenetic signature, we hypothesized that developmental origin is a major determinant of skin cell-type epigenetic patterns. We explored this hypothesis by focusing on skin keratinocytes and breast epithelial cells, which are both derived from surface ectoderm (SE)15. Consistent with their shared developmental origin, neonatal skin keratinocytes clustered with adult breast epithelial cell types based on DNA methylation values at the DMRs previously identified in skin and non-skin cell pairwise comparisons (Fig. 3c). To specifically define the DNA methylation signature of SE-derived cell types, we identified DMRs for each of the surface ectodermal cell types in a pairwise manner compared with neonatal skin melanocytes and fibroblasts, which are derived from other embryonic germ layers. There were 1,392 DMRs with the same methylation state in keratinocyte, breast myoepithelial and breast luminal epithelial cells relative to the two other cell types, which we inferred to be SE-specific DMRs (SE-DMRs) (Methods and Fig. 4a). Therefore, common developmental origin influences SE-derived cell epigenomes to a greater extent than does the shared skin tissue environment.

We examined whether SE-DMRs, similar to cell-type-specific DMRs, possessed regulatory potential. The majority (97%) of SE-DMRs were hypomethylated, with 12% located in gene promoters and 40% within intergenic regions (Supplementary Fig. 8a). Hypomethylated SE-DMRs were enriched for promoter- and enhancer-associated histone modifications in both keratinocytes and breast myoepithelial cells, and for DNase I-hypersensitive sites in keratinocytes (Fig. 4b and Supplementary Fig. 8b). Hypomethylated SE-DMRs were also enriched for transcription factor binding motifs including TFAP2 and KLF4 (Fig. 4c); transcription factors that bind to these two motifs function in keratinocyte and mammary epithelium development, differentiation and/or maintenance of cell fate16, 17, 18, 19, 20. Genes associated with hypomethylated SE-DMRs were enriched for functions relevant to the biology of these cell types, such as ‘epidermis development’ (P-value=4.35e−15) and ‘mammary gland epithelium development’ (P-value=2.10e−9) (Fig. 4d and Supplementary Data 5). DNA hypomethylation status of genes with hypomethylated SE-DMRs in their promoter regions correlated with increased expression in SE-derived cells relative to non-SE cells (Fig. 4e and Supplementary Table 6). These annotations suggested that the majority of SE-DMRs were at distal enhancer or gene promoter elements and regulate genes important for keratinocyte and mammary gland development.

Developmental dynamics of surface ectoderm regulatory elements

Regulatory network decoded from epigenomes of surface ectoderm-derived cell types.

Lowdon, R. F. et al. Nature Communications 10.1038/ncomms7363

To explore the developmental dynamics of DNA methylation at SE-DMRs, we obtained whole-genome bisulfite sequencing (WGBS) data for samples representing early stages in SE development: H1 ESCs and ESCs differentiated to represent an early ectoderm developmental stage2. A majority of hypomethylated SE-DMRs were methylated in both early developmental stages, but hypomethylated in keratinocytes and mammary gland epithelia (Methods and Fig. 7a). The few exceptions are regulatory transcription factors. For example, the DMR near the TFAP2a promoter was demethylated in ESCs, whereas the DMR in KLF4 was methylated in ESCs but demethylated in early SE-differentiated cells. Both genes are most highly expressed in keratinocytes (Fig. 7b–e). The remaining hypomethylated SE-DMRs, many of which putatively regulate genes that are TFAP2a, TFAP2c or KLF4 targets in the network analysis, were lowly methylated only in differentiated cells. Accordingly, expression of these genes was increased in keratinocytes relative to H1 ESCs (Fig. 7f).

The role of intermediate methylation (IM) states in DNA is unclear. To comprehensively identify regions of IM and their quantitative relationship with gene activity, here we apply integrative and comparative epigenomics to 25 human primary cell and tissue samples. We report 18,452 IM regions located near 36% of genes and enriched at enhancers, exons, and DNase I Hypersensitivity sites. IM regions average 57% methylation, are predominantly allele-independent, and are conserved across individuals and between mouse and human, suggesting a conserved function. IM regions at enhancers have an intermediate level of active chromatin marks and their associated genes have intermediate transcriptional activity. Exonic IM correlates with a level of exon inclusion in between that of fully methylated and unmethylated exons, highlighting gene context-dependent functions. We conclude that intermediate DNA methylation is a conserved signature of gene regulation and exon usage.

Compartment switching

Chromatin architecture reorganization during stem cell differentiation.

Dixon, J. R. et al. Nature 10.1038/nature14222

Hi-C interaction maps provide information on multiple hierarchical levels of genome organization4. Previous studies demonstrated that the genome is organized into A and B compartments, containing relatively active and inactive regions, respectively5, 11. Currently, it is unclear if the A and B compartments change during differentiation and how this relates to lineage specification. We observe a large degree of spatial plasticity in the arrangement of the A/B compartments across cell types, with 36% of the genome switching compartments in at least one of the lineages analyzed (Supplemental methods; Figure 1a, Extended Data Figure 2a-c). Many of the A/B compartment transitions are lineage-restricted (Figure 1b). Notably, there appears to be a large expansion of the B compartment upon differentiation of hESCs to MSCs or in IMR90 fibroblasts. These two cell types have previously been shown to undergo an expansion of repressive heterochromatin modifications during differentiation13, 17. In this regard, there appears to be a similar redistribution of the spatial organization of their genomes as well. We observe that the regions that change their A/B compartment status typically correspond to a single or series of TADs (Figure 1a,c, Extended Data Figure 2d,e), suggesting that TADs are the units of dynamic alterations in chromosome compartments. Consistent with previous studies of individual loci 18, 19, 20, we found that genes that change from compartment A to B tend to show reduced expression, while genes that change from B to A tend to show higher expression (Figure 1d). In addition, lineage-restricted compartment A regions tend to include more lineage-restricted genes compared to other regions (Extended Data Figure 3a). While statistically significant, the overall patterns of change in expression are subtle. 
Reasoning that this modest correlation may be due to the possibility that only a subset of genes may be affected by compartment changes, while most genes remain unaffected, we identified a subset of 718 genes with co-variation between gene expression and compartment switching (Supplemental Methods, Extended Data Figure 3b,c, Figure 1e). These genes were enriched for low CpG content promoters (21.8% vs. 15.6% for non-concordant genes, p-value 8e-11, Fisher’s Exact Test), and several significant Gene Ontology (GO) terms, most notably related to extra-cellular proteins and extra-cellular matrix (Supplemental Table 3). Taken together, these results indicate that at a global level, there is a high degree of plasticity in the A and B compartments, yet relatively subtle corresponding changes in gene expression, indicating that the A and B compartments have a contributory but not deterministic role in determining cell type specific patterns of gene expression.
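
The "fraction of the genome switching compartments in at least one lineage" can be illustrated with a small sketch. It assumes the genome has been divided into equal-sized bins, each labelled A or B per cell type, and that a switch is scored relative to the hESC label; these are illustrative assumptions, as are the function name `fraction_switching` and the toy labels. The actual procedure is described in the paper's Supplemental methods.

```python
def fraction_switching(labels_by_cell_type, reference):
    """Fraction of bins whose A/B label differs from the reference
    cell type in at least one of the other cell types."""
    ref = labels_by_cell_type[reference]
    others = [labels for name, labels in labels_by_cell_type.items()
              if name != reference]
    switched = sum(
        1 for i in range(len(ref))
        if any(labels[i] != ref[i] for labels in others)
    )
    return switched / len(ref)

# Hypothetical toy data: ten equal-sized bins per cell type.
compartments = {
    "ESC":       "AAABBBAABB",
    "lineage_1": "AAABBBABBB",  # bin 7 switches A -> B
    "lineage_2": "AABBBBAABB",  # bin 2 switches A -> B
    "lineage_3": "AAABBBAABB",  # identical to ESC
}

frac = fraction_switching(compartments, "ESC")
print(f"{frac:.0%} of bins switch in at least one lineage")
```

Because the bins are equal-sized in this sketch, the fraction of bins equals the fraction of genome length; with variable-sized regions the count would need to be weighted by length.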

Age-related differentially methylated sites in monocytes and T cells

Age-related variations in the methylome associated with gene expression in human monocytes and T cells.

Reynolds, L. M. et al. Nature Communications 10.1038/ncomms6366

We first characterized DNA methylation at ~450,000 CpG sites across the genome in CD14+ purified cells (predominately monocytes) and CD4+ purified cells (T cells) collected from 227 MESA individuals. Using association analysis with a false discovery rate (FDR) threshold of 0.001, and adjusting for biological and technical covariates (Methods), we identified 2,285 monocyte specific age-dMS, 2,023 T cell specific age-dMS, and 572 overlapping age-dMS across the two cell types. We then expanded our monocyte sample size to 1,264 MESA individuals, and identified 37,911 CpG sites with age-associated methylation (~8% of all CpG sites, FDR<0.001; Fig. 1). The majority of age-dMS we detected in 227 T cell samples shared a similar effect direction between methylation and age in the 1,264 monocyte samples (Supplementary Fig. 1a and Supplementary Data 1). Many of the most significant age-dMS detected in both monocytes and T cells were previously reported to have age-associated methylation measured in whole blood 17, including CpG sites in ELOVL2 (ELOVL fatty acid elongase 2; cg16867657, partial correlation (prho) = 0.66, FDR = 3.65×10^-140), FHL2 (four and a half LIM domains 2; cg06639320, prho = 0.55, FDR = 4.45×10^-88), and PENK (proenkephalin; cg16419235, prho = 0.52, FDR = 2.85×10^-75).

[…]

Potentially functional age-dMS were defined as CpG sites whose % methylation was associated with age (FDR<0.001) and with mRNA expression of any gene within one megabase of the CpG site in question (FDR<0.001). Among 227 T cell samples, 44 age- and expression-associated methylation sites (age-eMS) were detected (2% of the 2,595 T cell age-dMS), with methylation correlated with age (prho ranging -0.54 – 0.70) and with cis-gene expression (prho ranging -0.62 – 0.56). Half of these T cell age-eMS (22 CpG sites) had methylation profiles associated with age in 1,264 monocyte samples; however, there was no replication of the association between methylation and gene expression for these 22 CpG sites in monocyte samples (Supplementary Fig. 1b).
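
The counts and percentages quoted in this excerpt can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the total of 450,000 CpG sites is the approximate figure given there, so the monocyte percentage is approximate as well. Variable names are illustrative.

```python
# Counts as stated in the excerpt.
t_cell_specific = 2023      # T cell specific age-dMS
shared = 572                # age-dMS overlapping monocytes and T cells
t_cell_age_dms = t_cell_specific + shared   # all T cell age-dMS

age_ems = 44                # T cell age- and expression-associated sites
pct_age_ems = 100 * age_ems / t_cell_age_dms

monocyte_age_dms = 37911    # age-dMS in 1,264 monocyte samples
cpg_sites = 450_000         # approximate number of CpG sites assayed
pct_monocyte = 100 * monocyte_age_dms / cpg_sites

print(t_cell_age_dms)            # total T cell age-dMS
print(round(pct_age_ems, 1))     # percent of T cell age-dMS that are age-eMS
print(round(pct_monocyte, 1))    # percent of CpG sites with age-associated methylation
```

The sum 2,023 + 572 reproduces the 2,595 T cell age-dMS, and the two percentages round to the 2% and ~8% reported.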

DNA sequence motifs associated with cell-type-specific chromatin marks

Predicting the human epigenome from DNA motifs.

Whitaker, J. W., Chen, Z. & Wang, W. Nature Methods 10.1038/nmeth.3065

To identify cell type– or mark–specific motifs, we separately clustered the motifs by cell type and modification specificity (Fig. 3a). The clusters contain motifs whose gene expression patterns matched their interplay with H3K27ac (Fig. 3b) and that had known associations with particular epigenomic modifications and cell types. For example, the SOX2 monomer motif was found to be associated with H3K27ac in H1 and NPC, whereas the OCT4-SOX2 heterodimer motif was found in only H1. This observation is consistent with the functional roles of OCT4 in H1 and SOX2 in both H1 and NPC25. The motif that is recognized by the four TEAD family members was associated with H3K27ac in all cell types, which is consistent with loss of H3 acetylation following deletion of a TEAD binding site26 (Supplementary Note).

Epigenomic footprints reveal Polycomb regulation of lncRNAs during cellular differentiation

Epigenomic footprints across 111 reference epigenomes reveal tissue-specific epigenetic regulation of lincRNAs.

Amin, V. et al. Nature Communications 10.1038/ncomms7370

We next analyzed epigenomic programming of lincRNAs upon differentiation. We first examined dynamic epigenomic footprints within the mesodermal germ lineage, using CD8+ T-cells as a representative of the lineage. The focus was on a list of variable lincRNA TSSs that showed changes in at least one histone mark along the T-cell subtree. The following three stages of cellular differentiation were analyzed: (1) Embryonic stem cell H1; (2) CD34+ Hematopoietic stem cell; and (3) Fully differentiated CD8+ T-cells. By combining chromatin marks at the three stages into a single Spark analysis we aimed to identify groups of lincRNA TSSs that show similar trajectories of epigenetic programming, each trajectory consisting of a distinct pattern of coordinated changes in histone marks as cells transition between the three stages.

As illustrated in Figure 4a, the largest Spark cluster (C1) consisted of a combination of Quiescent and Heterochromatin states and the second largest (C2) showed signs of Polycomb silencing. The smallest cluster (C6; Fig. 4a) showed Bivalent state. In contrast to those generally inactive lincRNA TSSs that were mostly located more than 50Kbp away from TSSs of protein-coding genes, the lincRNA TSSs in the Active Promoter state were generally located within 5Kbp of TSSs of protein coding genes (Fig. 4b).

Two clusters (C4 and C5) showed enhancer-like activation patterns upon differentiation. Cluster C5 showed an early activation pattern during a transition from embryonic stem cell state to CD34+ HSCs, while cluster C4 showed activation upon transition from CD34+ HSCs to CD8+ T-cells. The results of pathway enrichment analysis using GREAT tool (Supplementary Fig. 9a) are consistent with the different timing of activation: the early activating cluster (C5) is enriched for more generic terms such as “immunity” and “leukocyte” while the late activating cluster (C4) is enriched for more specific terms “lymphocyte” and “T-cell”. In contrast to both of these activating clusters, the inhibiting clusters C2 and C6 are enriched for pathways leading to other mesodermal lineages, consistent with the well-known Polycomb-mediated inhibition of such pathways during lineage specification31.

During differentiation along the hematopoietic lineage, a majority of Polycomb regulated lincRNA TSSs follow a Polycomb silencing trajectory (cluster C2 in Fig. 4a) while a minority retain bivalent state (cluster C6). Those that are silenced tend to be located further away from protein coding genes (Fig. 4b).

We next examined the pattern of epigenomic state transitions at lincRNA TSSs along the Polycomb silencing trajectory (cluster C2 in Fig. 4a and for others see Supplementary Fig. 10). Specifically, at each of the three stages of differentiation (ESCs, HSCs, T-cells) we counted lincRNA TSSs that belong to each of the fifteen ChromHMM states and determined counts for each of the 15×15 state transitions for each of the two differentiation steps. Transitions from bivalent (TssBiv, BivFlnk, EnhBiv) to repressed Polycomb (ReprPC) state are prominent at the transition from ESCs to HSCs. Strikingly, transitions into Repressed Polycomb state practically disappear upon differentiation into T-cells while the transitions into weakly Polycomb repressed (ReprPCWk) and particularly into Quiescent (Quies) states gain prominence.
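
The transition counting described above can be sketched as follows. This is an illustration, not the authors' pipeline: the state mnemonics are those of the Roadmap 15-state ChromHMM model as commonly listed, the function names `transition_counts` and `as_matrix` are hypothetical, and the lincRNA identifiers and state assignments are made-up toy data.

```python
from collections import Counter

# Mnemonics of the 15-state ChromHMM model (assumed ordering).
STATES = ["TssA", "TssAFlnk", "TxFlnk", "Tx", "TxWk", "EnhG", "Enh",
          "ZNF/Rpts", "Het", "TssBiv", "BivFlnk", "EnhBiv",
          "ReprPC", "ReprPCWk", "Quies"]

def transition_counts(stage_from, stage_to):
    """Count (state_before, state_after) pairs over TSSs present in both stages."""
    return Counter((stage_from[tss], stage_to[tss])
                   for tss in stage_from if tss in stage_to)

def as_matrix(counts, states=STATES):
    """Arrange the pair counts as a len(states) x len(states) matrix (rows = before)."""
    return [[counts.get((a, b), 0) for b in states] for a in states]

# Hypothetical toy assignments of four lincRNA TSSs at the three stages.
esc   = {"linc1": "TssBiv", "linc2": "BivFlnk", "linc3": "EnhBiv",   "linc4": "Quies"}
hsc   = {"linc1": "ReprPC", "linc2": "ReprPC",  "linc3": "ReprPCWk", "linc4": "Quies"}
tcell = {"linc1": "ReprPCWk", "linc2": "Quies", "linc3": "Quies",    "linc4": "Quies"}

esc_to_hsc = transition_counts(esc, hsc)
hsc_to_tcell = transition_counts(hsc, tcell)
matrix = as_matrix(esc_to_hsc)

for (before, after), n in sorted(esc_to_hsc.items()):
    print(f"ESC {before} -> HSC {after}: {n}")
```

One matrix is produced per differentiation step, so the three stages yield two 15×15 matrices that can be compared row by row.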

As the bivalent states disappear during differentiation by being resolved into active or inactive states, the combinatorial diversity of histone marks at lincRNA TSSs diminishes and the lincRNA TSSs concentrate within a smaller number of states, particularly within the Quiescent state. This loss of combinatorial diversity may be quantitated using the entropy function over the fifteen states. As indicated in Figure 4d, entropy for lincRNA TSSs in cluster C2 decreases as cells differentiate. Notably, contribution of Polycomb-regulated states is dominant but not exclusive because the diversity of non-Polycomb marks also decreases.
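
The entropy measure mentioned above can be illustrated with Shannon entropy computed over the distribution of TSSs across chromatin states. The source does not state the logarithm base or any normalization, so base 2 is an assumption here; the function name `state_entropy` and the toy state assignments are likewise illustrative.

```python
from collections import Counter
from math import log2

def state_entropy(state_labels):
    """Shannon entropy (in bits, an assumed unit) of the distribution
    of TSSs over chromatin states."""
    counts = Counter(state_labels)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Hypothetical toy assignments of the same four TSSs at two stages.
esc_states   = ["TssBiv", "BivFlnk", "EnhBiv", "Quies"]   # spread over four states
tcell_states = ["Quies", "Quies", "Quies", "ReprPCWk"]    # concentrated in Quies

h_esc = state_entropy(esc_states)
h_tcell = state_entropy(tcell_states)
print(f"ESC entropy:    {h_esc:.3f} bits")
print(f"T-cell entropy: {h_tcell:.3f} bits")
```

Entropy is highest when TSSs are spread evenly across states and falls to zero when all TSSs share one state, which is why concentration in the Quiescent state lowers it.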

We next examined this trend more broadly and found it across all three germ layers (Supplementary Fig. 11a-c). Specifically, transitions from bivalent (TssBiv, BivFlnk, EnhBiv) to repressed Polycomb (ReprPC) state are prominent as the cells differentiate from ESCs toward specific lineages. As in the hematopoietic lineage, transitions into Repressed Polycomb state practically disappear during terminal differentiation while the transitions into weakly Polycomb repressed (ReprPCWk) and Quiescent (Quies) states increase.

Epigenomic changes during ES-cell-derived neural progenitor cell differentiation

Dissecting neural differentiation regulatory networks through epigenetic footprinting.

Ziller, M. J. et al. Nature 10.1038/nature13990

Models derived from human pluripotent stem cells that accurately recapitulate neural development in vitro and allow for the generation of specific neuronal subtypes are of major interest to the stem cell and biomedical community. Notch signalling, particularly through the Notch effector HES5, is a major pathway critical for the onset and maintenance of neural progenitor cells in the embryonic and adult nervous system1-3. This can be exploited to isolate distinct populations of human embryonic stem-cell-derived neural progenitor cells4. Here we report the transcriptional and epigenomic analysis of six consecutive stages derived from a HES5::eGFP reporter human embryonic stem cell line5 differentiated along the neural trajectory. In order to dissect the regulatory mechanisms that orchestrate the stage-specific differentiation process, we developed a computational framework to infer key regulators of each cell-state transition based on the progressive remodelling of the epigenetic landscape and then validated these through a pooled short hairpin RNA screen. Taken together, we demonstrate the utility of our system and outline a general framework, not limited to the context of the neural lineage, to dissect regulatory circuits of differentiation.

Context-dependent rewiring of transcription-factor binding

Transcription factor binding dynamics during human ES cell differentiation.

Tsankov, A. M. et al. Nature 10.1038/nature14233

Pluripotent stem cells provide a powerful system to dissect the underlying molecular dynamics that regulate cell fate changes during mammalian development. Here we report the integrative analysis of genome wide binding data for 38 transcription factors with extensive epigenome and transcriptional data across the differentiation of human embryonic stem cells to the three germ layers. We describe core regulatory dynamics and show the lineage specific behavior of selected factors. In addition to the orchestrated remodeling of the chromatin landscape, we find that the binding of several transcription factors is strongly associated with specific loss of DNA methylation in one germ layer and in many cases a reciprocal gain in the other layers. Taken together, our work shows context-dependent rewiring of transcription factor binding, downstream signaling effectors, and the epigenome during human embryonic stem cell differentiation.

Figure 1: Chromatin states and DNA methylation dynamics.

a. Chromatin state definitions, abbreviations, and histone mark probabilities. b. Average genome coverage. Genomic annotation enrichments in H1-ESC. c. Active and inactive gene enrichments in H1-ESC (see Extended Data 2b for GM12878). d. DNA methylation. e. DNA accessibility. d-e. Whiskers show 1.5 * interquartile range. Circles are individual outliers. f. Average overlap fold enrichment for GERP evolutionarily conserved non-coding regions. Bars denote standard deviation. g. DNA methylation (WGBS) density (color, ln scale) across cell types. red=max ln(density+1). Left column indicates tissue groupings, full list shown in Extended Data 4f. h. DNA methylation levels (left) and TF enrichment (right) during ESC differentiation. i. Chromatin mark changes during cardiac muscle differentiation. Heatmap=average normalized mark signal in Enh. C5 cluster enrichment54.

Figure 2: Epigenome relationships.

a. Hierarchical epigenome clustering using H3K4me1 signal in Enh states. Numbers indicate bootstrap support scores over 1,000 samplings. b-c. Multidimensional scaling (MDS) plot of cell type relationships based on similarity in H3K4me1 signal in Enh states (b) and H3K27me3 signal in ReprPC states (c). First four dimensions shown as dim1 vs. dim2 and dim3 vs. dim4.

Figure 3

Hierarchical clustering of reference epigenomes using additional marks. a-d. Clustering of 111 Roadmap Epigenomes using H3K4me3 (a), H3K27me3 (b), H3K36me3 (c) and H3K9me3 (d) signal in TssA, ReprPC, Tx and Het chromatin states, respectively. All panels show hierarchical clustering with optimal leaf ordering. Colors indicate sample groups (Fig. 2b). Numbers on internal nodes represent bootstrap support scores over 1,000 bootstrap samples.

Figure 4: Hierarchical clustering of epigenomes using diverse marks.

a-e. Clustering of all 127 reference epigenomes, including ENCODE samples, using H3K4me1, H3K4me3, H3K27me3, H3K36me3 and H3K9me3 signal in Enh, TssA, ReprPC, Tx and Het chromatin states, respectively. All panels show hierarchical clustering with optimal leaf ordering. Colors indicate sample groups, as defined in Fig. 2. Numbers on internal nodes represent bootstrap support scores over 1,000 bootstrap samples.

Figure 5: Epigenetic fine-mapping of enhancers.

a, Heatmaps show H3K27ac and H3K4me1 signals for 1000 candidate enhancers (rows) in 12 immune cell types (columns). Enhancers are clustered by the cell type-specificity of their H3K27ac signals. Adjacent heatmap shows average RNA-seq expression for the genes nearest to the enhancers in each cluster. Gray-scale (right) depicts the enrichment of PICS autoimmunity SNPs in each enhancer cluster (hypergeometric p-values calculated based on the number of PICS SNPs overlapping enhancers from each cluster, relative to random SNPs from the same loci). The AP-1 motif is over-represented in enhancers preferentially marked in stimulated T-cells, compared to naïve T-cells. b, Candidate causal SNPs displayed along with H3K27ac and RNA-seq signals at the PTGER4 locus. A subset of enhancers with disease variants (shaded) shows evidence of stimulus-dependent eRNA transcription. c, Stacked bar graph indicates percentage overlap with immune enhancers and coding sequence for PICS SNPs at different probability thresholds, compared to control SNPs drawn from the entire genome (All SNPs) or the same loci (Locus CTRL). d, Venn diagram compares PICS SNPs to GWAS catalog SNPs with indicated r² thresholds. e, Bar graph indicates percentage overlap with annotated T-cell enhancers for PICS SNPs, GWAS SNPs at indicated thresholds, locus control SNPs, and three subsets of SNPs defined and shaded as in panel d.
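
The hypergeometric enrichment p-value mentioned for panel a can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `hypergeom_sf` and the counts in the example are hypothetical, and the legend does not specify exactly how the background SNP set was constructed.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """Upper-tail probability P(X >= k) for a hypergeometric draw.

    N: background SNPs at the loci (hypothetical population)
    K: background SNPs that overlap enhancers of one cluster
    n: number of candidate (e.g. PICS) SNPs drawn
    k: candidate SNPs observed to overlap the cluster's enhancers
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the exact probabilities of observing k, k+1, ..., upper overlaps.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Hypothetical counts, for illustration only.
p_value = hypergeom_sf(k=5, N=50, K=10, n=8)
```

A small p-value here would mean that the candidate SNPs overlap the cluster's enhancers more often than expected for random SNPs from the same loci.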

Figure 6: Tissue-restricted enhancers are enriched for TF motifs important for cell identity and/or function.

Significantly enriched motifs (p-value<10e-10) across all 28 tissues are divided into 29 clusters (method described in Supplementary Information). An overall p-value is generated for the enrichment of each tissue for each cluster. The figure illustrates -log(p-value) of a) pancreas b) anterior caudate[...]

Figure 7: Identification and characterization of SE-DMRs.

(a) Venn diagram showing SE-specific DMRs, defined as the overlap of keratinocyte, breast myoepithelial and luminal epithelial cell DMRs. (b) Enrichment of H3K4me1, H3K4me3, H3K27ac and DNase I-hypersensitivity at SE-DMRs in one keratinocyte sample replicate. Each heat map column represents histone modification ChIP-seq or DNase-seq signal at 500 bp SE-DMRs ±5 kb. Each heat map row represents a single hypomethylated SE-DMR, ordered by decreasing H3K4me1 signal, then increasing H3K4me3 signal. (c) Bar plot of enrichment values for top ten enriched TFBS motifs determined by motif scanning of hypomethylated SE-DMRs using FIMO47 (Methods). Enrichment based on hg19 genome background. (d) Selected GO terms enriched for hypomethylated SE-DMRs. P-value of enrichment calculated by GREAT14. Full list of enriched GO terms is in Supplementary Data 5. (e) Box plots showing RNA expression levels for genes with hypomethylated SE-DMRs in promoter regions. Skin cell-type RNA-seq RPKM values over exons are averages (mean) of three biological replicates; luminal epithelial and myoepithelial values are a single biological replicate. The middle line indicates the median value; top and bottom box edges are the third- and first-quartile boundaries, respectively. The upper whisker is the highest data value within 1.5 times the interquartile range above the third quartile; the lower whisker is the lowest value within 1.5 times the interquartile range below the first quartile. The interquartile range is the distance between the first and third quartiles. Points indicate data beyond whiskers. Logarithmic scale transformation was applied before box plot statistics were computed. RPKM distributions for SE cell-type expression levels versus non-SE cell-type expression levels differ significantly (paired Wilcoxon signed-rank test, *P-value<0.02; n=150 genes) (F, fibroblasts; K, keratinocytes; Lum, breast luminal epithelial cells; M, melanocytes; Myo, breast myoepithelial cells; Supplementary Table 6).
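
The box plot statistics described for panel e can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the legend does not give the logarithm base or the quartile interpolation method, so base 10 and Python's "inclusive" quartile method are assumed here, and the function name `boxplot_stats` is hypothetical. Input values must be strictly positive.

```python
import math
import statistics

def boxplot_stats(values):
    """Compute box plot statistics after a log10 transform (assumed base)."""
    logged = sorted(math.log10(v) for v in values)
    # Quartiles; the interpolation method is an assumption of this sketch.
    q1, median, q3 = statistics.quantiles(logged, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points that lie within the fences.
    inside = [v for v in logged if low_fence <= v <= high_fence]
    outliers = [v for v in logged if v < low_fence or v > high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": outliers,
    }

# Hypothetical RPKM values, for illustration only.
stats = boxplot_stats([1, 10, 100, 1000, 10000, 1e12])
```

Applying the transform first matters: a value that is an outlier on the linear scale may fall inside the whiskers on the log scale, and vice versa.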

Figure 8: DNA methylation dynamics of SE-DMRs across samples from different developmental stages.

(a) Heatmap and clustering dendrogram based on average CpG DNA methylation values of hypomethylated SE-DMRs for different developmental samples. Each row represents one of 1,307 DMRs for which there are CpGs with ≥10 × coverage in WGBS data. Methylation values for H1 ESCs, ectoderm differentiated ESCs (‘EC’) and keratinocyte (‘K’) are from WGBS; breast luminal (‘Lu’) and myoepithelial (‘My’) values are the average of single CpG methylCRF predictions in each DMR. MethylCRF predictions are based on MeDIP-seq and MRE-seq data for these samples (Methods). A value of ‘1’ is fully methylated; ‘0’ is completely unmethylated. (b) KLF4 gene body SE-DMR average CpG DNA methylation levels across developmental stages. (c) KLF4 RNA expression across developmental stages. Values are RPKM over coding exons; error bars for keratinocytes are s.e.m., n=3. Sample abbreviations as in a. (d) TFAP2A promoter SE-DMR average CpG DNA methylation levels across developmental stages. (e) TFAP2A RNA expression across developmental stages. Values are RPKM over coding exons; error bars for keratinocytes are s.e.m., n=3. Sample abbreviations as in a. (f) RNA expression levels in keratinocytes relative to H1 ESCs for selected genes with hypomethylated SE-DMRs in their promoters. These SE-DMRs, as in the majority of hypomethylated SE-DMRs, were methylated in H1 and ectoderm-differentiated ESCs but lowly methylated in differentiated SE cell types. Increased expression relative to an earlier developmental sample suggests these DMRs are transcriptional regulatory regions for their associated genes.
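
The per-DMR average CpG methylation with a ≥10× coverage filter, as used in panel a, can be sketched as below. This is a minimal illustration: whether the authors averaged per-CpG fractions or pooled reads across CpGs is not stated in the legend, so averaging per-CpG fractions is an assumption, and the function name and input layout are hypothetical.

```python
def dmr_mean_methylation(cpgs, min_coverage=10):
    """Average methylation fraction of one DMR.

    cpgs: list of (methylated_reads, total_reads) tuples, one per CpG.
    Only CpGs with total_reads >= min_coverage contribute.
    Returns a value in [0, 1] (1 = fully methylated, 0 = unmethylated),
    or None if no CpG in the DMR passes the coverage filter.
    """
    fractions = [m / t for m, t in cpgs if t >= min_coverage]
    if not fractions:
        return None
    # Assumption: unweighted mean of per-CpG methylation fractions.
    return sum(fractions) / len(fractions)

# Hypothetical read counts: the third CpG is dropped (coverage 5 < 10).
level = dmr_mean_methylation([(10, 10), (0, 10), (5, 5)])
```

DMRs for which the function returns None would be excluded from the heatmap, which is consistent with the legend restricting the rows to DMRs that have sufficiently covered CpGs.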

Figure 9: Dynamic reorganization of chromatin structure during differentiation of hESCs.

a, First principal component (PC1) values and Hi-C interaction heat maps in H1 ES cells and H1-derived lineages. PC1 values are used to determine the A/B compartment status of a given region, where positive PC1 values represent “A” compartment regions (blue), and negative values represent “B” compartment regions (yellow). Dashed lines indicate TAD boundaries in ESCs. b, K-means clustering (k=20) of PC1 values for 40 kb regions of the genome that change A/B compartment status in at least one lineage. c, K-means clustering of PC1 values surrounding TAD boundaries. d, Distribution of fold-change in gene expression for genes that change compartment status (“A to B” or “B to A”) or that remain the same (“stable”) upon differentiation. e, Genome browser view of two genes where one (OTX2) shows concordant expression and PC1 values while a second (TMEM260) does not.
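
The compartment rule in panels a and b (positive PC1 is "A", negative PC1 is "B", and a region counts as dynamic if its status changes in at least one lineage) can be sketched as follows. This is an illustration only: the function names, the sample labels, and the choice to compare each lineage against the ES cell sample are assumptions, as is leaving bins with PC1 exactly zero unassigned.

```python
def compartment(pc1):
    """Assign a compartment from the sign of the PC1 value."""
    if pc1 > 0:
        return "A"
    if pc1 < 0:
        return "B"
    return None  # exactly zero: left unassigned (assumption of this sketch)

def switching_bins(pc1_by_sample, reference):
    """Indices of bins whose compartment differs from the reference sample
    in at least one other sample.

    pc1_by_sample: dict mapping sample name -> list of PC1 values per bin.
    """
    ref_calls = [compartment(v) for v in pc1_by_sample[reference]]
    changed = []
    for i, ref_call in enumerate(ref_calls):
        if ref_call is None:
            continue
        for name, values in pc1_by_sample.items():
            call = compartment(values[i])
            if name != reference and call is not None and call != ref_call:
                changed.append(i)
                break
    return changed

# Hypothetical PC1 values for three 40 kb bins.
pc1 = {
    "H1": [0.5, -0.2, 0.1],
    "lineage_1": [0.4, 0.3, 0.2],
    "lineage_2": [0.6, -0.1, -0.3],
}
dynamic = switching_bins(pc1, reference="H1")
```

In this toy input, the second bin switches from B to A in one lineage and the third from A to B in another, so both would enter the clustering of panel b.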

Figure 10

Replication of T cell Age-dMS and Age-eMS in 1,264 monocyte samples. Analysis of age and methylation in 227 T cell (CD4+) samples included 436,393 CpG methylation sites, of which 2,595 had methylation associated with age (age-dMS; FDR<0.001): 546 negatively associated with age (hypo age-dMS), 2,049 positively associated with age (hyper age-dMS); see Supplementary Data 1. a) Comparison of the partial correlation (prho) between methylation and age in 227 T cell samples (x-axis) compared to the correlation detected in 1,264 monocyte samples (CD14+, y-axis). The most significant T cell age-dMS (red circle, cg16867657, prho = 0.70, FDR = 6.72×10⁻²⁸) was detected on chromosome 6 in the ELOVL2 (ELOVL fatty acid elongase 2) promoter.

Figure 11: The specificities of interplay between DNA motifs and the epigenome

(a) Left, 589 motif groups hierarchically clustered by their interplay with epigenomic modification. Each row represents a different motif, and the positions are colored if the motif associates with the modification. The first six columns show positive interplay (when a motif is enriched within a modification's peaks), and the last six columns show negative interplay (when a motif is depleted in the modification's peaks). The rightmost bar (and the leftmost bar on the right subpanel) indicates groups of motifs that are specific to certain modifications or combinations thereof. These bars follow the same color scheme as the heat map. Additionally, purple represents H3K4me1 and H3K27ac, which corresponds to active enhancers. Right, groups clustered by cell-type specificity. Here additional colors represent the following combinations of cell types: black, positive interplay with both H1 and NPC; green, positive interplay with both MSC and TBL; magenta, positive interplay with all cell types. Center, example motifs: the known motif (top) and the identified de novo motif (bottom). (b) Left, positive interplay between H3K27ac and TFs. Right, normalized expression values of the genes. Gene expression values were taken from ref. 17 and normalized for each gene separately. The low expression levels of FOS in TBL can be explained by the ability of JUN to bind the AP-1 binding site as a homodimer48. (c) Modification-specific motifs. The motif group numbers and consensus sequences are given. Motifs in bold match known motifs (see text).

Figure 12

(a) Spark clustering reveals coordinated changes in histone marks between human embryonic stem cells (H1), hematopoietic stem cells (CD34+), and T-lymphocyte cells (CD8+). Black barplot indicates the number of lincRNA TSS that show a specific pattern of epigenetic programming across the three developmental time points. (b) Absolute distance of lincRNA TSS to the nearest protein-coding TSS, determined using the GREAT basal + extension rule (1 kb downstream + 5 kb upstream + up to 500 kb distal). The absolute distances are binned into <5 kb, 5-50 kb, and >50-500 kb windows. (c) ChromHMM-defined state transitions from Embryonic Stem Cells (ESCs) to Hematopoietic Stem Cells (HSCs) and from HSCs to T-cells were mapped for lincRNA TSS in C2. Size of the node reflects the number of states and edge width reflects the number of transitions. Transitions greater than 30% relative to each state are shown in the arc diagram. (d) Barplot showing Shannon entropy calculated for all states, Polycomb states, or non-Polycomb states for the three developmental time points (ESCs, HSCs, and T-cells). We have developed an on-line tutorial (link: http://genboree.org/theCommons/projects/aminv-natcomm-2015/wiki ) on how to use on-line tools integrated within the Genboree Workbench to carry out the types of analyses reported in this Figure.
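
The Shannon entropy of panel d can be illustrated with a minimal sketch over a list of chromatin state labels. The legend does not state the logarithm base or exactly which distribution the entropy was taken over, so base 2 and the distribution of state labels across lincRNA TSS are assumptions; the function name and the state labels are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(states):
    """Shannon entropy (in bits, assumed base 2) of a list of state labels."""
    counts = Counter(states)
    total = sum(counts.values())
    # H = -sum(p * log2(p)) over the observed state frequencies p.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical state assignments at one time point.
mixed = shannon_entropy(["Enh", "Enh", "ReprPC", "ReprPC"])
uniform = shannon_entropy(["Enh", "Enh", "Enh", "Enh"])
```

Higher entropy indicates that the TSS are spread over more states; a time point where nearly all TSS share one state gives a value close to zero.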

Figure 13: Consecutive stages of ES cell derived neural progenitors are characterized by distinct epigenetic states

a. Left: Schematic of the cell system. Middle: Normalized read-count level for H3K27ac over a 1.4 megabase (Mb) region around the SOX2 locus (chr3:180,854,252-182,259,543). ChIP-Seq read counts were normalized to 1 million reads and scaled to the same level (1.5) for all tracks shown. Right: Additional tracks for H3K4me3, H3K4me1 and H3K27me3 as well as DNAme (scale 0-100%), OTX2 and expression covering a 100 kilobase (kb) sub-region (chr3:181,389,523-181,490,148) of this locus. Histone and RNA-Seq data were normalized to 1 million reads and are shown on distinct scales.

b. Maximum gene set activity levels shown as z-scores for genes expressed in defined brain structures (left) and developmental time points (right) based on the mouse Allen Brain Atlas. Gene set activity was defined as average expression level of all member genes followed by z-score computation across all nine cell types.
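
The gene set activity defined in panel b (average expression of all member genes, followed by a z-score across cell types) can be sketched as follows. This is an illustration, not the authors' implementation: the function name and input layout are hypothetical, and the use of the population standard deviation is an assumption, since the legend does not say which estimator was used.

```python
import statistics

def gene_set_activity_z(expr, gene_set):
    """Z-scored gene set activity across cell types.

    expr: dict mapping gene name -> list of expression values,
          one value per cell type, in a fixed order.
    gene_set: iterable of member gene names; genes missing from expr
              are skipped.
    """
    members = [expr[g] for g in gene_set if g in expr]
    n_types = len(members[0])
    # Activity per cell type: mean expression over all member genes.
    activity = [sum(g[i] for g in members) / len(members) for i in range(n_types)]
    mu = statistics.mean(activity)
    sd = statistics.pstdev(activity)  # population SD (assumption)
    if sd == 0:
        return [0.0] * n_types
    return [(a - mu) / sd for a in activity]

# Hypothetical expression values for two genes in three cell types.
z = gene_set_activity_z({"g1": [1, 2, 3], "g2": [3, 4, 5]}, ["g1", "g2"])
```

The legend reports nine cell types; the toy example uses three only to keep the arithmetic easy to follow.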

Abbreviations: Rostral secondary prosencephalon (RSP), Telencephalon (Tel), peduncular (caudal) hypothalamus (PHy), Hypothalamus (p3), pre-thalamus (p2), pre-tectum (p1), midbrain (M), prepontine hindbrain (PPH), pontine hindbrain (PH), pontomedullary hindbrain (PMH), medullary hindbrain (MH); and embryonic (E)11.5, E13.5, E15.5, E18.5 as well as postnatal P4, P14 and P28.

c. Distribution of DNAme levels for differentially methylated regions (delta meth≥0.2, p≤0.01) across state transitions. For instance, distributions are shown for regions gaining methylation in the transition from ES cell to NE (top left) at all stages of differentiation. Distinct methylation level trace plots are shown for regions gaining methylation (left) during the specific transitions (indicated on the side) and for regions losing methylation (right). Black labeled samples are based on WGBS data and grey color samples (LRG and LNP) were profiled by RRBS.

d. Barplot of the frequency and associated mark of epigenetic changes for all cell state transitions broken up into gain and loss for consecutive differentiation stages.