The Polycomb repressive complexes PRC1 and PRC2 maintain embryonic stem cell (ESC) pluripotency by silencing lineage-specifying developmental regulator genes1. Emerging evidence suggests that Polycomb complexes act through controlling spatial genome organization2,3,4,5,6,7,8,9. We show that PRC1 functions as a master regulator of mouse ESC genome architecture by organizing genes in three-dimensional interaction networks. The strongest spatial network is composed of the four Hox gene clusters and early developmental transcription factor genes, the majority of which contact poised enhancers. Removal of Polycomb repression leads to disruption of promoter-promoter contacts in the Hox gene network. In contrast, promoter-enhancer contacts are maintained in the absence of Polycomb repression, with accompanying widespread acquisition of active chromatin signatures at network enhancers and pronounced transcriptional upregulation of network genes. Thus, PRC1 physically constrains developmental transcription factor genes and their enhancers in a silenced but poised spatial network. We propose that the selective release of genes from this spatial network underlies cell fate specification during early embryonic development.
To understand the potential role of spatial genome organization in maintaining ESC self-renewal potential, we analyzed mouse ESC data from promoter capture Hi-C (CHi-C; a high-throughput chromosome conformation capture assay) using GOTHiC (Genome Organization Through Hi-C) (B.M., I. Martincorena, E. Darbo, R.S. and S.S. et al., unpublished data). Spatial genome architecture can be interrogated at high resolution by combining sequence capture with 3C10 (chromosome conformation capture) or Hi-C11. Promoter CHi-C specifically enriches Hi-C12 libraries for promoters and their spatially contacting DNA elements13,14, providing genome-wide, restriction fragment–resolution chromosomal contact data for 22,225 annotated mouse promoters14. Our unbiased approach detected strong enrichment for long-range contacts between promoters bound by Polycomb group proteins, especially the PRC1 component RING1B (Supplementary Fig. 1a). In particular, among the spatially interacting Polycomb-bound genes, analysis of promoter-promoter contacts identified an unusually strong intra- and interchromosomal spatial network comprising gene promoters from the four Hox gene clusters (Hoxa, Hoxb, Hoxc and Hoxd). We detected extensive contacts for gene promoters across all four Hox clusters, with particularly strong connections between Hoxb and Hoxc and between Hoxb and Hoxa (Supplementary Fig. 1b). In addition, after filtering out promoters <10 Mb away from the Hox clusters to exclude contacts based merely on genomic proximity, we found 66 other gene promoters with strong direct intra- and interchromosomal contacts to the Hox clusters, with these genes occupying a central position within the global promoter network (Fig. 1a,b, Supplementary Fig. 1c and Supplementary Table 1).
Gene Ontology and protein domain analyses showed that this Hox network was enriched for the promoters of developmental transcription factor genes encoding homeobox and paired-box domain proteins, which control body plan specification, morphogenesis and organogenesis (Supplementary Fig. 1d,e). Polycomb group proteins epigenetically repress Hox and developmental transcription factor genes, which are enriched for bivalent chromatin marks in ESCs15,16. Analysis of PRC1 occupancy via RING1B chromatin immunoprecipitation and sequencing (ChIP-seq) data17 demonstrated that approximately 80% of the promoters in the Hox network were bound by PRC1 (Fig. 1a,b and Supplementary Table 1), with 85.9% of these also harboring the active chromatin mark trimethylation of histone H3 at lysine 4 (H3K4me3). We also identified five smaller independent spatial networks, mainly formed by long-range intrachromosomal contacts between PRC1-bound promoters (Fig. 1a and Supplementary Table 1). The Hox network promoters were distinguished from other PRC1-regulated genes by having significantly greater enrichment of RING1B at RING1B ChIP-seq peaks (P = 2.014 × 10⁻¹⁴) (Supplementary Fig. 1f), suggesting that they are high-affinity, high-occupancy PRC1-binding sites. These data suggest that the Hox clusters act as central three-dimensional nucleation points for a network of genes bound by PRC1 with high affinity and that this spatial network is a major constraint on ESC genome organization.
To address the role of PRC1 in this spatial network, we generated promoter CHi-C libraries from mouse ESCs lacking one or both genes encoding the catalytic core E3 ubiquitin ligase subunits of all known PRC1 complexes1: RING1A (constitutive RING1A knockout) and RING1B (tamoxifen-induced Cre-mediated RING1B knockout in the RING1A-knockout background (RING1A/RING1B double knockout)) (Ring1A−/−; Ring1Bfl/fl; Rosa26::CreERT2 genotype). Knockout of both RING1A and RING1B resulted in undetectable levels of RING1B and monoubiquitination of histone H2A at lysine 119 (H2AK119ub1), while cells retained key features of pluripotency (Supplementary Fig. 2a and Supplementary Table 2).
In RING1A-knockout cells, the Hox network showed reduced connectivity as compared to wild-type ESCs, as did the five smaller networks, which were no longer discernible (Fig. 1c,d and Supplementary Fig. 2b). In double-knockout cells, the Hox spatial network was completely disrupted (Fig. 1c,d and Supplementary Fig. 2c), demonstrating that PRC1 is essential to maintain the Hox spatial network. We also found that intrachromosomal contacts within all four Hox clusters were substantially reduced, indicating that PRC1 has a critical role in maintaining the Hox clusters in compact repressive domain structures (Fig. 1e and Supplementary Fig. 2d–f)18,19,20,21.
We validated selected PRC1-dependent intra- and interchromosomal contacts by 3C and three-dimensional DNA FISH. Three-dimensional DNA FISH analysis showed that, for the Hox network, genes on the same chromosome were significantly closer than intervening control regions in both wild-type and RING1A-knockout cells (P < 0.0001). These Hox network associations were significantly reduced in cells with double knockout of RING1A and RING1B (P ≤ 0.037) (Fig. 2a,b and Supplementary Fig. 3). Similarly, interchromosomal associations between Hox network members were reduced in double-knockout cells (Fig. 2d,e). 3C analyses extended and confirmed these results (Fig. 2c,f and Supplementary Figs. 4 and 5). Collectively, these data validate the promoter CHi-C results identifying a complex, PRC1-dependent spatial network centered on the four Hox clusters and key developmental regulator genes.
Recent work has implicated EED, a PRC2 complex component, in mediating contacts between Hox gene clusters5. However, the relative contributions of PRC1 and PRC2 to spatial chromosome organization are unknown. We used ChIP-seq data (Supplementary Table 3) to classify Polycomb-bound promoters as being bound by PRC2 only (PRC2) or by PRC1 and PRC2 (PRC1/PRC2) (note that 98% of PRC1-bound promoters were also occupied by PRC2) (Supplementary Fig. 6a,b) and compared the spatial network connectivity of these promoter classes. We again found that the Hox network was highly centralized, and it was composed almost exclusively of PRC1/PRC2 promoters with extremely high connectivity (Fig. 3a). In general, we found that PRC1/PRC2 promoters had substantially higher connectivity than PRC2 promoters (Fig. 3a–c). PRC1/PRC2 promoters generally had higher occupancy for EZH2 and SUZ12 (two PRC2 complex components) (Supplementary Fig. 6b) and more often overlapped with peaks of trimethylation of histone H3 at lysine 27 (H3K27me3) than PRC2 promoters (Supplementary Fig. 6a). However, even when comparing promoter subsets from the two classes with similar H3K27me3 levels or EZH2 or SUZ12 occupancy, the connectivity between PRC1/PRC2 promoters was higher than between PRC2 promoters (Fig. 3d and data not shown). We next interrogated select long-range interactions involving Hox clusters in EED-knockout (Eed−/−) ESCs15 (Supplementary Fig. 6c) by 3C. We found that Hox cluster interactions were still detectable in EED-knockout cells, although in some cases with reduced ligation product frequencies in comparison to wild-type ESCs (Fig. 3e and Supplementary Fig. 6d–g).
We observed that PRC1/PRC2 promoters were spatially segregated from and had significantly stronger connectivity (P < 0.00001) than the previously identified three-dimensional pluripotency networks14,22 that are formed through contacts between promoters bound by OCT4, SOX2 and NANOG (Fig. 3b,c). The pluripotency networks were largely unaltered in RING1A/RING1B double-knockout ESCs (Fig. 3b,f and Supplementary Fig. 7a). In contrast, genes in linear genomic proximity to PRC1-bound genes showed higher colocalization than control genes, and this colocalization was reduced in double-knockout cells (Supplementary Fig. 7b), indicative of 'bystander' effects as a result of spatial genome rearrangement upon loss of RING1B. Collectively, our data identify PRC1 as a major regulator of three-dimensional genome architecture in ESCs.
Following on from our analysis of promoter-promoter contacts, we next focused on promoter contacts with putative regulatory (non-promoter) elements (Supplementary Table 2), by integrating ChIP-seq data (Supplementary Table 3) with promoter CHi-C data. We found that Hox network promoters contacted genomic regions enriched for RING1B, monomethylation of histone H3 at lysine 4 (H3K4me1), H3K4me3, H3K27me3, p300, CTCF and cohesin but depleted for acetylation of histone H3 at lysine 27 (H3K27ac), trimethylation of histone H3 at lysine 36 (H3K36me3) and trimethylation of histone H3 at lysine 9 (H3K9me3) (Fig. 4a,b). Combinations of histone marks have been used to distinguish functional classes of enhancers23,24,25. To characterize the contacts of RING1B-bound promoters with enhancers, we classified enhancers as previously described23,24,25: active (H3K4me1 and H3K27ac), intermediate (H3K4me1) or poised (H3K4me1 and H3K27me3) (Supplementary Table 4). We found that Hox network promoters and (to a lesser but still highly significant extent) RING1B-bound promoters were highly enriched in contacts with poised enhancers in comparison to expression-matched promoters (Fig. 4c and Supplementary Fig. 8a,b). The majority of these Hox network promoters maintained contacts with at least one poised enhancer in RING1A/RING1B double-knockout cells (Fig. 4c,d), demonstrating that RING1B is not necessary to maintain these contacts, despite a substantial fraction of these enhancers being occupied by RING1B in wild-type ESCs (Fig. 4e). Thus, RING1B is necessary to maintain promoter-promoter contacts but not promoter-enhancer contacts in the Hox network.
To correlate three-dimensional promoter-promoter and promoter-enhancer circuitry with transcriptional output, we generated nuclear RNA sequencing (RNA-seq) libraries from RING1A-knockout and RING1A/RING1B double-knockout ESCs. In agreement with previous microarray steady-state expression analysis26, promoters occupied by RING1B in wild-type cells showed significant derepression in RING1A/RING1B double-knockout cells (false discovery rate (FDR) < 0.05) (Fig. 5a,b). RING1B target genes preferentially formed new contacts with active genes in double-knockout cells (Supplementary Fig. 8c), concomitant with their loss of contacts with other RING1B-regulated genes (Fig. 1d). Strikingly, Hox network genes, most of which maintained contacts with poised enhancers in double-knockout cells (Fig. 4c), showed the most significant transcriptional upregulation, correlating with the disruption of promoter-promoter contacts within the network (Fig. 5b). We further found that the subset of RING1B-regulated genes that maintained contacts with poised enhancers also showed significant upregulation of expression (Fig. 5c). In contrast, only a small proportion of RING1B-regulated promoters (25.9%) gained contacts with active enhancers, and such gains correlated poorly with changes in gene expression (Supplementary Fig. 8d,e). This observation demonstrates that gaining contacts with active enhancers is not the major mechanism of transcriptional upregulation at PRC1 target genes. Thus, our data suggest that silencing is maintained through PRC1-mediated promoter-promoter contacts and that preformed contacts between promoters and poised enhancers may have a role in transcriptional upregulation upon PRC1 removal.
To investigate potential epigenetic changes at enhancers in RING1A-knockout and RING1A/RING1B double-knockout cells, we generated ChIP-seq profiles for H3K4me1 and H3K27ac and used published H3K27me3 data27 (Supplementary Fig. 8f and Supplementary Table 3). We found that poised enhancers that maintained contacts with Hox network and RING1B-bound promoters showed a transition toward an active state (loss of H3K27me3 and gain of H3K27ac) in RING1A/RING1B double-knockout cells (Fig. 5d). The status of active and intermediate enhancers contacting RING1B-bound promoters remained largely unchanged (Supplementary Fig. 8g), whereas a subset of intermediate enhancers that maintained contacts with Hox network promoters underwent a transition toward an active state, although this was less pronounced than was observed for poised enhancers (Supplementary Fig. 8h). Thus, gain in acetylation was most prominent at enhancers with preexisting Hox network promoter contacts, and such gains correlated with the most pronounced transcriptional upregulation (Fig. 5e,f).
Here we identify an unusually strong PRC1-dependent spatial network in ESCs, composed of the four Hox gene clusters, key developmental genes and their associated poised enhancers. We observe complete dissociation of the promoter-promoter contacts in this network upon PRC1 knockout. However, preexisting contacts between Hox network genes and poised enhancers are largely maintained. These enhancers transition to an active chromatin state in the knockout cells, which correlates with significant upregulation of the genes they contact. Thus, we speculate that this higher-order genome organization mediated by PRC1 is a key determinant in maintaining network genes in a silent state, poised for activation during early development. Similar silencing mechanisms involving three-dimensional genome organization may be evolutionarily conserved, as contacts between Polycomb target genes exist in distantly related species2,3,28 and preformed contacts between developmental genes and regulatory sequences have been observed in Drosophila melanogaster development29. Thus, PRC1 physically constrains developmental genes in a repressive three-dimensional spatial network in pluripotent stem cells, and we propose that selective release of genes from this network results in transcriptional upregulation and underlies key cell fate decisions associated with organogenesis and body plan specification in early development.
J1 (129S4/SvJae) and 129SvJae/Cast wild-type mouse ESCs were grown under standard ESC culture conditions7. Ring1A−/−; Ring1Bfl/fl; Rosa26::CreERT2 mouse ESCs (RING1A knockout) were cultured as described previously7. Eed+/+ G2.1 and Eed−/− G8.1 mouse ESCs15 were cultured as described previously17. For wild-type, RING1A-knockout, Eed+/+ G2.1 and Eed−/− G8.1 mouse ESCs, dishes were coated with 0.1% gelatin and irradiated mouse embryonic fibroblasts (MEFs). Conditional deletion of Ring1B in RING1A-knockout cells was carried out by the addition of 800 nM tamoxifen for 48 h as previously described7. For collection, ESCs were trypsinized and preplated twice for 30 min each plating to remove contaminating feeder cells.
Extracts, immunoblot analysis and antibodies.
To obtain whole-cell extracts, cells were collected, washed once with PBS and washed twice with cold TB buffer (20 mM HEPES, pH 7.3, 110 mM potassium acetate, 5 mM sodium acetate, 2 mM magnesium acetate, 1 mM EGTA, 2 mM DTT and protease inhibitors (Roche)). Extracts were then incubated for 30 min at 37 °C in TB buffer with 0.1% NP-40, 10 mM MnCl2 and 20 μg/ml DNase I (Life Technologies). For acid extraction of histones, cells were trypsinized, washed with room-temperature PBS and incubated in ice-cold PBS containing protease inhibitors for 10 min. Cells were spun for 4 min at 235g. The pellet was resuspended in 1 ml of ice-cold 0.2 M H2SO4 and incubated on ice for 30 min. Samples were then spun for 2 min at 20,000g at 4 °C. Histones were precipitated with 25% TCA for 30 min at 4 °C. Precipitated histones were pelleted by spinning for 10 min at 20,000g at 4 °C and washed twice with 1 ml of ice-cold acetone for 10 min on ice for each wash. Histones were centrifuged for 10 min at 20,000g at 4 °C and dissolved in 100 mM Tris (pH 7.6).
Immunoblot blocking, antibody incubations and washes were performed in 5% milk powder dissolved in PBS (pH 8) with 0.1% Tween-20. Primary antibodies (H2AK119ub1 (05-678, Millipore) (1:500 dilution); H3K27me3 (07-449, Millipore), EED (09-774, Millipore), EZH2 (pAB-039-050, Diagenode) and RING1B31 (1:1,000 dilution); H3 (ab1791, Abcam) (1:10,000 dilution); OCT4 (ab19857, Abcam) (1:200 dilution); and NANOG (RCAB0002, Reprocell) (1:500 dilution)) were incubated with blots overnight. Secondary antibodies (sheep anti-mouse IgG linked to horseradish peroxidase (HRP) (GE Healthcare; 1:2,000 dilution), donkey anti-rabbit IgG linked to HRP (GE Healthcare; 1:2,000 dilution) or anti-mouse IgM linked to HRP (Dako; 1:1,500 dilution)) were incubated with blots for 1 h.
Immunofluorescence analysis was carried out essentially as described previously32,33. ESCs were split onto poly-L-lysine slides coated with 0.1% gelatin in PBS for 3 h. The antibodies used were to RING1B31 (1:50 dilution) and OCT4 (ab19857, Abcam; 1:200 dilution). The secondary antibodies used were Alexa Fluor 488–conjugated goat anti-rabbit IgG (H+L) (A11008, Molecular Probes; 1:400 dilution) and Alexa Fluor 568–conjugated goat anti-mouse IgG (H+L) (A11031, Molecular Probes; 1:400 dilution). Images were captured using an Olympus BX61 multicolor fluorescence microscope.
Promoter capture Hi-C.
ESCs (3–4 × 10⁷; RING1A knockout or RING1A/RING1B double knockout) were fixed in 2% formaldehyde for 10 min, and promoter capture Hi-C was performed essentially as described previously14. Hi-C DNA was amplified with 9 pre-capture PCR amplification cycles using the PE PCR 1.0 and PE PCR 2.0 primers (Illumina). Hi-C DNA was hybridized to a custom-designed capture bait system consisting of biotinylated RNAs targeting the HindIII restriction fragment ends of 22,225 mouse gene promoters14 (Agilent Technologies). Biotin pulldown (MyOne Streptavidin T1 Dynabeads, Life Technologies) and washes were performed following the SureSelect target enrichment protocol (Agilent Technologies), and post-capture PCR (four amplification cycles using Illumina PE PCR 1.0 and PE PCR 2.0 primers) was performed on DNA bound to the beads via biotinylated RNA. Promoter capture Hi-C libraries were sequenced (50-bp paired-end reads) on the HiSeq 1000 platform (Illumina).
Cells were fixed in 2% formaldehyde for 10 min, and 3C was performed essentially as previously described34. 3C DNA was purified using an Amicon Ultracel 0.5-ml column. For promoter CHi-C validation, long-range 3C-PCR amplicons were designed by combining a bait primer (located within a captured promoter HindIII fragment) with primers (Supplementary Table 5) in contacting or non-contacting HindIII fragments, as determined by promoter CHi-C. To generate a standard curve for PCR, the corresponding ligation products were generated from a template library by digestion and ligation of the corresponding BAC DNA (Life Technologies) (Supplementary Table 5). To control for cross-linking and ligation efficiency in individual 3C libraries, short-range 3C-PCR amplicons were designed for each of the Hox clusters (Hoxa5-Hoxa7, Hoxb7-Hoxb9, Hoxc10 5′–Hoxc10 3′ and Hoxd12-Hoxd13) and the Hist1 cluster (Hist1h2ae 5′–Hist1h2ae 3′). In the case of interchromosomal contacts, this control was performed by analyzing the contact frequency between the corresponding Hox cluster and Calr. For each of the three cell types (J1 wild-type ESCs, RING1A-knockout ESCs and RING1A/RING1B double-knockout ESCs), two independent biological replicates were analyzed. The identity of the 3C ligation products was verified by DNA sequencing.
Three-dimensional DNA FISH.
BAC (Life Technologies) DNA (Supplementary Table 5) was purified and chemically coupled with Alexa Fluor 488 or Alexa Fluor 555 reactive dye (Life Technologies) according to the manufacturer's instructions, as described previously14. Three-dimensional DNA FISH was performed as described previously35. DNA FISH signals were imaged and analyzed with the MetaCyte automated imaging system (MetaSystems). Three-dimensional distances between the specified genomic loci were calculated (Supplementary Table 6). For comparisons of interprobe distances in the same cell, the Mann-Whitney test was applied. For comparisons of interprobe distances between RING1A-knockout and RING1A/RING1B double-knockout cells, the Kruskal-Wallis/Dunn's multiple-comparisons test was used.
Nuclear strand-specific RNA sequencing.
Mouse ESCs (wild-type 129SvJae/Cast, RING1A-knockout and RING1A/RING1B double-knockout cells) were washed in PBS, and approximately 30–50 × 10⁶ ESCs were lysed for 5 min in 0.5 ml of cold RLN buffer (50 mM Tris-HCl, pH 7.5, 140 mM NaCl, 1.5 mM MgCl2, 1 mM DTT and 0.4% NP-40). Nuclei were pelleted by spinning at 300g for 10 min at 4 °C. Nuclear RNA was isolated using TRIsure (Bioline), treated with DNase I (Roche) and repurified using the RNeasy Mini kit (Qiagen). Strand-specific RNA-seq libraries were prepared as described previously by marking the second strand with dUTP36,37 but with some modifications. Nuclear RNA (250 ng) was fragmented using a Covaris E220 instrument at standard RNA settings for 60 s. Fragmented RNA was precipitated, and first-strand synthesis was carried out using SuperScript III (Invitrogen) with 4 μg of actinomycin D (Sigma). Nucleotides were removed with mini quick-spin DNA columns (Roche), and second-strand synthesis was performed using Escherichia coli DNA ligase (New England BioLabs), DNA polymerase I (New England BioLabs) and RNase H (Fermentas), replacing dTTP with dUTP (Fermentas). After the products were purified on QIAquick columns (Qiagen), they were ligated to TruSeq Illumina adaptors with T4 DNA ligase (Enzymatics). Libraries were purified on QIAquick columns, treated with USER (New England BioLabs) to destroy the second strand and size selected using AMPure XP beads. Libraries were amplified with 9–11 PCR cycles and sequenced (50-bp paired-end reads) on the HiSeq 1000 platform.
RING1A-knockout and RING1A/RING1B double-knockout cells (2 × 10⁸ cells) were fixed in ChIP fixation buffer (1% formaldehyde, 5 μM EGTA, 10 μM EDTA, 1 mM NaCl and 0.5 mM HEPES in PBS) for 10 min at room temperature. Fixation was stopped by the addition of glycine (final concentration of 125 mM). Cells were washed with PBS, buffer A (10 mM HEPES, pH 7.5, 10 mM EDTA, 0.5 mM EGTA and 0.75% Triton X-100) and buffer B (10 mM HEPES, pH 7.5, 200 mM NaCl, 1 mM EDTA and 0.5 mM EGTA). Cells were lysed in lysis buffer (25 mM Tris, pH 7.5, 150 mM NaCl, 5 mM EDTA, 0.1% Triton X-100, 1% SDS, 0.5% deoxycholate and Complete protease inhibitor (Roche)) for 30 min on ice. Sonication was performed using a Bioruptor sonicator (Diagenode) to obtain an average DNA fragment size of 300 bp. Chromatin was diluted with ChIP dilution buffer (25 mM Tris, pH 7.5, 150 mM NaCl, 5 mM EDTA, 1% Triton X-100, 0.1% SDS, 0.5% deoxycholate and Complete protease inhibitor). Dynabeads Protein G beads (Life Technologies) were blocked for 1 h at 4 °C with 1 mg/ml BSA and 1 mg/ml yeast tRNA (Life Technologies). For each immunoprecipitation, 150 μg of chromatin and 5 μg of antibody recognizing H3K4me1 (ab8895, Abcam) or H3K27ac (ab4729, Abcam) were used. Chromatin was precleared with the blocked beads for 1 h at 4 °C. Chromatin was then incubated with antibody overnight at 4 °C with rotation. Protein-antibody complexes were precipitated by the addition of beads for 2 h. Complexes were washed twice with wash buffer A (50 mM Tris, pH 8.0, 150 mM NaCl, 0.1% SDS, 0.5% deoxycholate, 1% NP-40 and 1 mM EDTA), once with wash buffer B (50 mM Tris, pH 8.0, 500 mM NaCl, 0.1% SDS, 0.5% deoxycholate, 1% NP-40 and 1 mM EDTA), once with wash buffer C (50 mM Tris, pH 8.0, 250 mM lithium chloride, 0.5% deoxycholate, 1% NP-40 and 1 mM EDTA) and once with TE. Samples were treated with RNase A and proteinase K, and cross-links were reversed overnight.
DNA was purified using a ChIP DNA clean and concentrator column (Zymo Research). Libraries were prepared using the NEB Next Fast DNA Fragmentation and Library Preparation set for the Ion Torrent kit (E6285S), following the manufacturer's instructions. Briefly, 40 ng of ChIP or input DNA was used for library generation. Libraries were size selected for 250-bp fragments using 2% E-gel Size Select agarose gels (Life Technologies) and were amplified with five PCR cycles. Libraries were sequenced on the Ion Proton Sequencer using Ion PI Chip v2 (Life Technologies). Templates were generated using Ion PI Template OT2 200 kit v3 and Ion PI Sequencing 200 kit v3 or the Ion PI IC 200 kit (Life Technologies).
ChIP-seq data processing.
The publicly available ChIP-seq data sets used are listed in Supplementary Table 3. ChIP-seq peaks from the Encyclopedia of DNA Elements (ENCODE) project were directly imported without reprocessing the data. RING1B ChIP-seq peaks were defined previously17, and EZH2 and SUZ12 ChIP-seq peaks were called in the same way. For other publicly available data sets, raw reads were mapped to the mm9 mouse genome using Bowtie 2 (ref. 38) with a seed length of 25 bp, allowing reads that had at most only one mismatched nucleotide in the seed, returning only one possible mapping and setting the remaining parameters to default values. After mapping, MACS39 was used to call peaks using default parameters.
For H3K4me1 and H3K27ac ChIP-seq in RING1A-knockout and RING1A/RING1B double-knockout cells, raw reads were aligned to the mouse mm9 genome using Bowtie 2 with default alignment parameters but excluding non-unique mappings (-m 1) and removing duplicated reads. Each replicate was downsampled to the same number of aligned reads (Supplementary Table 2).
HindIII fragments were considered to be occupied by a protein factor or histone modification if they overlapped a ChIP-seq peak. Baited promoter fragments that overlapped a RING1B peak were considered to be bound by PRC1. Promoter fragments that overlapped EZH2 and SUZ12 peaks were considered to be bound by PRC2. Promoter fragments that overlapped any OCT4, SOX2 or NANOG peak (but were depleted for both PRC1 and PRC2) were designated OSN.
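The overlap rules above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the analysis: the interval layout, the function names and the class labels are hypothetical, and the sketch assumes that PRC2 binding requires both an EZH2 and a SUZ12 peak on the fragment.

```python
from typing import Dict, List, Tuple

# (chromosome, start, end) with half-open coordinates; this layout is an assumption
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two intervals on the same chromosome share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def bound_by(fragment: Interval, peaks: List[Interval]) -> bool:
    """A fragment counts as occupied if it overlaps any peak of the factor."""
    return any(overlaps(fragment, peak) for peak in peaks)


def classify_promoter(fragment: Interval, peaks: Dict[str, List[Interval]]) -> str:
    """Assign a baited promoter fragment to a (hypothetical) class label."""
    prc1 = bound_by(fragment, peaks.get("RING1B", []))
    # Assumption: PRC2 binding requires both an EZH2 and a SUZ12 peak.
    prc2 = (bound_by(fragment, peaks.get("EZH2", []))
            and bound_by(fragment, peaks.get("SUZ12", [])))
    osn = any(bound_by(fragment, peaks.get(factor, []))
              for factor in ("OCT4", "SOX2", "NANOG"))
    if prc1 and prc2:
        return "PRC1/PRC2"
    if prc1:
        return "PRC1"
    if prc2:
        return "PRC2"
    if osn:
        return "OSN"  # reached only when both Polycomb complexes are absent
    return "other"
```

Because the Polycomb checks come first, a fragment is labeled OSN only when it carries neither PRC1 nor PRC2, which mirrors the depletion condition stated above.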
Enhancers were defined as previously described23 in wild-type ESCs using ENCODE data (Supplementary Table 3). H3K4me1 peaks were filtered to remove peaks within 1,000 bp (edge to edge) of a RefSeq promoter or an H3K4me3 peak. Remaining H3K4me1 peaks were overlapped with H3K27ac and H3K27me3 peaks. H3K4me1 peaks that did not overlap either of these marks were designated intermediate enhancers, those overlapping only H3K27ac peaks were designated active enhancers and those overlapping only H3K27me3 peaks were designated poised enhancers (Supplementary Table 4). HindIII fragments that overlapped only one type of enhancer were annotated with this classification (Supplementary Table 4). Analysis of differential histone modification was carried out using the DiffBind Bioconductor package40 (edgeR method) at all H3K4me1 peaks in wild-type cells.
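As an illustration of the enhancer classes, the following Python sketch applies the filtering and overlap rules to a single H3K4me1 peak. The function names and interval layout are hypothetical. The sketch treats "within 1,000 bp" as a gap of at most 1,000 bp and leaves peaks that overlap both H3K27ac and H3K27me3 unclassified; both choices are assumptions.

```python
from typing import List, Optional, Tuple

# (chromosome, start, end), half-open coordinates; this layout is an assumption
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two intervals on the same chromosome share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def edge_gap(a: Interval, b: Interval) -> Optional[int]:
    """Edge-to-edge distance in bp (0 if overlapping); None across chromosomes."""
    if a[0] != b[0]:
        return None
    return max(0, max(a[1], b[1]) - min(a[2], b[2]))


def classify_enhancer(peak: Interval,
                      k27ac: List[Interval],
                      k27me3: List[Interval],
                      excluded: List[Interval],
                      max_gap: int = 1000) -> Optional[str]:
    """Classify one H3K4me1 peak; `excluded` holds RefSeq promoters and H3K4me3 peaks."""
    for region in excluded:
        gap = edge_gap(peak, region)
        if gap is not None and gap <= max_gap:
            return None  # too close to a promoter or H3K4me3 peak
    has_ac = any(overlaps(peak, p) for p in k27ac)
    has_me3 = any(overlaps(peak, p) for p in k27me3)
    if has_ac and has_me3:
        return None  # assumption: peaks carrying both marks stay unclassified
    if has_ac:
        return "active"
    if has_me3:
        return "poised"
    return "intermediate"
```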
RNA sequencing analysis.
Reads were mapped with TopHat41 using default parameters and filtered to remove read pairs also aligning to ribosomal DNA sequences. SeqMonk was used to generate read counts for genes >200 bp in length (considering the entire gene body, using read pairs separated by <1 kb) (Supplementary Table 2). Downstream analysis was performed using the DESeq2 Bioconductor package42. Analysis of differential gene expression was performed using the default settings in DESeq2 but without independent filtering of the results and with an FDR of 0.05.
Mean-normalized FPKM (fragments per kilobase of transcript per million mapped reads) values in wild-type ESCs were used to categorize promoter fragments as having no expression (0 FPKM) or having expression falling into one of four quartiles. To form a control set of promoters not bound by RING1B that had a similar expression profile to RING1B-bound promoters (expression-matched promoters), the RING1B-bound promoters were counted in each expression category and an equal number of promoters not binding RING1B were randomly selected from each category.
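The stratified sampling behind the expression-matched control set can be sketched as follows. The function names, the fixed random seed and the use of quartiles computed over expressed promoters only are assumptions made for the illustration.

```python
import random
import statistics
from collections import Counter
from typing import Dict, List, Set


def expression_categories(fpkm: Dict[str, float]) -> Dict[str, int]:
    """Category 0 = no expression; 1-4 = quartiles of the expressed promoters."""
    cuts = statistics.quantiles([v for v in fpkm.values() if v > 0], n=4)
    categories = {}
    for promoter, value in fpkm.items():
        if value == 0:
            categories[promoter] = 0
        else:
            # 1 + number of quartile cut points lying below the value
            categories[promoter] = 1 + sum(value > cut for cut in cuts)
    return categories


def expression_matched_controls(fpkm: Dict[str, float],
                                bound: Set[str],
                                seed: int = 0) -> List[str]:
    """Draw, per expression category, as many unbound promoters as bound ones."""
    categories = expression_categories(fpkm)
    needed = Counter(categories[p] for p in bound)
    rng = random.Random(seed)
    controls: List[str] = []
    for category, n in sorted(needed.items()):
        pool = sorted(p for p in fpkm
                      if p not in bound and categories[p] == category)
        controls.extend(rng.sample(pool, n))  # ValueError if the pool is too small
    return controls
```

Sampling without replacement within each category keeps the control set the same size as the RING1B-bound set and gives it the same distribution across expression categories.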
Promoter CHi-C contact calling.
Raw sequencing reads from RING1A-knockout and RING1A/RING1B double-knockout CHi-C libraries were processed using the HiCUP pipeline, which maps the positions of paired-end reads against the mouse genome (mm9), filters out experimental artifacts, such as circularized reads and re-ligations, and removes duplicate reads (Supplementary Table 2). The preprocessed reads from two replicates of wild-type ESC promoter CHi-C data (E-MTAB-2414) (ref. 14) were combined, and a random subset of 98,842,763 reads was selected to correspond to the average number of reads from the RING1A-knockout and RING1A/RING1B double-knockout CHi-C data sets.
Significantly contacting regions were identified using the GOTHiC Bioconductor package. This approach assumes that biases occurring in Hi-C and CHi-C experiments are captured in the coverage (total number of reads mapping to a given fragment or larger bin), and significantly contacting regions or true contacts therefore can be separated from background noise using a cumulative binomial test followed by Benjamini-Hochberg correction for multiple testing (FDR cutoff of 0.05). Biological replicates were pooled for analysis, and promoter-promoter and promoter-genome contacts were handled separately.
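The logic of a coverage-based cumulative binomial test with Benjamini-Hochberg correction can be sketched in plain Python. This is not the GOTHiC implementation: the function names are hypothetical, and the sketch assumes that the random-ligation probability for a fragment pair is twice the product of the two relative coverages.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term in log space."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        term = math.exp(log_pmf)
        total += term
        # Past the mean the terms shrink monotonically; stop once negligible.
        if i > n * p and term <= total * 1e-15:
            break
    return min(total, 1.0)


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_contacts(read_counts: Dict[Pair, int], fdr: float = 0.05) -> List[Pair]:
    """Return fragment pairs whose read count exceeds the coverage-based background."""
    coverage: Dict[str, int] = defaultdict(int)
    for (a, b), count in read_counts.items():
        coverage[a] += count
        coverage[b] += count
    total_coverage = sum(coverage.values())
    n_reads = sum(read_counts.values())
    pairs = sorted(read_counts)
    pvals = []
    for a, b in pairs:
        # Assumed background: twice the product of the relative coverages.
        p_random = 2 * (coverage[a] / total_coverage) * (coverage[b] / total_coverage)
        pvals.append(binom_sf(read_counts[(a, b)], n_reads, p_random))
    qvals = benjamini_hochberg(pvals)
    return [pair for pair, q in zip(pairs, qvals) if q < fdr]
```

The term-by-term tail sum is adequate for toy inputs; for real library sizes a statistics package would be the more practical choice.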
For promoter-promoter contacts, we calculated a modified null distribution to account for the non-multiplicative capture bias in products targeted by two baits. A random ligation CHi-C sample14 was used to build a generalized linear model. The product and the sum of the coverage values of the two ends were used as input variables, whereas the contact frequencies of random ligation events were used as dependent variables. Predicted contact frequencies for the actual samples were calculated from the model using logit regression. The GOTHiC binomial test was then applied with this modified background distribution to identify significant contacts. Short-range intrachromosomal contacts were excluded by filtering out contacts separated by <10 Mb. Strong promoter-promoter contacts were defined as those represented by three or more independent reads.
Hox cluster contacts were calculated with the HindIII fragments of the Hox cluster regions binned together and normalized for the number of captured HindIII fragments in the cluster region. The following mm9 coordinates were used to define the Hox clusters: Hoxa, chr. 6: 52,099,000–52,277,000; Hoxb, chr. 11: 96,045,674–96,240,000; Hoxc, chr. 15: 102,740,000–102,877,000; and Hoxd, chr. 2: 74,486,000–74,614,000.
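A small Python sketch shows how fragments could be binned into the Hox cluster regions listed above. The helper names are hypothetical. The sketch assumes UCSC-style chromosome names, treats the coordinates as closed intervals, assigns any fragment that overlaps a region to that cluster, and normalizes by simple division by the number of captured fragments.

```python
from typing import Dict, Optional, Tuple

# mm9 coordinates quoted in the text, treated here as closed intervals
HOX_CLUSTERS: Dict[str, Tuple[str, int, int]] = {
    "Hoxa": ("chr6", 52_099_000, 52_277_000),
    "Hoxb": ("chr11", 96_045_674, 96_240_000),
    "Hoxc": ("chr15", 102_740_000, 102_877_000),
    "Hoxd": ("chr2", 74_486_000, 74_614_000),
}

Fragment = Tuple[str, int, int]  # (chromosome, start, end)


def hox_cluster_of(chrom: str, start: int, end: int) -> Optional[str]:
    """Return the Hox cluster whose region the fragment overlaps, if any."""
    for name, (c, s, e) in HOX_CLUSTERS.items():
        if chrom == c and start <= e and end >= s:
            return name
    return None


def binned_cluster_signal(fragment_reads: Dict[Fragment, int],
                          captured_fragments: Dict[str, int]) -> Dict[str, float]:
    """Sum reads per cluster and divide by its number of captured fragments."""
    totals: Dict[str, int] = {}
    for (chrom, start, end), reads in fragment_reads.items():
        cluster = hox_cluster_of(chrom, start, end)
        if cluster is not None:
            totals[cluster] = totals.get(cluster, 0) + reads
    return {c: totals[c] / captured_fragments[c] for c in totals}
```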
Promoter-promoter network connectivity maps were generated using significant contacts between all 22,225 captured promoters or considering only Polycomb-bound promoters (in each case, contacts from the binned Hox cluster regions were used). Networks were visualized with Cytoscape43 using a force-directed layout with the following parameters: number of iterations, 100; weight attribute, read count; minimum weight to consider, 3; no partitioning of subgraphs before layout.
The Hox network was defined as comprising RING1B-bound promoters in the defined Hox cluster regions (Hox cluster genes) and RING1B-bound promoters making direct interchromosomal or long-range (>10 Mb) intrachromosomal contacts with the Hox clusters (Hox cluster–contacting genes) (Supplementary Table 1). Contacts between Hox cluster promoters and contacts between Hox network members were visualized as Circos plots using the ggbio R package. Promoters not in the Hox network were defined as RING1B-bound promoters not making direct contacts with the Hox cluster regions and located >10 Mb away from them (Supplementary Table 1).
Contact enrichment and contact strength between promoters.
To measure the enrichment of contacts within a set of promoters, 100 random promoter sets were generated with pairwise distance distributions comparable to that of the experimental set. The P value, by its classical definition, was obtained by counting the number of random control sets with more contacts than the experimental set. Because the contact counts in control sets generally followed a near-normal distribution, a t test with 99 degrees of freedom was used to estimate low P values more accurately. Contact enrichment was derived by dividing the number of contacts in the experimental set by the average number of contacts in the control sets.
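The calculation can be sketched in Python as follows. The enrichment and the empirical P value follow the description above. The form of the t statistic (observed count compared against the control mean, scaled as for a single new observation) is an assumption, as the exact test setup is not specified, and `contact_enrichment` is a hypothetical name.

```python
import math
import statistics

def contact_enrichment(observed, control_counts):
    """Compare an observed contact count with counts from random control sets.

    Returns (enrichment, empirical_p, t_statistic, degrees_of_freedom).
    """
    n = len(control_counts)
    mean_ctrl = statistics.mean(control_counts)
    # Enrichment: observed contacts over the average count in control sets.
    enrichment = observed / mean_ctrl
    # Empirical P value: fraction of control sets exceeding the observed count.
    empirical_p = sum(c > observed for c in control_counts) / n
    # ASSUMPTION: t statistic for one new observation against n controls.
    sd = statistics.stdev(control_counts)
    t_stat = (observed - mean_ctrl) / (sd * math.sqrt(1.0 + 1.0 / n))
    return enrichment, empirical_p, t_stat, n - 1
```

With 100 control sets, the returned degrees of freedom equal 99, matching the test described above. The empirical P value cannot resolve values below 1/100, which is why a parametric estimate is useful for strongly enriched sets.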
To compare contact strength within and between sets of promoters (for example, between PRC1/PRC2-bound and OCT4-bound promoters), a distance-normalized contact strength value for each set or pair of sets was calculated by summing the log(fold observed/expected) values for contacts and dividing by the summed contact distances of all possible contacts within or between promoter sets.
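A minimal Python sketch of this normalization is given below. It assumes that every possible promoter pair has a defined genomic distance (the handling of interchromosomal pairs is not covered), and the function name `contact_strength` is hypothetical.

```python
def contact_strength(log_fold_values, possible_pair_distances):
    """Distance-normalized contact strength for a promoter set or pair of sets.

    log_fold_values: log(fold observed/expected) of each detected contact.
    possible_pair_distances: genomic distance of every possible promoter
    pair within or between the sets, whether or not a contact was detected.
    """
    total_distance = sum(possible_pair_distances)
    if total_distance <= 0:
        raise ValueError("summed contact distance must be positive")
    return sum(log_fold_values) / total_distance
```

Because the denominator runs over all possible pairs rather than only the detected contacts, a set with few detected contacts among many candidate pairs receives a low value.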
Promoter-genome contacts (contacts between promoters and non-promoter regions in the genome) were calculated using an unmodified null distribution; however, contacts were subsequently filtered in three steps. First, contacts in which one of the fragments had extremely high coverage (among the 10 most covered bait fragments in at least 2 of 3 samples or among the 50 most covered non-bait fragments in any of the samples) were removed. Second, a 'neighbor filter' was applied to control for spurious contact spikes involving a single fragment pair by keeping only contacts that were supported by at least one valid read pair involving a neighboring fragment. Finally, because the log(fold observed/expected) values of all contacts had a bimodal distribution, contacts that were likely to be background contacts were removed by fitting a normal distribution to the lower peak and applying a cutoff at the 95th percentile of the normal distribution (∼10). Contacts were considered to be maintained if they were present in wild-type, RING1A-knockout and RING1A/RING1B double-knockout samples. Contacts that were present only in RING1A/RING1B double-knockout samples were considered to be gained.
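The neighbor filter and the maintained/gained classification can be illustrated with a short Python sketch. Fragments are represented here by consecutive integer indices along a chromosome, and a neighboring fragment is taken to be the non-bait fragment immediately upstream or downstream. Both choices, and the function names, are assumptions made for this example.

```python
def neighbor_filter(significant, read_counts):
    """Keep contacts supported by at least one read pair at a neighboring fragment.

    significant: set of (bait, fragment_index) contacts called significant.
    read_counts: {(bait, fragment_index): valid read pairs} for all pairs.
    """
    kept = set()
    for bait, frag in significant:
        # ASSUMPTION: neighbors are the adjacent non-bait fragments.
        neighbors = [(bait, frag - 1), (bait, frag + 1)]
        if any(read_counts.get(pair, 0) >= 1 for pair in neighbors):
            kept.add((bait, frag))
    return kept

def classify_contacts(wild_type, ring1a_ko, double_ko):
    """Split contacts into maintained and gained sets.

    Maintained: present in all three samples.
    Gained: present only in the RING1A/RING1B double knockout.
    """
    maintained = wild_type & ring1a_ko & double_ko
    gained = double_ko - wild_type - ring1a_ko
    return maintained, gained
```

A contact seen in the double knockout and in only one of the other two samples falls into neither category under this reading of the definitions.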
Enrichment at non-bait promoter-contacting fragments.
To calculate enrichment for chromatin marks or transcription factors (Supplementary Table 3) at non-bait promoter-contacting HindIII fragments for each promoter category, the proportion of promoter category–contacting fragments occupied by the factor or mark was divided by the proportion of all promoter-contacting fragments occupied by the same factor or mark. The resulting ratio was converted to its log2 value, such that positive values represent enrichment relative to all promoter-contacting non-bait fragments and negative values represent depletion.
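The enrichment score can be written as a short Python function. Fragment identifiers are arbitrary hashable values, the function name `log2_enrichment` is hypothetical, and returning negative infinity when no category fragment is occupied is a choice made for this sketch only.

```python
import math

def log2_enrichment(category_fragments, all_fragments, occupied):
    """log2 ratio of occupancy in one promoter category versus all fragments.

    category_fragments: non-bait fragments contacted by the promoter category.
    all_fragments: all non-bait promoter-contacting fragments.
    occupied: fragments occupied by the mark or factor of interest.
    """
    prop_category = len(category_fragments & occupied) / len(category_fragments)
    prop_all = len(all_fragments & occupied) / len(all_fragments)
    if prop_all == 0:
        raise ValueError("mark or factor occupies no promoter-contacting fragment")
    if prop_category == 0:
        return float("-inf")  # sketch-only convention for complete depletion
    return math.log2(prop_category / prop_all)
```

A score of 1 therefore corresponds to a twofold higher occupancy among the fragments contacted by the category than among all promoter-contacting fragments.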
ArrayExpress data repository, http://www.ebi.ac.uk/arrayexpress/; DiffBind Bioconductor package, http://bioconductor.org/packages/release/bioc/vignettes/DiffBind/inst/doc/DiffBind.pdf; HiCUP pipeline, http://www.bioinformatics.babraham.ac.uk/projects/hicup/; GOTHiC Bioconductor package, http://www.bioconductor.org/packages/release/bioc/html/GOTHiC.html; SeqMonk, http://www.bioinformatics.babraham.ac.uk/projects/seqmonk/.
Nuclear RNA-seq (E-MTAB-3125), ChIP-seq (E-MTAB-3156), promoter CHi-C raw data and lists of significant contacts (E-MTAB-3109) are available from the ArrayExpress data repository under the indicated accessions.
We thank members of the Elderkin, Fraser and Luscombe groups for discussions and J. Houseley and P. Rugg-Gunn for commenting on the manuscript. We thank F. Krueger for help with data processing and formatting. We thank R.J. Klose and N. Brockdorff for sequencing. We thank D. Bolland, J. Martins and A. Corcoran for help and advice with the three-dimensional DNA FISH and MetaCyte data analyses. This work was funded by the Wellcome Trust (WT085102MA) (S.E.), the Biotechnology and Biological Sciences Research Council, the Medical Research Council UK (P.F.) and the European Union Framework Programme 7 Epigenesys Network of Excellence (N.M.L.).
Integrated supplementary information
Promoter fragment baits in different categories.
Next-generation sequencing statistics for promoter CHi-C, nuclear RNA-seq and ChIP-seq.
Publicly available data sets used.
Enhancer fragment baits in different enhancer classes.
BACs and primer sequences used for 3C-PCR and 3D DNA FISH.
Interprobe distances for 3D DNA FISH.