Main

The capacity to initiate and maintain defined gene expression patterns is fundamental to complex multi-cellular development. At its most basic level, this relies on transcription factors recognizing DNA sequences in gene regulatory elements to control RNA polymerase (Pol) II activity at the core gene promoter1. However, in eukaryotes, chromatin states at gene regulatory elements can also profoundly influence transcription and gene expression, and the systems that create these states are essential for normal gene regulation and development1,2,3,4. While there is an emerging appreciation of the mechanisms through which transcription factors instruct transcription1, how chromatin-based systems influence transcription remains very poorly understood and represents a major conceptual gap in our knowledge of gene regulation.

The Polycomb repressive system represents a paradigm for chromatin-based gene regulation and is essential for appropriate gene expression during animal development5,6,7. It comprises two distinct histone modifying complexes, Polycomb repressive complexes 1 and 2 (PRC1 and PRC2, respectively). PRC1 mono-ubiquitylates H2A at lysine 119 (H2AK119ub1) and PRC2 methylates histone H3 at lysine 27 (H3K27me3). In vertebrates, both PRC1 and PRC2 are targeted to promoters of genes that have CpG island elements. Here they can deposit histone modifications and through feedback mechanisms create Polycomb chromatin domains that have high levels of H2AK119ub1, H3K27me3 and occupancy of PRC1 and PRC2 complexes. We refer to target genes where Polycomb domains form as Polycomb genes6,8. Polycomb chromatin domains have important roles in counteracting gene expression and help to maintain the inactive state of genes in tissues where they should not be expressed5,6,7, with previous work also suggesting a more pervasive role in constraining gene expression8,9,10,11,12. However, how the Polycomb system controls transcription to repress gene expression remains very poorly understood.

A central experimental constraint that has limited our understanding of how gene regulatory mechanisms function in situ is that the process of transcription is not uniform across cells. Instead, transcription is stochastic within individual cells over time and varies substantially between cells in a population13,14. As such, ensemble approaches for analysing transcription do not capture key features of the transcription cycle that are essential for understanding how regulatory mechanisms affect gene expression. To overcome this, single-cell transcription analysis complemented with detailed understanding of the cellular dynamics of the factors that regulate transcription is emerging as an important avenue to uncover how transcription is controlled to regulate gene expression13,14.

We and others have shown using ensemble approaches in embryonic stem (ES) cells that the Polycomb system, in particular PRC1 and H2AK119ub1 (PRC1/H2AK119ub1)8,15,16,17,18,19,20, has a central role in constraining gene expression through limiting the activity of RNA Pol II at Polycomb genes21. This has demonstrated that factors necessary to promote transcription of Polycomb genes are present and that the Polycomb system limits some key aspect of transcription to enable repression. Analysis of these effects in single cells suggested that Polycomb could influence the frequency of transcriptional bursts, but this observation relied on inferring kinetic parameters based on modelling RNA transcript levels in fixed cells21,22,23. As such, how the Polycomb system controls transcription remains essentially unknown.

To address this fundamental question, here we use rapid degron approaches, live-cell imaging and genomics to determine how PRC1/H2AK119ub1 regulate transcription. We discover that non-canonical PRC1 and H2AK119ub1 have an important role in sustaining a deep promoter OFF state by limiting transcription pre-initiation complex (PIC) engagement with gene promoters to counteract transcription. As such, we reveal that Polycomb chromatin domains limit the earliest steps of transcription to enable gene repression.

Results

Imaging Polycomb gene transcription in live cells

To begin understanding how the Polycomb system influences transcription, we used a highly sensitive MS2 aptamer-based system, which is capable of capturing transcription with single-transcript sensitivity in living cells24 (Fig. 1a). To implement this, we used CRISPR–Cas9 engineering in mouse ES cells to create lines in which MS2 repeats were inserted into the first intron of two representative Polycomb genes (Zic2 and E2f6) that have their promoters embedded within a typical Polycomb chromatin domain (Extended Data Fig. 1a–c) and are subject to very low levels of transcription in wild-type cells but become de-repressed when PRC1 is depleted (Extended Data Fig. 1c,d). We also engineered MS2 repeats into a moderately expressed reference gene that lacks a discernible Polycomb chromatin domain (Hspg2) and is not influenced by PRC1 repression (Extended Data Fig. 1b–d). These cell lines were engineered to express an MS2 RNA-binding protein fused to green fluorescent protein (MCP–GFP), enabling nascent transcription imaging and quantification of transcription in live cells24 (Fig. 1a and Extended Data Figs. 1b and 3a,b).

Fig. 1: Imaging Polycomb gene transcription in live cells.

a, Top: schematic illustrating the transcription imaging approach. MS2 repeats were inserted into a promoter-proximal intron of the genes of interest. As RNA Pol II passes through the array, nascent RNA presents MS2 stem loops that are bound by MCP–GFP leading to accumulation of fluorescence signal at the active transcription site. Bottom: an example image of a cell with a nascent transcription spot corresponding to the active TSS. The white dashed lines indicate the cell outline. b, Example of a transcription activity trajectory from cells engineered to contain the MS2/MCP–GFP system (Zic2). Maximal projections of the focalized MCP–GFP signal are shown above the trajectory to illustrate the pulsatile nature of transcription. c, Example transcription activity trajectories for Polycomb genes (Zic2 and E2f6) and a reference gene (Hspg2). ON (green), permissive (violet), and OFF periods (black) are illustrated. The y axis represents transcriptional activity (in RNA molecules). Source numerical data are available in Source data.


When we imaged these cell lines, bright MCP–GFP foci were evident which corresponded to nascent RNA-fluorescence in situ hybridization (FISH) signal for each gene (Extended Data Fig. 1b), and we found that nascent transcription could be quantified in live cells with single-transcript sensitivity (Extended Data Fig. 3c,d). Importantly, transcription of Polycomb genes was detected in agreement with these genes being expressed, albeit at low levels (Extended Data Fig. 1c,d). When we measured MCP–GFP fluorescence signal corresponding to nascent transcription over time, we observed that transcription was pulsatile (Fig. 1b), in line with previous live-cell transcription imaging in mammalian cells13,14. Furthermore, transcription trajectories for all three genes were characterized by transcriptionally permissive periods, within which there were distinct bursts of transcription initiation that we refer to as ON periods, where multiple RNA polymerases transcribe in close succession (Fig. 1c). Permissive periods were interspersed by long-lived OFF periods where the gene was not transcribed at all. Some OFF periods were highly persistent, extending for the entire duration (8 h) of the imaging movie, and clonal expression analysis revealed instances where OFF periods could extend across cell divisions24,25 (Extended Data Fig. 2a,b). Therefore, our imaging approach captures the transcriptional behaviour of Polycomb genes and provides us with an opportunity to study how the Polycomb system regulates transcription in live cells.

PRC1 does not constrain transcription during ON periods

With the capacity to image the transcription of Polycomb genes, we could begin to explore how the Polycomb system might regulate transcription. Initially we focused on ON periods and developed a transcription imaging analysis approach to extract the number of transcripts produced, duration and Pol II loading frequency during ON periods (Fig. 2a and Extended Data Fig. 3e,f). When we compared ON-period features for Polycomb genes (Zic2 and E2f6) and the reference gene (Hspg2), we found that they were similar (Fig. 2b) despite Polycomb genes being much more lowly expressed (Fig. 2d).
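The three ON-period features named above (Pol II loading rate, duration and amplitude) can be illustrated with a minimal sketch. It assumes that an ON period has already been segmented from a calibrated trajectory, that the loading rate is the least-squares slope of the rising phase (start to peak) and that time stamps are strictly increasing; the function name and input layout are hypothetical and are not taken from the analysis pipeline used here.

```python
from typing import List, Tuple


def on_period_features(times_min: List[float],
                       transcripts: List[float]) -> Tuple[float, float, float]:
    """Return (loading_rate, duration, amplitude) for one segmented ON period.

    times_min   -- frame time stamps in minutes (strictly increasing)
    transcripts -- calibrated nascent signal in RNA molecules at those frames
    """
    if len(times_min) != len(transcripts) or len(times_min) < 2:
        raise ValueError("need at least two matched samples")

    # Amplitude: peak number of nascent transcripts within the ON period.
    peak_index = max(range(len(transcripts)), key=transcripts.__getitem__)
    amplitude = transcripts[peak_index]

    # Duration: time from the first to the last frame of the ON period.
    duration = times_min[-1] - times_min[0]

    # Loading rate (assumption): least-squares slope over the rising phase.
    xs = times_min[: peak_index + 1]
    ys = transcripts[: peak_index + 1]
    if len(xs) < 2:
        return 0.0, duration, amplitude
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, duration, amplitude
```

Applied to every ON period of a gene, the three returned values give the per-period distributions that are compared between genes and conditions.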

Fig. 2: PRC1 does not constrain transcription during ON periods.

a, Schematic illustrating the ON-period features extracted from transcription imaging trajectories. These include the rate of RNA Pol II initiation within the ON period (from linear fit of the slope), the duration of the ON period (min) and the amplitude of the ON period (transcripts). b, Box plots centred on the median value comparing the ON-period features, with the interquartile range (IQR) demarcating the minimal and maximal values, whiskers as 1.5× IQR and outliers as dots. Individual data points correspond to individual ON periods (at least 573 measured per box plot). P values were estimated using a two-sided Kolmogorov–Smirnov test. Box plots represent data from four (Zic2), three (E2f6) and two (Hspg2) biological replicates. c, Left: diagram illustrating the auxin-inducible system used to rapidly deplete the catalytic subunit of PRC1 (RING1B) in a Ring1a−/− background. Right: western blot analysis of RING1B-AID levels over a 2-h period after addition of auxin (IAA) compared with a wild-type (WT) mouse ES cell line. Shown is a representative example of three independent experiments. d, smRNA-FISH analyses of E2f6, Zic2 and Hspg2 expression 4 h after PRC1 depletion. Dots represent individual biological replicates (n = 3, with >400 cells per replicate) and error bars represent the s.d. e, Schematic illustrating the approach to image transcription in live cells with (+IAA) or without (untreated, UNT) PRC1 depletion. f, Box plots (as in b) corresponding to ON-period analysis for Zic2, E2f6 (Polycomb genes) and Hspg2 (reference) in untreated and PRC1-depleted conditions. Individual data points correspond to individual ON periods (at least 599 measured per box plot). Statistical significance was calculated as in b and P values < 0.05 are shown. For Hspg2 Pol II loading, P = 0.01376; for Hspg2 amplitude, P = 4.2704 × 10⁻⁵. Box plots represent data from four (Zic2), three (E2f6) and two (Hspg2) biological replicates. Source numerical data and unprocessed blots are available in Source data.


This suggested that Polycomb-mediated repression may not primarily manifest from limiting transcription during ON periods. To test this, the MS2 reporter system was integrated into a degron cell line in which the addition of the small-molecule auxin (indole-3-acetic acid, IAA) leads to rapid depletion of RING1B, the structural core and catalytic subunit of PRC1, leading to turnover of H2AK119ub121,26 (Fig. 2c). Importantly, depletion of PRC1/H2AK119ub1 caused Polycomb gene de-repression and resulted in an approximately 2–2.5-fold increase in transcript levels as assessed by single-molecule RNA-FISH (smRNA-FISH), with Zic2 reaching transcript levels similar to the reference gene (Fig. 2d). Examining ON-period features, we found they were largely unaffected after PRC1/H2AK119ub1 depletion despite these genes displaying increased transcript levels (Fig. 2d–f). Therefore, we conclude that Polycomb-mediated repression is not achieved by PRC1 constraining transcription during ON periods.

PRC1 sustains a deep OFF state refractory to transcription

Depletion of PRC1/H2AK119ub1 did not affect transcription during ON periods, suggesting that PRC1/H2AK119ub1 regulates some other feature of transcription. One possibility was that PRC1 could limit the frequency of transcription events (ON periods) during permissive periods or the duration of permissive periods (Fig. 3a). To test this, we imaged transcription in the presence or absence of PRC1 and quantified the time between ON periods within permissive periods (Fig. 3b) and the duration of permissive periods (Fig. 3c). Similarly to ON-period analysis, depletion of PRC1/H2AK119ub1 had only minor effects on transcription during permissive periods, although we did observe a small increase in the duration of permissive periods for the reference gene (Fig. 3c). Therefore, PRC1/H2AK119ub1 does not repress Polycomb genes by regulating either ON-period (Fig. 2) or permissive-period (Fig. 3b,c) features.

Fig. 3: PRC1 sustains a deep OFF state that is refractory to transcription and counteracts gene expression.

a, Schematic illustrating the features extracted from transcription imaging trajectories for permissive-period analysis. These include the time between ON periods within permissive periods (grey arrow) and the duration of permissive periods (purple arrow). b, Box plots centred on the median value comparing the time between ON periods for E2f6, Zic2 and Hspg2 in untreated or IAA-treated (PRC1-depleted) conditions showing the IQR, which is demarcated by the minimal and maximal value, and whiskers as 1.5× IQR. At least 1,010 instances of time intervals between ON periods represent each box plot. P values represent two-sided Kolmogorov–Smirnov tests and P values < 0.05 are shown. Box plots represent data from four (Zic2), three (E2f6) and two (Hspg2) biological replicates. c, Box plots (as in b) comparing the duration of permissive periods for E2f6, Zic2 and Hspg2 in untreated or IAA-treated conditions. At least 212 durations of permissive period represent each box plot. Box plots represent data from four (Zic2), three (E2f6) and two (Hspg2) biological replicates. For Hspg2, P = 1.263 × 10⁻⁴. d, Bar graphs showing the fraction of total imaging time spent in permissive periods for E2f6, Zic2 and Hspg2. Data are mean and s.d. from four (Zic2), three (E2f6) and two (Hspg2) biological replicates. For E2f6, P = 0.01756; for Zic2, P = 3.850 × 10⁻⁴. e, Heat maps illustrating transcription imaging trajectories of individual cells for E2f6, Zic2 and Hspg2 in untreated or IAA-treated conditions over the 8-h imaging time course (horizontal axis). The amplitude of transcription is illustrated in the scale bar (right) and the number of imaging time courses is indicated on the y axis. Heat maps were randomly subsampled to represent an equal number of measurements in untreated and IAA-treated conditions to facilitate qualitative comparison. f, Top: schematic illustrating the simple three-state model of transcription used to simulate gene expression distributions. Bottom: histograms comparing transcript per cell distributions from smRNA-FISH in experiments (blue bars, experimental) and simulations (red bars) for Polycomb genes in untreated or IAA-treated conditions. The best-fit PO>P values for both untreated and 4 h IAA-treated conditions are indicated. Source numerical data are available in Source data.


Having observed little effect of PRC1/H2AK119ub1 on either ON-period or permissive-period features, we postulated the effects on expression must manifest from an increase in the frequency with which Polycomb genes exit from long-lived OFF periods and enter into permissive periods where transcription occurs. Consistent with this, when we examined the fraction of time that promoters spend in permissive periods, we discovered PRC1/H2AK119ub1 depletion caused a clear increase, despite permissive-period duration remaining largely unaltered (Fig. 3d). This was also evident in heat maps illustrating single-cell transcription imaging traces for Polycomb genes (Fig. 3e). Although the relative increase in the fraction of time spent in permissive periods and the expression changes after PRC1 depletion do not precisely converge (Figs. 3d and 2d), this is probably due to the non-equilibrium nature of transcript accumulation in our rapid degron system, which relies on the interplay between new transcript production and mRNA half-life. Therefore, we posit that PRC1/H2AK119ub1 counteracts transcription by sustaining promoters in a long-lived deep OFF state and that elevated expression after PRC1 depletion results from an increased fraction of time spent in the permissive period.
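The distinction drawn above, between how long each permissive period lasts and how much of the total imaging time is spent in permissive periods, can be made concrete with a small sketch. It assumes each frame of a trajectory has already been labelled as OFF, permissive or ON, and that ON periods count as part of the permissive period that contains them; the labels, function name and frame interval are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def permissive_summary(states: Sequence[str],
                       frame_min: float) -> Tuple[float, List[float]]:
    """Return (fraction of time permissive, list of permissive-period durations).

    states    -- per-frame labels: "OFF", "P" (permissive, between bursts) or "ON"
    frame_min -- time between frames in minutes (hypothetical parameter)
    """
    if not states:
        raise ValueError("empty trajectory")

    # ON periods sit inside permissive periods, so any non-OFF frame counts.
    permissive = [s != "OFF" for s in states]
    fraction = sum(permissive) / len(permissive)

    # Run-length encode the trajectory and keep the permissive runs only.
    durations = [
        sum(1 for _ in run) * frame_min
        for is_permissive, run in groupby(permissive)
        if is_permissive
    ]
    return fraction, durations
```

If OFF periods are exited more often, the number of entries in the duration list grows and the fraction rises even when the individual durations are unchanged, which is the pattern described for PRC1/H2AK119ub1 depletion.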

PRC1 decreases the probability of exiting the deep OFF state

If PRC1/H2AK119ub1 represses transcription by sustaining a deep OFF state, an increased frequency of transitioning out of this deep OFF state should account for elevated gene expression observed in smRNA-FISH after PRC1/H2AK119ub1 depletion (Fig. 2d). To investigate this possibility, we built a simple three-state gene expression model that incorporated parameters measured in live-cell imaging for ON periods (Fig. 2b), the number of ON periods and time between them within permissive periods (Extended Data Fig. 4a–d), and transcript half-lives (Extended Data Fig. 4e). Stochastic simulations of gene expression were then carried out with differing probabilities of transitioning from OFF periods to permissive periods (PO>P; Fig. 3f) to identify the PO>P value that corresponded to the transcript distributions measured by smRNA-FISH in untreated cells (Extended Data Fig. 4f,g). We then asked whether increasing the PO>P value in these gene expression simulations would reproduce the increased expression and transcript distributions measured in cells when PRC1/H2AK119ub1 was depleted (Figs. 2d and 3f). Importantly, for both E2f6 and Zic2 an approximately 2.5-fold increase in PO>P resulted in similar transcript distributions to those observed experimentally after PRC1/H2AK119ub1 depletion, consistent with this being the point of transcriptional control (Fig. 3f). Therefore, by combining live-cell imaging, stochastic simulations and gene expression analysis, we show that the Polycomb system sustains a long-lived deep promoter OFF state that is refractory to transcription to repress gene expression.
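The logic of this simulation can be sketched in a few lines. This is a deliberately simplified, discrete-time illustration and not the model used in this study: ON periods are collapsed into instantaneous bursts of fixed size, and every parameter value and function name is a hypothetical placeholder, not a measured quantity. Only the OFF-to-permissive probability is varied between conditions.

```python
import random


def simulate_transcripts(p_off_to_perm, p_perm_to_off, p_burst, burst_size,
                         half_life_min, n_minutes, rng):
    """Simulate one cell in 1-min steps; return its transcript count at the end."""
    # Per-minute probability that a given transcript is degraded.
    p_decay = 1.0 - 0.5 ** (1.0 / half_life_min)
    permissive = False  # every cell starts in the deep OFF state
    transcripts = 0
    for _ in range(n_minutes):
        if permissive:
            # ON period, simplified here to an instantaneous burst.
            if rng.random() < p_burst:
                transcripts += burst_size
            # Permissive period may end, returning the promoter to OFF.
            if rng.random() < p_perm_to_off:
                permissive = False
        elif rng.random() < p_off_to_perm:
            # Exit from the OFF state into a permissive period.
            permissive = True
        # Each existing transcript decays independently.
        transcripts -= sum(rng.random() < p_decay for _ in range(transcripts))
    return transcripts


def mean_transcripts(p_off_to_perm, n_cells=200, seed=0):
    """Mean transcripts per cell for a population (all other values are placeholders)."""
    rng = random.Random(seed)
    counts = [
        simulate_transcripts(p_off_to_perm, 0.02, 0.1, 5, 120.0, 600, rng)
        for _ in range(n_cells)
    ]
    return sum(counts) / n_cells


if __name__ == "__main__":
    # Compare a baseline exit probability with a 2.5-fold higher one.
    print(mean_transcripts(0.002), mean_transcripts(0.005))
```

Holding the burst and permissive-period parameters fixed while raising only the exit probability mirrors the reasoning in the text: a change in a single transition is tested for its ability to account for the shift in transcript distributions.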

PRC1 counteracts binding of early PIC-forming components

The process of transcription is orchestrated by several distinct regulatory mechanisms that contribute to transcript production1,27,28. To understand how PRC1 sustains the deep OFF state, we set out to define what regulatory feature of transcription PRC1/H2AK119ub1 controls. The behaviours of individual factors that regulate the core process of transcription are, like the process of transcription itself, known to be stochastic and highly dynamic. Therefore, capturing the breadth of their dynamic behaviours is not possible using classical ensemble genomic approaches. However, these dynamic behaviours can be measured and quantified in living cells using single-particle tracking (SPT), where the dynamics of individual molecules are directly observed as they interact with chromatin29,30,31,32,33,34,35. Therefore, we reasoned that a similar approach could be applied to explore the regulatory stage of transcription affected by PRC1/H2AK119ub1.

To enable SPT, we used CRISPR–Cas9 genome engineering and the HaloTag protein fusion system to label core transcription regulators involved in distinct steps of transcription27,28 (Fig. 4c and Extended Data Fig. 5a,b). To examine early transcription initiation, we fused a HaloTag to the TATA-box binding protein (TBP) and the TAF1 and TAF11 components of TFIID36. TBP function in PIC formation is counteracted by negative cofactor 2 (NC2) through binding to a surface on TBP required for engagement of the general transcription factors TFIIA and B37. Therefore, we fused NC2β to a HaloTag to capture inhibition of early PIC formation, and TFIIB whose interaction with TBP is essential for progression of PIC formation38. PIC formation then advances through binding of the mediator coactivator complex, so we fused a HaloTag to the MED14 component of mediator. Once RNA Pol II engages with the PIC, TFIIH is recruited by contacting mediator and RNA Pol II39,40 and its CDK7 component phosphorylates the C-terminal repeats of RNA Pol II during early transcription elongation. Therefore, we fused CDK7 to a HaloTag to capture this step of transcription. As RNA Pol II enters into early elongation, CDK9 phosphorylates the negative elongation factor (NELF) and RNA Pol II to overcome RNA Pol II pausing and ensure productive elongation. To capture factors related to this stage of transcription we fused a HaloTag to CDK9, NELF-B and the largest subunit of RNA Pol II, RPB1.

Fig. 4: PRC1 counteracts binding of early PIC-forming components.

a, An example of individually colour-coded single-molecule tracks acquired at high frame rate (left). These tracks are used for kinetic modelling in SPOT-ON42 to obtain bound fractions. b, An example frame from stable binding time measurements acquired at low frame rate with stably bound molecules indicated with arrow heads. Stable binding times for the protein of interest (POI) are extracted from bi-exponential fits (dotted lines) from cumulative distributions (solid lines) and corrected for photobleaching using estimates of stable binding of histone H2B-HT (blue). c, A cartoon illustrating stages of PIC assembly and transcription regulation. Protein factors studied by SPT are indicated. d, Dot plots illustrating the bound fractions (top) and stable binding time (bottom) for a panel of transcription regulators in untreated or PRC1-depleted (IAA-treated) conditions. Individually colour-coded dots represent values for individual biological replicates and are connected with grey lines, error bars represent s.d. and horizontal lines show the mean value. A minimum of three biological replicates were measured with approximately 100 cells per replicate for bound fraction analysis and approximately 20 cells for stable binding time measurements per biological replicate. P values were determined by one-sided paired t-tests and are presented whenever data reach statistical significance (P < 0.05). Bound fraction: TBP, P = 0.012903; TAF11, P = 0.01352; MED14, P = 0.032109; stable binding time: TBP, P = 0.010049; TAF1, P = 0.006401; TAF11, P = 0.040219; NC2β, P = 0.024727; TFIIB, P = 0.025326; MED14, P = 0.041231; CDK9, P = 0.023027; NELF-B, P = 0.024577. e, Scatter plot integrating the effects on bound fraction and stable binding times measured in SPT. Dots correspond to the mean fold change (FC) values for individual proteins and the error bars correspond to s.e.m. The data represent at least three biological replicates as indicated in d. Solid grey vertical and horizontal lines correspond to 1 (no change). Source numerical data are available in Source data.


To image these transcription regulators in single cells with single-molecule precision, we used a photo-activatable Halo dye coupled with highly inclined and laminated optical sheet microscopy41. By imaging at a high frame rate, we quantified the fraction of molecules bound to chromatin (measure of association)42 (Fig. 4a and Extended Data Fig. 5c) and by imaging at a low frame rate, we estimated the stable binding time of molecules (measure of dissociation)43 (Fig. 4b and Extended Data Fig. 5d). Interestingly, by focusing on the earliest regulatory steps involving TBP (Fig. 4c), we observed that PRC1/H2AK119ub1 depletion resulted in a nearly 50% increase in the bound fraction of TBP and its binding time also increased (Fig. 4d). This indicates that TBP engages more frequently and remains bound for longer in the absence of PRC1/H2AK119ub1. When we examined the dynamics of other TFIID components, TAF11 showed an increased bound fraction whereas TAF1 was unaffected, but both factors displayed increases in stable binding time. It has been proposed that lobe A of TFIID, which contains TAF11, and lobes B/C of TFIID, which contain TAF1, may exist in distinct pre-assembled subcomplexes44,45. This suggests that PRC1/H2AK119ub1 may primarily influence engagement of TBP and TFIID lobe A, with the net result being more stable binding of the TFIID holocomplex. In contrast, the bound fraction of the TBP inhibitory factor NC2β was largely unaffected, but its duration of binding was dramatically reduced, consistent with elevated stable binding of a TBP-containing TFIID complex. The bound fraction and duration of MED14 binding were also elevated upon PRC1 depletion, consistent with mediator engagement depending on TFIID46. This suggests that in the absence of PRC1/H2AK119ub1, the association and stable binding of early PIC forming components is increased, whereas the stable binding time of the negative cofactor complex is reduced.
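How a stable binding time relates to the quantities extracted from low-frame-rate imaging can be sketched as follows. The bi-exponential survival form follows the description of the fits; the specific photobleaching correction shown, subtracting the apparent H2B dissociation rate from the observed rate, is an assumption based on common SPT practice and is not stated in the text. Function and parameter names are hypothetical.

```python
import math


def biexp_survival(t_s: float, frac_short: float,
                   tau_short_s: float, tau_long_s: float) -> float:
    """Fraction of molecules still bound at time t for a two-population model."""
    return (frac_short * math.exp(-t_s / tau_short_s)
            + (1.0 - frac_short) * math.exp(-t_s / tau_long_s))


def corrected_binding_time(tau_observed_s: float, tau_h2b_s: float) -> float:
    """Correct an observed stable binding time for photobleaching.

    Assumption: H2B is bound far longer than the imaging window, so its
    apparent residence time reflects photobleaching (and drift) alone, and
    the observed rate is the sum of true dissociation and bleaching rates.
    """
    if tau_observed_s <= 0 or tau_h2b_s <= 0:
        raise ValueError("binding times must be positive")
    k_true = 1.0 / tau_observed_s - 1.0 / tau_h2b_s
    if k_true <= 0:
        raise ValueError("observed binding is not shorter than the H2B reference")
    return 1.0 / k_true
```

Under this assumption, an observed binding time that approaches the H2B reference cannot be corrected reliably, which is why the sketch raises an error in that case.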

To understand whether these early effects would influence downstream general transcription factors, we examined TFIIB and the TFIIH component CDK7 (Fig. 4d). TFIIB showed only a slight increase in bound fraction but displayed elevated stable binding, whereas CDK7 was largely unaffected. We then examined CDK9 and NELF-B and found that their bound fractions were unaffected, but the stable binding time of CDK9 increased whereas it decreased slightly for NELF-B, in line with elevated transcription initiation when PRC1/H2AK119ub1 is depleted47. Importantly, when we examined RNA Pol II via measuring RPB1 dynamics, we observed little effect, supporting the idea that PRC1 regulates early transcription events and does not considerably affect the amount of elongating RNA Pol II, which is primarily captured in our measurements. Furthermore, this result indicates that the increase in the amount of elongating RNA Pol II that occurs at more lowly expressed Polycomb genes does not contribute enough to the overall amount of elongating RNA Pol II to influence our measurements. On the basis of these detailed kinetic measurements, we propose that PRC1/H2AK119ub1 limits the binding of factors involved in the earliest stages of PIC formation (Fig. 4e).

cPRC1 does not control stable PIC binding or repression

There are a number of distinct PRC1 complexes which are characterized either as canonical (cPRC1) or non-canonical (ncPRC1) depending on their subunit composition and function (Fig. 5a). cPRC1 complexes contain chromobox (CBX) and polyhomeotic (PHC) proteins, which compact chromatin and can nucleate phase separation of Polycomb chromatin domains48. cPRC1 complexes are poor E3 ubiquitin ligases contributing only modestly to H2AK119ub1. ncPRC1 complexes interact with RYBP and YAF2 proteins that stimulate their E3 ubiquitin ligase activity leading to deposition of most H2AK119ub1 in Polycomb chromatin domains6,8,20 (Extended Data Fig. 6a). To define which PRC1 complexes control the earliest stages of PIC binding to counteract gene expression, we focused on cPRC1 complexes that uniquely form around a single scaffold protein (PCGF2) in ES cells. If the effects on the binding dynamics of the early PIC-forming components and Polycomb gene de-repression were dependent on cPRC1, its depletion should phenocopy complete removal of all PRC1 complexes. Therefore, we engineered bTAG or dTAG degrons into the endogenous Pcgf2 gene. Addition of the small-molecule compounds AGB1 or dTAG-13 caused a rapid depletion of PCGF2 and a corresponding loss of cPRC1 complex binding to chromatin in Polycomb chromatin domains (Fig. 5b,d and Extended Data Fig. 6b,c).

Fig. 5: cPRC1 complexes do not regulate stable PIC binding or contribute centrally to Polycomb repression.

a, A cartoon illustrating the composition of ncPRC1 and cPRC1 complexes. b, Cartoon representation of the degron cell line in which cPRC1 complexes can be depleted and TFIID dynamics measured by examining TAF11 by SPT imaging (left). Western blot analysis illustrating depletion of cPRC1 complexes within 2 h of dTAG-13 treatment (right). BRG1 was used as a loading control. The western blot was performed once. c, Dot plots illustrating the bound fraction (left) and stable binding time (right) for HaloTag-fused TAF11 (HT-TAF11) in untreated or cPRC1-depleted (dTAG-13-treated) conditions. Individually colour-coded dots represent values for individual biological replicates (n = 4) and are connected with grey lines, error bars represent s.d. and horizontal lines show the mean value. P values represent one-sided paired t-tests. P = 0.020877 (fraction bound); and P = 0.846956 (stable binding time). A minimum of approximately 100 cells for bound fraction analysis and approximately 20 cells for stable binding time measurements were measured per biological replicate (indicated as colour-coded dots). d, As in b except PCGF2 was tagged with bromoTAG (bTAG). Western blot analysis demonstrates PCGF2-bTAG degradation after 2 h of AGB1 treatment. This experiment was performed once. e, smRNA-FISH analysis of transcript-per-cell distributions for untreated cells, cells with PCGF2-bTAG depleted (AGB1-treated), and cells with RING1B-AID depleted (IAA-treated). Depletions were performed for 4 h and at least 400 cells were measured for each gene in each condition. Source numerical data and unprocessed blots are available in Source data.


We then depleted cPRC1 in a HaloTag-labelled TAF11 cell line and carried out SPT to capture the chromatin binding dynamics of TFIID (Fig. 5b). Depletion of cPRC1 increased the bound fraction of TAF11 (Fig. 5c), consistent with the effects observed when all PRC1 complexes were depleted simultaneously (Fig. 4d). However, interestingly, in contrast to the simultaneous depletion of all PRC1 complexes, depletion of cPRC1 did not affect the stable binding time of TAF11 (Fig. 5c). To understand how these cPRC1-dependent effects on TAF11 binding dynamics were related to PRC1-dependent repression, we depleted cPRC1 and examined the expression of E2f6 and Zic2 using smRNA-FISH. In stark contrast to depleting all PRC1 complexes simultaneously, rapid depletion of cPRC1 did not result in de-repression of E2f6 or Zic2 (Fig. 5d,e and Extended Data Fig. 6d). Together, this demonstrates that cPRC1 can regulate the dynamic interactions TFIID makes with chromatin and its bound fraction, but it does not regulate stable binding of TFIID (Fig. 5c and Extended Data Fig. 6b). This suggests that ncPRC1, as opposed to cPRC1, predominates in counteracting stable TFIID binding and that the absence of ncPRC1 complexes and H2AK119ub1 leads to Polycomb gene de-repression.

PRC1 constrains TFIID binding to inhibit gene expression

SPT suggested that ncPRC1 or H2AK119ub1 may counteract the stable binding time of TFIID to limit the very earliest regulatory steps of transcription and maintain gene repression. While SPT captures transcription factor binding dynamics with single-molecule precision, it does not provide information about where effects on binding occur in the genome. To understand where TFIID binding was affected, we carried out calibrated chromatin immunoprecipitation coupled to massively parallel sequencing (cChIP–seq) for endogenously tagged TAF1 before and after PRC1 depletion. We chose TAF1 as it is the largest subunit of TFIID and a component of the TFIID holocomplex36. When we sorted Polycomb gene and non-Polycomb gene transcription start sites (TSSs) based on PRC1 occupancy, we observed on average the highest levels of TAF1 at non-Polycomb genes (Fig. 6a) in line with these genes being more highly expressed. Importantly, we also observed some TAF1 binding at Polycomb genes, but the levels were much lower, in line with the repressed state of these genes and consistent with the idea that PRC1 could limit TFIID complex binding to sustain a deep promoter OFF state. To test this possibility, we depleted PRC1 and observed a clear increase in TAF1 occupancy at Polycomb genes (Fig. 6a,b), which is qualitatively consistent with increased stable binding times measured by SPT (Fig. 4d,e). We also validated these effects by ChIP–quantitative PCR (ChIP–qPCR) analysis for TAF1 and other factors identified in our SPT analysis (Extended Data Fig. 7a). Interestingly, using cChIP–seq analysis we also observed a modest yet significant increase in TAF1 binding across non-Polycomb gene TSSs, indicating that PRC1 may also constrain the binding of TFIID more broadly (Fig. 6a,b and Extended Data Fig. 7b). 
Consistent with this possibility, low levels of PRC1 are detected at non-Polycomb gene promoters, and when we analysed gene expression across these genes, we observed a modest increase in expression after PRC1 depletion (Fig. 6a and Extended Data Fig. 7c). These findings agree with previous observations that PRC1 and H2AK119ub1 may have more subtle yet pervasive effects on gene expression8,21. Nevertheless, we find the effects on expression and increases in TAF1 binding correlated best at Polycomb genes (Extended Data Fig. 7e), suggesting that the Polycomb system has a prominent role maintaining these genes in a lowly transcribed or inactive state. Together, these observations indicate that PRC1 limits transcription and gene expression by counteracting TFIID binding to gene promoters, with the largest effects occurring at lowly transcribed Polycomb genes with high levels of PRC1 and H2AK119ub1.

Fig. 6: PRC1 constrains TFIID binding to inhibit gene expression.

a, Heat map illustrating cChIP–seq signal for RING1B (PRC1) (green, left) or endogenously T7-tagged TAF1 (blue, right) in untreated or IAA-treated ES cells across TSSs. The distance in kilobases from left and right of TSSs is shown below each heat map. To visualize changes in T7-TAF1 signal, the log2-transformed fold change in IAA-treated versus untreated (that is, log2FC(IAA/UNT)) value is shown to the right of the T7-TAF1 cChIP–seq signal. To visualize steady-state gene expression levels and increases in gene expression after RING1B-AID depletion, RPKM (reads per kilobase per million mapped reads) values for untreated cells and log2FC(IAA/UNT) values were calculated for each corresponding gene using calibrated nuclear RNA sequencing (cnRNA-seq) data21 and plotted as heat maps on the right. TSSs were segregated into non-Polycomb (n = 9,899), Polycomb (n = 4,869) and non-CpG island (n = 5,869) groupings based on the presence of a non-methylated CpG island (CGI) and binding of PRC1 and PRC2 at their promoters as previously described8. Heat maps are ranked by RING1B signal. Genes examined in live-cell imaging of transcription (red) as well as some classical Polycomb genes (black) are indicated. b, A meta plot (left) illustrating the log2FC(IAA/UNT) of T7-TAF1 cChIP–seq signal at the three classes of TSSs shown in a and a box plot (right) showing log2FC(IAA/UNT) of cChIP signal integrated over ±1 kb from TSSs. The boxes show the interquartile range (IQR), the centre lines represent the median and whiskers extend by 1.5× IQR or to the most extreme point (whichever is closer to the median), whereas notches extend by 1.58× IQR/√n, giving a roughly 95% confidence interval for comparing medians. P values were calculated using a two-sided Wilcoxon rank sum test. ***P = 2.2 × 10^−16 (non-Polycomb versus non-CGI), ***P = 6.5 × 10^−132 (non-Polycomb versus Polycomb) and ***P = 2.2 × 10^−16 (Polycomb versus non-CGI).
c, Schematic illustrating the combinatorial degron strategy used to examine the contribution of TFIID to de-repression of Polycomb target genes after depletion of PRC1. d, Western blot analysis of the levels of RING1B-AID and dTAG-TAF1 after simultaneous addition of IAA and dTAG-13 over a 2-h time course. SUZ12 is shown as a loading control. A representative result of at least three experiments is shown. e, A smRNA-FISH image labelling Zic2 (Polycomb target) transcripts in untreated cells or after 4 h of IAA treatment (RING1B depletion) illustrating increased transcript numbers. White dashed lines indicate cell outlines. Scale bar, 10 µm. f, smRNA-FISH analysis of transcript-per-cell distributions for untreated cells, TAF1-depleted (dTAG-13-treated) cells, RING1B-AID-depleted (IAA-treated) cells, and both RING1B- and TAF1-depleted (IAA + dTAG-13-treated) cells. Depletions were performed for 4 h and at least 400 cells were measured for each gene in each condition. Source numerical data and unprocessed blots are available in Source data.

PRC1/H2AK119ub1 depletion caused increased TFIID binding at Polycomb genes and an increased propensity to exit from the deep transcriptional OFF state. Therefore, we wondered whether TFIID was required for the de-repression of Polycomb genes. To test this, we engineered a degron tag into the endogenous Taf1 gene in the PRC1 degron cell line (Fig. 6c,d) as TAF1 is integral to the formation of the TFIID holocomplex45. We then depleted either PRC1 or PRC1 and TAF1 simultaneously and examined expression of Zic2 and E2f6 Polycomb genes using smRNA-FISH (Fig. 6e,f). This revealed that neither Polycomb gene was de-repressed without TAF1, indicating that TFIID binding enables elevated expression in the absence of PRC1/H2AK119ub1 and that Polycomb-dependent transcription control is focused on limiting TFIID-dependent transcription initiation. Therefore, we discover Polycomb-mediated gene repression relies on sustaining a deep OFF state through limiting TFIID binding at gene promoters.

Discussion

How chromatin states regulate transcription to control gene expression has remained a major conceptual gap in our understanding of gene regulation. Using rapid degron-based protein depletion, transcription imaging and simulations, we discover that the Polycomb system counteracts transcription by sustaining promoters in a long-lived deep OFF state (Figs. 1–3). Using live-cell SPT and genomic approaches, we demonstrate that the Polycomb system sustains this deep OFF state by counteracting binding of factors that enable early PIC formation (Fig. 4) and that this relies on non-canonical as opposed to canonical PRC1 complexes (Fig. 5). Finally, we show Polycomb gene de-repression is caused by increased TFIID association, demonstrating that the Polycomb system limits association of general transcription factors to maintain repression (Fig. 6). These discoveries provide a rationale for how the Polycomb system regulates transcription.

Several distinct models have been proposed to explain how the Polycomb system influences transcription to counteract gene expression6,21,49,50,51,52,53,54,55,56. However, these mostly originate from in vitro biochemistry or ensemble fixed-cell analyses that are blind to the dynamic control processes that regulate transcription in living cells. Our transcription imaging now reveals that PRC1/H2AK119ub1 primarily represses transcription and gene expression by limiting transition out of a deep promoter OFF state and into a permissive state where ON periods or bursts of transcription occur. Previously, using static smRNA-FISH analysis and a two-state model of transcription, we concluded that PRC1 might influence gene expression by regulating transcription burst frequency (that is, the frequency of ON periods within permissive periods)21. Now, using live-cell imaging in which we directly observe Polycomb gene transcription, we reveal that these genes adhere to a three-state model within which PRC1 limits entry into the permissive state. We demonstrate that this is mediated by counteracting association of early PIC components with the promoter, consistent with recent observations demonstrating that alterations in TATA box sequences that reduce their affinity for TBP and manipulating factors that affect PIC formation also limit entry into permissive periods22,24,57,58. Importantly, effects on PIC formation and gene de-repression appear to rely on non-canonical PRC1 complexes that deposit the majority of H2AK119ub1 at Polycomb chromatin domains, consistent with previous work demonstrating the importance of H2AK119ub1 for Polycomb-mediated repression9,15,16. Therefore, we identify a central role for Polycomb-mediated and chromatin-based repression in regulating the OFF-to-permissive promoter state transition.

Importantly, our findings in live cells differ from previous in vitro biochemical observations suggesting that Polycomb complexes might block recruitment of mediator, but not TBP or TFIID49. A possible explanation for this discrepancy is that chromatin templates used in in vitro reconstitution experiments do not contain H2AK119ub1, which we and others have shown is important for repression in vivo15,16. Unlike most other histone modifications, ubiquitylation is a bulky 76 amino acid adduct that dramatically alters the nucleosome, suggesting that it could possibly function to repress transcription by influencing how transcription and other regulatory factors interact with promoter chromatin39,40. Recent biochemical and structural work has shown that TFIID and other components of the general transcription machinery make key contacts with nucleosomes as part of early transcription initiation mechanisms40. With this in mind, an important avenue for future in vitro biochemical and structural work will be to understand whether H2AK119ub1 influences core transcriptional machinery interaction with promoter chromatin to enable gene repression.

Gene expression is dynamic throughout mammalian development. For example, genes may be inactive during early development and their repression maintained by the Polycomb system, but later in development their expression may be required. Consistent with this requirement, we now discover that Polycomb-dependent repression does not act as a constitutive block to transcription, but instead functions by limiting binding of early PIC-forming components to reduce the probability that a promoter enters into a transcriptionally permissive state. Given the breadth of gene types the Polycomb system must regulate in distinct cellular contexts, limiting general transcription factor function may provide a universal means to constrain transcription at genes with diverse regulatory inputs without having to influence highly divergent gene-specific DNA binding factors or other regulatory influences. In the context of developmental transitions when Polycomb genes become activated, we envisage that limiting the frequency of entering into permissive periods could also ensure low-level activation signals are quelled, yet the gene promoter would remain receptive to strong and persistent activation signals necessary to initiate gene expression, as we show is the case of the Polycomb gene Meis1 (Extended Data Fig. 8). Counteracting weak or inappropriate activation signals may be particularly important during development for suppressing noise and maintaining cell identity, as has been proposed previously as a key role for the Polycomb system6. Once genes are activated, persistent transcription leads to Polycomb chromatin domain erosion in part through the transcriptional machinery guiding Trithorax chromatin-modifying systems, which deposit histone modifications that inhibit Polycomb chromatin domain integrity5,6,59. 
This suggests Polycomb and Trithorax systems may counteract each other by installing chromatin states that decrease or increase the probability that a gene promoter is in a state that is permissive to transcription. In the context of future work, it will be important to uncover whether this control point is the focus of antagonistic Polycomb or Trithorax systems.

In conclusion, we demonstrate that the integration of rapid degron approaches, live-cell imaging of transcription and detailed analysis of transcription regulatory factors by SPT can provide an insight into how chromatin-based gene regulation is controlled in living cells. In doing so, we provide compelling evidence that non-canonical PRC1/H2AK119ub1 represses gene expression by sustaining promoters in a deep OFF state that is refractory to PIC formation and transcription.

Methods

Cell culture

The Ring1a−/−, RING1B-AID mouse embryonic cell line was as previously described and extensively characterized21,26. Cells were grown on a gelatinized culture plate at 37 °C and 5% CO2 in DMEM (Gibco) with 10% foetal bovine serum (Sigma), 2 mM l-glutamine (Life Technologies), 1× non-essential amino acids (Life Technologies), supplemented with 0.5 mM β-mercaptoethanol (Life Technologies) and 10 ng ml−1 leukaemia inhibitory factor (produced in house) and split every other day. To deplete RING1B-AID, cells were treated with IAA (Life Technologies) at 500 µM. To deplete T7-dTAG-TAF1, cells were treated with 20 µM 5,6-dichloro-1-beta-d-ribofuranosylbenzimidazole (DRB) for 1 h, washed three times and treated with 100 nM dTAG-13 (Tocris) for 4 h60. To induce degradation of PCGF2-bTAG or PCGF2-dTAG, cells were treated for 4 h with either 500 nM AGB1 or 100 nM dTAG-13, respectively. To induce expression of Meis1, the cells were grown in a medium described above for 72 h with 1 µM all-trans retinoic acid without leukaemia inhibitory factor.

Genome engineering

To knock in HaloTag61, FKBP12F36V (dTAG, Addgene, 62988), bTAG62, MS2x128 array24 or tdMCP-GFP (Addgene, 40649) at specific genomic locations (typically N or C termini of a gene, or the first intron for MS2 array), guide sequences were designed using the CRISPOR tool63 and cloned into pSptCas9(BB)-2A-Puro(PX459)-V2.0 guide expression plasmid (Addgene, 62988). The complete list of guide sequences can be found in Supplementary Table 1. Targeting constructs used as templates for homology-directed repair were Gibson assembled using Gibson master mix (New England Biolabs) and PCR-amplified homology arms corresponding to the genomic sequence flanking the desired site of insertion. A list of primers used to amplify homology arms is included in Supplementary Table 2. MCP–GFP, dTAG or HaloTag were amplified by PCR from the respective plasmids. The MS2x128 array was cut out of its original plasmid (gift from E. Bertrand)24 using AleI/NheI restriction enzymes. dTAG was Gibson assembled to include a 3xT7-3xStrepII-tag. Cells were transfected with 2 µg of the targeting construct and 0.5 µg of the guide expressing construct using Lipofectamine 3000 according to the manufacturer’s protocol (Thermo Fisher Scientific). One day after transfection, cells were plated sparsely and selected with 1 µg ml−1 puromycin for 48 h. Puromycin was removed and the cells were grown until distinct colonies formed. Individual clones were picked and propagated in 96-well plates that were then screened for homozygous insertion by PCR. Screening primers are available in Supplementary Table 2. HaloTag and dTAG labelling was validated at the protein level by western blot, and in the case of HaloTag by labelling with tetramethylrhodamine and microscopy (Extended Data Fig. 5a,b). MCP–GFP cells were inspected for expression uniformity (Extended Data Fig. 3a).
The integrity of MS2x128-containing lines was further confirmed by PCR using Q5 (New England Biolabs) and Terra (Takara) polymerases as well as by microscopy using RNA-FISH detecting intronic sequences (Extended Data Fig. 1b and Supplementary Table 4) expected to colocalize with nuclear MS2x128/MCP–GFP foci.

Nuclear extraction and western blot

Nuclear extraction and western blot analysis were performed as described previously21. In brief, for nuclear extraction, cells growing on a 10-cm plate were collected, washed once with PBS and resuspended in ten volumes of buffer A (10 mM HEPES pH 7.9, 1.5 mM MgCl2, 10 mM KCl, 0.5 mM dithiothreitol, 0.5 mM phenylmethyl sulfonyl fluoride and protease inhibitor cocktail (Roche)). Subsequently, cells were spun down at 1,500g for 5 min and resuspended in three volumes of buffer A with 0.1% NP-40. Following centrifugation, the pellet was resuspended in one volume of buffer C (5 mM HEPES pH 7.9, 26% glycerol, 1.5 mM MgCl2, 0.2 mM EDTA, protease inhibitor cocktail (Roche) and 0.5 mM dithiothreitol) with 400 mM NaCl and incubated on ice for 1 h. Nuclei were pelleted by centrifugation at 16,000g for 20 min at 4 °C. The supernatant was retained as nuclear extract. For western blotting, 15–20 µg of nuclear extract was heated in SDS loading buffer at 95 °C for 5 min and loaded on to an acrylamide gel (8–12%) run with Tris–glycine buffer or a 3–8% Tris–acetate NuPAGE gradient gel run with NuPAGE Tris–acetate running buffer (Thermo Fisher Scientific) and separated by electrophoresis. Next, the resolved proteins were transferred onto nitrocellulose membrane using Trans-Blot Turbo Transfer System (Bio-Rad). The membrane was blocked with 5% milk in PBS and 0.1% Tween-20 (PBST-milk) for 1 h. The membrane was transferred to PBST-milk containing primary antibodies and incubated overnight at 4 °C (Supplementary Table 3 contains information on antibodies and dilutions). The next day, membranes were washed three times with PBST-milk and incubated for 1 h with secondary antibody conjugated with IRDye (Li-COR). Following 3 × 5-min washes with PBST and a 5-min wash with PBS, the membrane was visualized with the Odyssey Fc system (Li-COR).

cChIP and high-throughput sequencing

cChIP was performed as previously described64. In brief, 5 × 10^7 ES cells engineered with T7-dTAG-TAF1 were fixed with 1% formaldehyde (methanol-free, Thermo Fisher Scientific) for 10 min at 25 °C under constant gentle rotation. Fixation was quenched with 150 mM glycine and the cells were washed with ice-cold PBS and snap frozen in liquid nitrogen. Additionally, 5 × 10^7 HEK293T T7-SCC1 cells (a gift from M. Houlard) were fixed with 1% formaldehyde as above and snap frozen in aliquots of 2 × 10^6 cells.

For spike-in calibration, 2 × 10^6 cross-linked HEK293T cells were resuspended in 100 µl ice-cold lysis buffer (50 mM HEPES pH 7.9, 150 mM NaCl, 2 mM EDTA, 0.5 mM EGTA, 0.5% NP-40, 0.1% sodium deoxycholate and 0.1% SDS) and added to 5 × 10^7 fixed ES cells resuspended in 900 µl lysis buffer. The cells were incubated on ice for 10 min and sonicated using a Bioruptor Pico sonicator (Diagenode) for 23 cycles (30 s on/30 s off), shearing genomic DNA to produce fragments between 300 bp and 1 kb.

Before immunoprecipitation, chromatin was diluted to 300 µg ml−1 with lysis buffer and pre-cleared with Protein A agarose beads (Repligen) and blocked with BSA and transfer RNA for 1 h at 4 °C. The pre-cleared chromatin was then incubated with the respective antibody overnight rotating at 4 °C. Antibody-bound chromatin was purified with 20 µl blocked Protein A agarose beads for 3 h at 4 °C. ChIP washes were performed as described previously64. ChIP DNA was eluted in 1% SDS and 100 mM NaHCO3 and cross-links were reversed at 65 °C with 200 mM NaCl and RNase A (Sigma) under constant shaking. The samples were then treated with 20 µg ml−1 proteinase K (Sigma) and purified using a ChIP DNA clean and concentrator kit (Zymo Research). The corresponding input DNA was purified for each sample. The efficiency of each ChIP reaction was confirmed by qPCR. All primers used are listed in Supplementary Table 6.

For cChIP–seq, three reactions were set up for each condition and pooled for library preparation. Before library preparation, 5 ng ChIP DNA was diluted to 50 µl in TLE buffer (10 mM Tris–HCl pH 8.0 and 0.1 mM EDTA) and sonicated with a Bioruptor Pico sonicator for 17 min (30 s on/30 s off). Libraries were prepared using NEBNext Ultra II DNA library prep kit for Illumina (New England Biolabs) and sequenced as 40-bp paired-end reads on Illumina NextSeq 500 platform.

Massively parallel sequencing, data processing and visualization

For cChIP–seq, paired-end reads were aligned to concatenated mouse (mm10) and spike-in human (hg19) genomes using Bowtie 2 (ref. 65) with the ‘--no-mixed’ and ‘--no-discordant’ options specified. Reads that were mapped more than once were discarded, followed by removal of PCR duplicates using Sambamba66.

For cChIP–seq visualization and annotation of genomic regions, mouse reads were randomly downsampled based on the spike-in ratio in each sample8. Individual replicates (n = 3) were compared using multiBamSummary and plotCorrelation functions from deepTools (version 3.1.1)67, confirming a high degree of correlation (Pearson’s correlation coefficient >0.9). Normalized replicates were pooled for downstream analysis. Genome-coverage tracks for visualization on the University of California, Santa Cruz (UCSC) genome browser68 were generated using the pileup function from MACS2 (ref. 69) for cChIP–seq.

Heat map and meta plot analysis for cChIP–seq was performed using the computeMatrix, plotProfile and plotHeatmap functions from deepTools (v.3.1.1)67, looking at read density at transcription start sites of a custom-built non-redundant mouse gene set (n = 20,633), divided into three categories (non-Polycomb bound, Polycomb bound and non-CGI) based on the presence of a non-methylated CGI and binding of PRC1 + PRC2 at their promoters as defined previously8. Intervals of interest were annotated with read counts from merged replicates, using a custom-made Perl script utilizing SAMtools (v1.7)70. Box plot analysis of the distribution of log2FC was performed using a custom R script, with boxes showing the IQR and whiskers extending by no more than 1.5× IQR. P values were calculated using a Wilcoxon rank sum test. Read counts for all the experiments are included in Supplementary Table 5.
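The box plot conventions described here (IQR boxes, whiskers limited to 1.5× IQR and, in the figure legend, notches of 1.58× IQR/√n) can be illustrated with a minimal Python sketch. This is not the custom R script used in the study; the function name `box_stats` and the quartile method are assumptions, and R's own box plot routine computes hinges slightly differently.

```python
import math
import statistics

def box_stats(values):
    """Summarise a sample the way a notched box plot would (hypothetical helper)."""
    data = sorted(values)
    n = len(data)
    # Quartiles by linear interpolation; an assumption, not the R hinge rule.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Whiskers stop at the most extreme data point within 1.5 x IQR of the box.
    lower_whisker = min(v for v in data if v >= q1 - 1.5 * iqr)
    upper_whisker = max(v for v in data if v <= q3 + 1.5 * iqr)
    # Notch half-width: roughly a 95% interval for comparing medians.
    half_notch = 1.58 * iqr / math.sqrt(n)
    return {
        "median": median,
        "iqr": iqr,
        "whiskers": (lower_whisker, upper_whisker),
        "notch": (median - half_notch, median + half_notch),
    }

# Illustrative values only, not data from the study.
print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

Two groups whose notches do not overlap would, under these conventions, be taken as having different medians at roughly the 95% level.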

Gene expression analysis

For gene expression analysis by qPCR with reverse transcription (qRT–PCR), RNA was extracted using a RNeasy extraction kit (Qiagen) and complementary DNA was synthesized using ImProm-II Reverse Transcription system (Promega). qRT–PCR was performed on a Rotor-Gene Q two-plex High Resolution Melt Platform using SYBR Green with primers spanning across exon junctions to prevent the amplification of genomic DNA. All primers used are listed in Supplementary Table 6.

RNA-FISH protocol and imaging

smRNA-FISH was carried out as described previously21. In brief, cells were trypsinized and fixed in 3.7% formaldehyde in suspension and then incubated in 70% ethanol at 4 °C for at least 1 h. Cells were then labelled in 2× SSC, 10% formamide and 20% dextran sulfate at 37 °C overnight with a suspension of 48 20–22-nucleotide probes (Stellaris) designed to be evenly distributed across exons or introns of the target transcript. Cells were then spun down and washed multiple times to ensure low non-specific signal. The cells were then incubated with 4′,6-diamidino-2-phenylindole (DAPI) to label DNA and agglutinin–Alexa488 to label cell membranes. The cell suspension was mixed 1:1 with Vectashield H-1000 (Vectorlabs), distributed as a monolayer on glass slides and covered with microscopy-grade glass coverslips. Images were acquired using the same microscopy set-up as described for live-cell transcription imaging except that a 2× magnifying lens was used, resulting in a 91.5-nm camera pixel size. To estimate mRNA half-life, transcription initiation was blocked with triptolide (500 nM) for 4 h and the mean numbers of transcripts in the cell population were estimated using smRNA-FISH as described above. The experiment was performed in three biological replicates. A mono-exponential decay was assumed to represent the mRNA degradation rates upon transcription block and was used to extract mRNA half-life.
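The half-life estimate follows from the mono-exponential assumption, N(t) = N0 × exp(−kt), which gives t1/2 = ln(2)/k. The sketch below fits k by least squares on log-transformed mean transcript counts. The function name, the log-linear fitting choice and the time points are assumptions for illustration; the fitting procedure actually used is not specified in the text.

```python
import math

def half_life_from_decay(times_h, mean_counts):
    """Fit ln(N) = ln(N0) - k*t by least squares; return (k, half_life)."""
    logs = [math.log(c) for c in mean_counts]
    n = len(times_h)
    t_bar = sum(times_h) / n
    y_bar = sum(logs) / n
    sxx = sum((t - t_bar) ** 2 for t in times_h)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times_h, logs))
    k = -sxy / sxx                 # decay rate per hour
    return k, math.log(2) / k      # half-life in hours

# Illustrative numbers only (counts halve every hour), not measured data.
k, t_half = half_life_from_decay([0, 1, 2, 4], [20.0, 10.0, 5.0, 1.25])
print(f"k = {k:.3f} per h, half-life = {t_half:.2f} h")
```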

Live-cell transcription imaging

Transcription was imaged using an Olympus IX83 system fitted with a humidified chamber with a carbon dioxide atmosphere at 37 °C. The microscope was operated through CellSens software and was equipped with a ×63 1.4-numerical aperture (NA) oil objective lens and a 1,200 × 1,200 px scientific complementary metal-oxide semiconductor (sCMOS) camera (Photometrics). An additional 1.6× magnifying lens was used in front of the camera, resulting in a final pixel size of 114.4 nm. To image transcription, cells were plated on a gelatinized 8-well microscopy µ-slide (IBIDI) 5 h in advance of imaging. At 1 h before imaging, the medium was changed to mouse ES cell medium with fluorobrite DMEM instead of phenol red DMEM, without or with 500 µM auxin in neighbouring wells of the imaging chamber. The imaging conditions were 20 images at 0.7 µm z-step interval per frame, 8 h total duration with a 4 min time interval. Excitation light at 490 nm was set to 20% and a 70 ms camera exposure time was used. A minimum of n = 3 biological replicates of untreated and IAA-treated cells were recorded except for Hspg2, where two replicates were acquired.

Identification of active transcription sites in movies

Individual three-dimensional (3D) time-course movies were inspected for cells showing transiently accumulating nuclear MCP–GFP signal corresponding to nascent transcription. These cells were cut out and saved as single-cell movies. For foci intensity read out, the following protocol was used: first, a custom-made ImageJ/Fiji script removed the background with a rolling ball algorithm (5 px radius), leaving only punctate MCP–GFP signal. Next, 3D Objects Counter71 was applied to individual 3D time frames to identify active transcription sites in 3D (intensity threshold of 15 and objects of 10–250 voxels). The resulting individual .csv files contained spot volume, intensity and centre of gravity in 3D in individual time frames. The extracted 3D positions were used to confirm correct spot identification in raw movies.

To create time-course fluorescence intensity trajectories for individual active transcription sites (see Fig. 1c for examples) a custom-made R script was used. Overall, the script used the previously obtained .csv files with MCP–GFP spots detected in individual time frames to extract the fluorescence intensity of the nascent transcription site and created a combined fluorescence intensity trajectory. In the case of multiple spots detected in a single time frame, for example, when multiple active transcription sites or individual rapidly diffusing pre-mRNAs were identified within the same cell and time frame t, the algorithm followed the spot with the shortest 3D Euclidean distance to the spot it had already followed in the preceding time frame t − 1. If multiple spots were identified in the first time frame of the movie (t = 1), the spot to follow as the transcription site was assigned manually. Every single-cell movie and preliminary trajectory were manually inspected.
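The nearest-spot rule can be sketched in a few lines of Python. This is an illustration rather than the custom R script: the data layout (one list of `(x, y, z, intensity)` tuples per frame), the function name `follow_spot` and the `start_index` argument standing in for the manual first-frame assignment are all assumptions, as is the choice to keep the last known position across frames with no detected spot.

```python
import math

def follow_spot(frames, start_index=0):
    """Build an intensity trajectory by following the nearest spot over time.

    frames: list (one entry per time frame) of lists of (x, y, z, intensity).
    start_index: which spot to follow in the first non-empty frame
                 (stands in for the manual assignment; hypothetical).
    """
    trajectory = []
    last_pos = None
    for spots in frames:
        if not spots:
            trajectory.append(0.0)  # no spot detected: intensity set to 0
            continue
        if last_pos is None:
            chosen = spots[start_index] if start_index < len(spots) else spots[0]
        else:
            # Shortest 3D Euclidean distance to the previously followed spot.
            chosen = min(spots, key=lambda s: math.dist(s[:3], last_pos))
        last_pos = chosen[:3]
        trajectory.append(chosen[3])
    return trajectory

# Toy example: a second, distant spot appears in frames 2 and 4.
frames = [
    [(0.0, 0.0, 0.0, 10.0)],
    [(5.0, 5.0, 5.0, 99.0), (0.1, 0.0, 0.0, 12.0)],
    [],
    [(0.2, 0.0, 0.0, 8.0), (5.0, 5.0, 5.0, 50.0)],
]
print(follow_spot(frames))
```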

These preliminary fluorescence intensity trajectories were then corrected for photobleaching in the following manner: MCP–GFP-expressing cells were imaged with an identical imaging protocol to the one used for live-cell transcription imaging. The constant background intensity value was measured outside the cells and subtracted from every image. The resulting cell images containing only fluorescence signal were thresholded in 3D using ‘Huang’ settings and total cellular MCP–GFP signal intensity in each time frame was measured. The resulting normalized GFP photobleaching curve representing three biological replicates was approximated with a single exponential fit, which was then used to correct active transcription site fluorescence trajectories by multiplying the extracted transcription site intensity in every time frame i by 1/exp(−0.05 × i), hence accounting for GFP photobleaching during the measurements (Extended Data Fig. 3b). Finally, corrected time-course fluorescence trajectories of single active transcription sites were plotted and manually inspected by comparing them to raw single-cell movies. A minimum of 250 cells were imaged per biological replicate, of which a fraction underwent transcription as judged by MCP–GFP signal accumulation.
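A minimal sketch of this correction is given below, assuming the bleaching curve is exp(−0.05 × i) so that each frame is rescaled by its reciprocal, exp(0.05 × i). The function name and the `first_index` argument are illustrative; whether the first frame is indexed as 0 or 1 is not stated in the text and is left as a parameter here.

```python
import math

BLEACH_RATE = 0.05  # per time frame, taken from the single exponential fit

def correct_photobleaching(trajectory, rate=BLEACH_RATE, first_index=0):
    """Rescale each frame by 1/exp(-rate * i), that is, exp(rate * i).

    first_index: index assigned to the first frame (assumed 0 here, so the
    first frame is left unchanged).
    """
    return [
        value * math.exp(rate * i)
        for i, value in enumerate(trajectory, start=first_index)
    ]

# A constant true signal of 100 that bleaches at the assumed rate is
# recovered as a flat trajectory after correction.
bleached = [100.0 * math.exp(-BLEACH_RATE * i) for i in range(5)]
print([round(v, 6) for v in correct_photobleaching(bleached)])
```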

Single pre-mRNA intensity estimation

To capture individual pre-mRNAs reliably, a slightly altered imaging protocol was used. In brief, live cells were imaged in 3D using 20 images at 0.7 µm z-intervals with 70 ms camera exposure time (same conditions as used for live-cell transcription imaging); however, a 2× magnifying lens was used (image pixel size 91.5 nm), which resulted in less light arriving at the camera (0.5723 ± 0.006 (n = 3 measurements)), and this value was taken into account in the single pre-mRNA fluorescence intensity calculation (see below). Exciting light was set at 3× the exciting light intensity used for live-cell transcription imaging of active transcription sites. For example, 490 nm excitation was set to 83% instead of 20%, which corresponded to 3× higher 490 nm excitation intensity as evident from a calibration curve acquired with varying 490 nm excitation intensity and constant camera exposure time (Extended Data Fig. 3c). Candidate single pre-mRNA foci were detected using the 3D Objects Counter71 after subtracting the background with a rolling ball algorithm twice (radius of 10 px). Foci were identified in (1) two-dimensional maximal projections of 3D images for high-confidence identification and (2) raw 3D images for actual identification. Foci appearing in both approaches were used further. To filter out much brighter spots representing active transcription sites, a maximal volume threshold of 58.6 × 10^−3 µm^3 was applied; the remaining foci were confirmed to be nuclear and were assumed to represent single pre-mRNAs. Their intensity was measured and was further multiplied by 1/0.5723 = 1.747 (GFP intensity difference originating from using 2× instead of 1.6× magnifying lens, see above) and divided by 3 (to account for 3× the 490 nm excitation intensity used in comparison to the actual live-cell transcription imaging protocol). Final single pre-mRNA intensity distributions followed a normal distribution with mean (s.d.)
of 323 (134), 335 (115) and 330 (116) for Zic2, E2f6 and Hspg2, respectively (Extended Data Fig. 3d).
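The two scaling steps amount to simple arithmetic, sketched below in Python. The function names are hypothetical, and converting a transcription site intensity into a transcript number by dividing by the mean single pre-mRNA intensity is an assumption about how the calibration is applied.

```python
LENS_TRANSMISSION = 0.5723   # light reaching the camera with the 2x lens
EXCITATION_FOLD = 3.0        # 490 nm excitation relative to the live-cell protocol

def calibrate_single_premrna(raw_intensity):
    """Rescale a single pre-mRNA intensity to live-cell imaging conditions."""
    return raw_intensity * (1.0 / LENS_TRANSMISSION) / EXCITATION_FOLD

def to_transcripts(site_intensity, single_premrna_intensity):
    """Express a transcription site intensity in transcript units (assumed)."""
    return site_intensity / single_premrna_intensity

print(round(1.0 / LENS_TRANSMISSION, 3))          # lens correction factor
print(round(calibrate_single_premrna(555.0), 1))  # illustrative raw value
print(to_transcripts(646.0, 323.0))               # using the Zic2 mean of 323
```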

Analysis of transcription parameters from fluorescence tracks

Transcription ON periods were directly identified in fluorescence trajectories of individual active transcription sites as signal intensity maxima using a custom-made algorithm in R. In brief, the algorithm starts through loading an individual trajectory and uses inflection point identification to attribute individual data points with local maxima or minima with three degrees of strength based on how pronounced they are with respect to surrounding data points. Timepoints where no spot was identified (intensity equal to 0) were automatically set as global minima. The algorithm then plots the trajectories with overlaid candidate preliminary maxima and minima for user inspection. Furthermore, every maximum identified in a fluorescence track was inspected. To identify an ON period, a given maximum is assigned a single nearest preceding minimum because every transcription ON period begins when the fluorescence signal of active transcription site sharply increases and ends when it reaches a maximum. In case no minimum preceding the scrutinized maximum is immediately found while another local maximum is reached, this ‘intermediate maximum’ is discarded from the analysis and the global minimum search continues until one is found. When a minimum–maximum pair is matched, the fluorescence signal intensity in time frames preceding the maximum is investigated to identify the true end of the ON period. This relies on the fact that the ON period ends when the fluorescence signal ceases to rapidly increase. However, often the global maximum is identified several time frames away due to fluorescence signal fluctuation and the noisy nature of these data. Therefore, to identify the time frame best representing the end of an ON period, the algorithm studies the local relationship of the identified maximum with five preceding frames and resets its position to the time frame where the steep signal increase stops. The final minimum–maximum pair represents an individual ON period. 
The following parameters are extracted from each ON period: (1) duration time (in minutes), (2) amplitude (in transcripts after converting the arbitrary units of fluorescence into single mRNAs), and (3) RNA Pol II re-initiation rate or time interval between initiating polymerases. To approximate the re-initiation rate, fluorescence signal between respective minimum and maximum within ON period is approximated using a linear fit where its slope represents the speed of transcript production within an ON period. The rate of polymerase re-initiation can only be estimated for ON periods greater than one transcript. Additionally, owing to the 4 min interval used in time course measurements, this analysis could only be reliably carried out for ON periods with amplitudes exceeding 2.5 transcripts (examples are presented in Extended Data Fig. 3e,f).
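The parameter extraction for a single ON period can be sketched as follows. The sketch assumes that the trajectory has already been converted to transcript units, that the minimum and maximum frames have been identified, and that amplitude means the rise from minimum to maximum; the function name and return format are illustrative and do not reproduce the custom R algorithm.

```python
FRAME_INTERVAL_MIN = 4.0   # time between frames
MIN_AMPLITUDE = 2.5        # transcripts; below this the rate is not estimated

def on_period_parameters(transcripts, start, end):
    """Summarise one ON period spanning frames start (minimum) to end (maximum)."""
    ys = transcripts[start:end + 1]
    if len(ys) < 2:
        return None
    amplitude = ys[-1] - ys[0]
    if amplitude <= MIN_AMPLITUDE:
        return None  # rate not reliably estimable at 4-min sampling
    xs = [FRAME_INTERVAL_MIN * i for i in range(len(ys))]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx  # transcripts produced per minute (linear fit)
    if slope <= 0:
        return None
    return {
        "duration_min": xs[-1],
        "amplitude": amplitude,
        "rate_per_min": slope,
        "interval_min": 1.0 / slope,  # time between initiating polymerases
    }

# Toy trajectory: one transcript added per 4-min frame between frames 1 and 5.
print(on_period_parameters([0, 0, 1, 2, 3, 4, 1], start=1, end=5))
```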

Measurements of the fraction of time a promoter spends in the permissive state

Permissive periods were identified from live-cell transcription trajectories as consecutive periods in which ON periods occurred within 60 min of each other. Periods outside of permissive periods were considered OFF periods. To account for the OFF periods that occurred in cells lacking detectable ON periods during the entire 8-h-long trajectory, we assumed that each cell contained on average three alleles, consistent with ES cells spending a large fraction of their cell cycle in S-phase. Assuming alleles are regulated independently of each other (as shown previously21), the number of alleles in a permissive period per cell should follow a binomial distribution, with cells having three, two, one or zero alleles transcriptionally permissive during the movie. Therefore, the fraction of cells in which no alleles were transcriptionally active was measured (such cells occurred in 8-h-long movies at 36.4(5)%, 40(5)% and 10(3)% for Zic2, E2f6 and Hspg2, respectively) and used to simulate a binomial distribution of alleles transcriptionally permissive during the movie that recapitulates the abundance of cells with zero alleles permissive to transcription (that is, with all three alleles in the OFF state). These distributions (obtained at binomial probabilities of 0.284, 0.260 and 0.545 for Zic2, E2f6 and Hspg2, respectively) were then used to account for all the alleles in the cell population that remained in the OFF state throughout the entire duration of the 8-h-long movie for untreated cells. For the IAA-treated condition, the following values were obtained: cells with zero alleles permissive to transcription comprised 11(2)%, 18(1)% and 9(9)% for Zic2, E2f6 and Hspg2, respectively, and the respective probabilities used to simulate binomial distributions were 0.65, 0.4355 and 0.555.
Lastly, the total duration of permissive periods for all the alleles was summed and divided by the total measurement time (integrated time spent in OFF and permissive periods) to obtain the fraction of time that the promoter spends in the permissive period.
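The bookkeeping described above can be sketched in Python as follows. The sketch rests on several assumptions: the 60 min criterion is applied between the end of one ON period and the start of the next, each of three alleles is treated as independently permissive with probability p (so the fraction of cells with zero permissive alleles is (1 − p)^3), and all function names and example values are hypothetical.

```python
def permissive_periods(on_periods, max_gap_min=60):
    """Merge ON periods (start, end), in minutes, into permissive periods."""
    merged = []
    for start, end in sorted(on_periods):
        if merged and start - merged[-1][1] <= max_gap_min:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current period
        else:
            merged.append([start, end])              # open a new period
    return [tuple(period) for period in merged]


def zero_allele_fraction(p, n_alleles=3):
    """Expected fraction of cells in which none of n independent alleles is permissive."""
    return (1.0 - p) ** n_alleles


def permissive_probability(zero_fraction, n_alleles=3):
    """Invert zero_allele_fraction: per-allele probability from the silent-cell fraction."""
    return 1.0 - zero_fraction ** (1.0 / n_alleles)


def permissive_fraction(periods_per_allele, movie_min=480, silent_alleles=0):
    """Summed permissive time divided by total measurement time over all alleles."""
    permissive_time = sum(end - start
                          for periods in periods_per_allele
                          for start, end in periods)
    total_time = movie_min * (len(periods_per_allele) + silent_alleles)
    return permissive_time / total_time


# Hypothetical allele with three ON periods; the first two lie within 60 min.
periods = permissive_periods([(10, 20), (50, 60), (200, 210)])
# One observed allele plus one allele that stayed OFF for the whole 8 h movie.
fraction = permissive_fraction([periods], movie_min=480, silent_alleles=1)
```

Under these assumptions, `zero_allele_fraction(0.284)` evaluates to about 0.367 and `zero_allele_fraction(0.260)` to about 0.405, close to the silent-cell fractions reported for untreated Zic2 and E2f6.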

RNA-FISH in cell colonies

The cells were plated on an 8-well IBIDI µ-well chamber (IBIDI) 12, 24 and 48 h before fixation with 3% paraformaldehyde. Then, the cells were permeabilized at 37 °C using 0.5% Triton X-100 for 20 min. RNA-FISH proceeded overnight as described above. Colonies of varying size were manually identified and imaged in 3D using the microscope parameters described above. A custom-made Fiji/ImageJ script was used to manually segment the colonies and cut out maximal projections of individual cells, which were then subjected to transcript counting using ThunderSTORM72 as described previously21.

Stochastic simulations of transcript-per-cell distributions

The permissive period of the promoter was characterized and the number of ON periods and the time between them were measured (Extended Data Fig. 4a,b). First, we simulated permissive periods assuming that the number of ON periods follows a Poisson distribution. We further expected that our 8-h-long microscopy measurements may not reliably capture all ON periods within a permissive period and instead randomly sample it (Extended Data Fig. 4c). To correctly interpret the experimentally assessed number of ON periods per movie (Extended Data Fig. 4b) and account for the fact that our microscopy measurement may capture only a part of a permissive period, we sampled the simulated permissive periods, knowing the time interval between ON periods (Extended Data Fig. 4a), using an 8-h-long theoretical measurement sliding window recapitulating our microscopy measurements. The number of ON periods was then counted within that sliding window, giving the number of ON periods that would be captured experimentally. We then performed this simulation for a range of hypothetical Poisson-distributed numbers of ON periods per theoretical permissive period (Extended Data Fig. 4c) and found the value of ON periods per permissive period (Extended Data Fig. 4d) that resulted in a distribution best matching that obtained experimentally (Extended Data Fig. 4b). This was done by finding the minimum of a third-degree polynomial fit (Extended Data Fig. 4c). This strategy allowed us to interpret the experimentally measured number of ON periods in 8-h-long microscopy experiments and revealed that the number of ON periods per movie measured experimentally for Zic2 and E2f6 (Extended Data Fig. 4b) corresponded to Poisson-distributed ON periods per permissive period with means of 8.95 and 9.33, respectively (Extended Data Fig. 4d).
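The sliding-window sampling can be sketched in Python as below. This is an illustration under stated assumptions, not the simulation used in the study: ON periods are reduced to their start times, intervals are resampled from a list of measured values, and the 8 h window is placed uniformly at random so that it can overlap any part of the permissive period. The function names and the example intervals are hypothetical.

```python
import math
import random


def poisson(mean, rng):
    """Draw one Poisson-distributed integer using Knuth's multiplication method."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def observed_on_periods(mean_on, intervals_min, window_min=480, rng=None):
    """Count the ON periods of one simulated permissive period seen in a window."""
    rng = rng or random.Random()
    n_on = poisson(mean_on, rng)
    if n_on == 0:
        return 0
    # Start times of ON periods, spaced by intervals resampled from measurements.
    starts = [0.0]
    for _ in range(n_on - 1):
        starts.append(starts[-1] + rng.choice(intervals_min))
    # Assumed placement: window start drawn uniformly over all overlapping positions.
    window_start = rng.uniform(starts[0] - window_min, starts[-1])
    return sum(window_start <= s < window_start + window_min for s in starts)


# Hypothetical intervals (min) between ON periods; 9 ON periods per period on average.
rng = random.Random(0)
counts = [observed_on_periods(9.0, [30.0, 45.0, 60.0], rng=rng) for _ in range(200)]
```

Repeating this for a range of `mean_on` values and comparing the resulting `counts` distribution with the experimental one mirrors the parameter scan described above.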

To simulate dynamic transcription of Zic2 and E2f6, we directly measured ON-period amplitudes (Fig. 2b) and time intervals between ON periods (Extended Data Fig. 4a), and inferred the number of ON periods per permissive period (Extended Data Fig. 4d). The simulated Polycomb gene was assumed to have three promoter states, that is, an allele may either be in (1) an OFF period (no transcription allowed) or (2) a permissive period where transcription may take place during (3) ON periods with known amplitudes (Fig. 2b). The amplitude distribution was approximated with a mixed negative binomial and Poisson model, which was then used to randomly draw the number of transcripts produced per ON period. Similarly, time intervals between ON periods were determined by the number of ON periods per permissive period drawn from Poisson distributions (Extended Data Fig. 4d). We simulated individual cells over a period of two 12-h-long cell cycles to allow transcript accumulation. For simplicity, each cell was assumed to have, on average, three alleles (owing to the relatively short G1 phase in mouse ES cells). Cell cycles were followed by a cell division resulting in random halving of the transcript number with 0.5 probability (Extended Data Fig. 4f). Each allele was assigned either an OFF or a permissive period based on a fixed probability parameter PO>P; each allele drew either of the two and was allowed to repeat the draw once at the onset of the second simulated cell cycle. Then, a third cell cycle of randomly varying duration (0–12 h) was run to desynchronize the cells. At the end, the simulation was stopped and simulated cells containing transcripts accumulated over the full course of the simulation were subjected to transcript degradation with exponentially distributed survival probability dependent on individual transcript age estimated experimentally (Extended Data Fig. 4e), such that ‘old’ transcripts were more likely to be degraded.
Finally, a transcript-per-cell distribution was obtained from 500 simulated cells.
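Three of the steps described above (drawing allele states, halving transcripts at division and age-dependent degradation) can be sketched in Python as follows. The sketch makes explicit assumptions that the text does not confirm: division is modelled as each transcript being retained independently with probability 0.5, and survival is modelled as exp(−age/lifetime) with a single hypothetical mean lifetime. All function and parameter names are illustrative.

```python
import math
import random


def draw_allele_states(n_alleles, p_off_to_permissive, rng):
    """Assign each allele an OFF or permissive period with a fixed probability."""
    return ["permissive" if rng.random() < p_off_to_permissive else "off"
            for _ in range(n_alleles)]


def divide(transcript_ages_h, rng):
    """Cell division: keep each transcript with probability 0.5 (assumed binomial split)."""
    return [age for age in transcript_ages_h if rng.random() < 0.5]


def degrade(transcript_ages_h, lifetime_h, rng):
    """Keep a transcript with probability exp(-age / lifetime); older ones are lost more often."""
    return [age for age in transcript_ages_h
            if rng.random() < math.exp(-age / lifetime_h)]


rng = random.Random(0)
states = draw_allele_states(3, 0.3, rng)      # hypothetical probability value
daughter = divide([0.0] * 10000, rng)         # roughly half of the transcripts remain
surviving = degrade([0.5, 2.0, 6.0, 20.0], lifetime_h=2.0, rng=rng)  # hypothetical lifetime
```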

Simulations were run for a range of PO>P probabilities and the one most similar to the experimental mRNA/cell distribution was identified by minimizing the sum-difference between experimental smRNA-FISH and simulated transcript-per-cell distributions (Extended Data Fig. 4g). Using this approach, we identified PO>P values for Zic2 and E2f6 in their untreated state. To simulate de-repression following PRC1 depletion, we added an extra step to account for IAA treatment leading to transcript increase: we simulated transcription for an extra 4 h (Zic2) and 2.5 h (E2f6, as we previously noted that it de-represses with a delay21) during which the PO>P probability value was increased while all the other transcription parameters were fixed and set to the same values as for untreated simulations (ON-period amplitude distribution, duration between ON periods and number of ON periods per permissive period). We varied the number of alleles attributed to the cells to account for their different cell cycle stages (cells now contained either two, three or four alleles in OFF or permissive periods). This strategy allowed us to test whether an increased PO>P probability can explain the shift in transcript-per-cell distributions following PRC1 depletion (Figs. 2d and 3f). By testing a range of PO>P values, we identified those that best recapitulated experimental IAA-treated smRNA-FISH distributions (Extended Data Fig. 4g, bottom).
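The parameter scan can be sketched in Python as below. Here the sum-difference is interpreted as the sum of absolute differences between normalised transcript-per-cell histograms, which is an assumption, and `simulate` stands for any function that returns simulated transcript counts per cell for a given probability. The toy simulator and all names are hypothetical.

```python
def normalised_histogram(counts_per_cell):
    """Fraction of cells with 0, 1, 2, ... transcripts (input must be non-empty)."""
    hist = [0.0] * (max(counts_per_cell) + 1)
    for count in counts_per_cell:
        hist[count] += 1.0 / len(counts_per_cell)
    return hist


def sum_difference(hist_a, hist_b):
    """Sum of absolute bin-wise differences; the shorter histogram is zero-padded."""
    length = max(len(hist_a), len(hist_b))
    a = list(hist_a) + [0.0] * (length - len(hist_a))
    b = list(hist_b) + [0.0] * (length - len(hist_b))
    return sum(abs(x - y) for x, y in zip(a, b))


def best_parameter(candidates, simulate, experimental_counts):
    """Return the candidate whose simulated distribution is closest to experiment."""
    target = normalised_histogram(experimental_counts)
    scores = {p: sum_difference(normalised_histogram(simulate(p)), target)
              for p in candidates}
    return min(scores, key=scores.get)


def toy_simulate(p):
    """Toy stand-in for the full simulation: a fraction p of 100 cells holds one transcript."""
    n_on = int(round(100 * p))
    return [0] * (100 - n_on) + [1] * n_on


experimental = [0] * 70 + [1] * 30          # hypothetical smRNA-FISH counts
best = best_parameter([0.1, 0.3, 0.5], toy_simulate, experimental)  # 0.3
```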

SPT

Cells were plated the day before on gelatinized microscopy dishes with No. 1.5 coverslips (MatTek, P35G-1.5-14-C). On the day of measurement, the cells were labelled using 100 nM PA-JF549-Halo (gift from L. Lavis and J. Grimm)73 for 15 min at 37 °C, followed by washing three times with live-cell imaging medium in which regular DMEM was replaced with FluoroBrite DMEM (Thermo Fisher Scientific). After 30 min, the cells were washed twice before the live-cell imaging medium was supplemented with 30 mM HEPES.

SPT was performed using the previously described system61 equipped with an electron multiplying charge-coupled device (EMCCD) camera (Andor, resulting pixel size 96 nm), a 100× 1.4 NA objective (Olympus) with an objective collar and heated stage maintaining it at 37 °C, a laser module (iChrome MLE MultiLaser engine, Toptica Photonics) and a translational module (ASI) carrying the fibre optics output used to adjust the beam position between epi and HiLO illumination. For imaging at a high camera rate, 22 mW of 561 nm laser excitation was used with varied 405 nm excitation to maintain fluorescent signals at low density. A total of 4,000 15 ms frames were acquired per measurement, and at least 20 independent measurements, typically containing several cells each, were acquired per biological replicate. A minimum of n = 3 biological replicates were acquired for each protein studied.

For stable binding time measurements, after photo-activating sufficient molecules with a 405 nm laser, a long camera exposure time was used (0.5 s) and images were acquired with 0.1 mW 561 nm excitation at different rates for different proteins to adequately address their stable binding: 600 frames at 2 Hz for CDK7-HT, HT-CDK9, NELF-B-HT, T7-HT-TFIIB and HT-NC2β; 300 frames at 1 Hz for T7-HT-Med14 and 200 frames at 0.33 Hz for HT-RPB1, HT-TBP, HT-TAF11 and T7-HT-dTAG-TAF1. Experiments were acquired for a minimum of n = 3 biological replicates with a minimum of five movies each and an independent H2B-HT control was measured alongside each replicate to correct for photobleaching (see below).

SPT analysis

Single-molecule signals were localized with subpixel resolution using stormtracker software74 running in MATLAB (MathWorks), which performs an elliptical Gaussian point spread function fit to each single-molecule signal detected on the basis of a fixed intensity threshold (the same for all the experiments). Molecule localizations appearing in consecutive frames within a distance of 8 pixels (768 nm) were merged to form tracks (a single frame gap was permitted to account for molecule blinking). The resulting track files were converted to an evalSPT format recognized by the Spot-ON online analysis tool42, which was used to determine the molecule-bound fraction by assuming that each protein exists in three dynamic states: freely diffusing, slowly diffusing and bound. The following Spot-ON parameters were applied: 0.01 µm length distribution bin width, 10 timepoints, 10 jumps permitted and a maximum jump length of 5.05 µm. A localization error of 40 nm was assumed, with a z correction of 0.7 µm and cumulative density function fitting with three iterations. The diffusion coefficient D was estimated as previously described74 for tracks that spanned a minimum of four frames. The resulting log10(D) distributions were fitted with a mixture of two Gaussians (mixtools R package) and mobility fractions corresponded to their weights.
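The track-forming rule (localizations in consecutive frames within 768 nm, with a single-frame gap permitted) can be illustrated by a greedy nearest-neighbour linker in Python. This is a sketch of the rule only and is not the stormtracker implementation, whose linking strategy is not described here; the function name, the greedy assignment and the example coordinates are assumptions.

```python
import math


def link_tracks(frames, max_dist_nm=768.0, max_gap=1):
    """Link localizations into tracks.

    frames : list (one entry per frame) of lists of (x_nm, y_nm) localizations
    returns: list of tracks, each a list of (frame_index, x_nm, y_nm)
    """
    tracks = []
    for f, points in enumerate(frames):
        unused = list(points)
        for track in tracks:
            last_f, lx, ly = track[-1]
            # Skip tracks that ended too long ago, or when nothing is left to assign.
            if f - last_f > max_gap + 1 or not unused:
                continue
            nearest = min(unused, key=lambda p: math.hypot(p[0] - lx, p[1] - ly))
            if math.hypot(nearest[0] - lx, nearest[1] - ly) <= max_dist_nm:
                track.append((f, nearest[0], nearest[1]))
                unused.remove(nearest)
        # Localizations not assigned to an existing track start new tracks.
        tracks.extend([[(f, x, y)] for x, y in unused])
    return tracks


# Hypothetical localizations (nm): one molecule that blinks off in frame 2,
# disappears for two frames, and a signal that reappears in frame 6.
frames = [[(0.0, 0.0)], [(100.0, 0.0)], [], [(150.0, 50.0)], [], [], [(160.0, 60.0)]]
tracks = link_tracks(frames)
```

In this example the first three localizations form one track across the single-frame gap, whereas the localization in frame 6 follows a two-frame gap and therefore starts a new track.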

Stable molecule binding time estimation

To estimate stable protein molecule binding times, bound molecules were localized using stormtracker74. Subsequently, tracks representing bound molecules were created after identifying signals appearing in consecutive time frames no further away than 192 nm (2 Hz measurements) or 288 nm (0.33 Hz measurements). The distribution of track lengths of stably bound molecules was fit to estimate apparent dwell times τ:

$$y\,=\,\frac{A\,{\mathrm{e}}^{-t/{\tau }_{1}}}{{\mathrm{e}}^{-{t}_{1}/{\tau }_{1}}}+\frac{\left(1-A\right){\mathrm{e}}^{-t/{\tau }_{2}}}{{\mathrm{e}}^{-{t}_{1}/{\tau }_{2}}}$$

where y denotes the fraction of molecules remaining bound at time t, A represents the fraction of the first component of molecules with dwell time τ1, while τ2 is usually longer and represents dwell time of the second component extracted to estimate stable binding time (see below). The first timepoint is represented by t1. Each biological replicate was accompanied by a separate H2B-HT control measurement representing permanently bound molecules. H2B apparent binding time τH2B was assumed to be limited solely by dye photobleaching and exceeded that of any measured protein τdwell. The final corrected protein binding time was defined as follows:

$${\tau }_{\mathrm{bound}}\,=\,\frac{{\tau }_{\mathrm{H2B}}\,\times \,{\tau }_{\mathrm{dwell}}}{{\tau }_{\mathrm{H2B}}-{\tau }_{\mathrm{dwell}}}$$
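The two expressions above can be evaluated with the short Python sketch below. It only encodes the formulas as written and does not perform the fit itself; the function names and the numerical values in the example are hypothetical. Each normalised term is written as exp(−(t − t1)/τ), which is algebraically identical to the ratio of exponentials in the model.

```python
import math


def survival_model(t, amplitude, tau1, tau2, t1):
    """Two-component fraction of molecules still bound at time t, normalised at t1."""
    return (amplitude * math.exp(-(t - t1) / tau1)
            + (1.0 - amplitude) * math.exp(-(t - t1) / tau2))


def corrected_binding_time(tau_dwell, tau_h2b):
    """Photobleaching-corrected binding time from apparent dwell times (same units)."""
    if tau_dwell >= tau_h2b:
        raise ValueError("the H2B control must outlast the measured protein")
    return tau_h2b * tau_dwell / (tau_h2b - tau_dwell)


# Hypothetical values: apparent dwell time 10 s, H2B control 40 s.
tau_bound = corrected_binding_time(10.0, 40.0)   # 400 / 30, about 13.3 s
```

The correction is equivalent to subtracting the photobleaching rate from the apparent dissociation rate, 1/τbound = 1/τdwell − 1/τH2B, so the corrected time is always longer than the apparent dwell time.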

Statistics and reproducibility

Statistical tests were performed with RStudio 1.2.5019 and Microsoft Excel. Throughout the Article, P values < 0.05 were considered statistically significant. No statistical methods were used to predetermine sample sizes but our sample sizes are similar to or greater than those reported in previous publications. No data were excluded from the analyses. The experiments were not randomized. Data collection and analysis were not performed blind to the conditions of the experiments.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.