Main

During transcription, nascent RNAs exiting the RNA polymerase II (Pol II) elongation complex can invade double-stranded DNA and rehybridize with template strands to form RNA–DNA duplexes known as R-loops7. R-loops are enriched at active promoters that contain high levels of paused Pol II8,9,10 and contribute to replication stress and genome instability due to the vulnerability of the exposed single-stranded DNA (ssDNA) coding strands to mutagens and nucleases, while also blocking replication fork progression11,12. R-loops can also have beneficial regulatory roles in transcription, DNA repair and the immune response13,14,15,16. Moreover, dynamic control of R-loops contributes to the kinetics of transcriptional program switches during cell differentiation and reprogramming17,18,19.

Biomolecular condensates formed through liquid–liquid phase separation (LLPS) have critical functions in various cellular processes, including transcriptional regulation, signal transduction and the DNA-damage response20,21. These membrane-less structures are typically enriched with proteins that contain repeated modular domains or long stretches of intrinsically disordered regions (IDRs). For example, the phase-separation behaviour of several R-loop regulatory factors has been reported to be linked to their IDRs22.

Here we find that the transcription regulator INTAC regulates R-loop levels by associating with the ssDNA binding complex SOSS to form SOSS–INTAC. The SOSS–INTAC subunit SSB1, through ssDNA recognition and a liquid-like condensate formation ability, localizes SOSS–INTAC at promoters and catalyses transcription termination to prevent aberrant R-loop accumulation to ensure genome stability.

INTAC and SOSS form a stable complex

The 1.59 MDa INTAC complex, comprising 15 subunits of the RNA cleavage complex Integrator and the PP2A core enzyme (Extended Data Fig. 1a), regulates transcription by inducing the termination of promoter-proximally paused transcripts4,5,6,23,24,25,26,27. SOSS—a heterotrimeric DNA damage sensing and repair complex—contains INTS3 (also known as SOSS-A), the ssDNA-binding protein SSB1 (also known as SOSS-B1; or its paralogue SSB2 (encoded by NABP1, also known as SOSS-B2)), and INIP (also known as SOSS-C)3,28,29 (Extended Data Fig. 1b). Given that both complexes contain INTS3, we posited that, together, they could mediate communication between transcription and genome stability machineries. To test this idea, we conducted immunoprecipitation (IP) followed by mass spectrometry analysis. Most subunits of SOSS and INTAC were retrieved after IP of INTS3 but not when using an IgG control (Fig. 1a). We next purified SSB1 and found that SSB1 interacts with other SOSS and INTAC subunits (Fig. 1a). Endogenous co-immunoprecipitation (co-IP) analysis confirmed that SSB1 associates with INTAC subunits (Fig. 1b).

Fig. 1: Identification and genome-wide profiling of the SOSS–INTAC complex.
figure 1

a, Mass spectrometry analyses of endogenous INTS3 and SSB1 IP using nuclear extracts. The values are intensity-based absolute quantification intensities for SOSS, INTAC and Pol II subunits. IgG was used as the binding control. b, Co-IP analysis of endogenous SSB1 and INTS3 followed by western blotting. Data represent two independent experiments. c, Coomassie staining of reconstituted human INTAC complex purified from HEK Expi293 cells, and GST-tagged human SSB1 and Strep-tagged human INIP proteins purified from E. coli. d, Immobilized GST or GST–SSB1 were incubated with purified INTAC in the presence or absence of INIP. The input and bound proteins were analysed by western blotting. Data represent two independent experiments. e, Gradient centrifugation using endogenous HEK Expi293 nuclear extracts. The fractionated samples were analysed using SDS–PAGE followed by western blotting. Data shown represent two independent experiments. f, The overlapping binding regions of INTAC (blue) and SOSS (red) in DLD-1 cells. g, The genomic distribution of SOSS–INTAC. h, ChIP–seq signals of SSB1, INTS3, INTS5, H3K4me3, H3K4me1 and H3K27ac in DLD-1 cells. The peaks are centred on the SSB1 peak summits. i, Correlation analysis for the genomic occupancy of SSB1, INTS3, INTS5, H3K27ac, H3K4me3 and H3K4me1. The numbers are Pearson correlation coefficients. The ChIP–seq results shown represent two biologically independent samples. j, Schematic of the SOSS–INTAC complex. On the basis of structural and biochemical information4,30,33,51,52, the complex can be divided into six modules, including the backbone (INTS1, INTS2 and INTS7), shoulder (INTS5 and INTS8), endonuclease (INTS4, INTS9 and INTS11), phosphatase (INTS6, PP2A-A and PP2A-C), auxiliary (INTS10/13/14/15) and SOSS (INTS3, SSB1/2, INIP) modules. The structural organization of the backbone, shoulder, endonuclease and phosphatase modules is illustrated on the basis of the structure of INTAC4. The organization of the SOSS module was placed according to the structures of SOSS30 and INTS3/633. The organization of the auxiliary module was estimated on the basis of structural and biochemical information of INTS10/13/1452. The structural placement of INTS12 is currently unclear.

To investigate associations of all SOSS subunits with INTAC, we overexpressed and purified protein-A-tagged SSB1, SSB2 and INIP in human embryonic kidney (HEK) Expi293 cells individually, followed by proteomics analysis. IP of each SOSS subunit successfully recovered most INTAC subunits (Extended Data Fig. 1c). The interaction between INIP and INTAC was further confirmed by Flag-tagged INIP overexpression followed by IP (Extended Data Fig. 1d). Our results suggest that the entire SOSS complex can be incorporated into INTAC.

To confirm the association between SSB1 and INTAC, we conducted in vitro pull-down assays using reconstituted INTAC complex from HEK Expi293 cells and purified SSB1 and INIP from Escherichia coli (Fig. 1c). INTAC subunits associated with GST-tagged SSB1 but not GST alone (Fig. 1d and Extended Data Fig. 1e). This interaction was not affected by the presence of INIP (Fig. 1d; compare lanes 8 and 9), consistent with previous data showing the lack of a direct association between SSB1 and INIP3,30. Gradient centrifugation of nuclear extracts demonstrated co-migration of endogenous SSB1, INIP and INTAC (Fig. 1e), suggesting the existence of a stable SOSS–INTAC complex in cells. The majority of SSB1 and INTAC subunits co-localize at higher-molecular-mass fractions, further confirming the existence of SOSS–INTAC (Extended Data Fig. 1f).

SOSS–INTAC targets active chromatin

To identify the genome locations of SOSS and INTAC, we performed chromatin IP followed by sequencing (ChIP–seq) analysis of SSB1, INTS3 and INTS5 in human colon adenocarcinoma DLD-1 cells (Extended Data Fig. 1g). To eliminate potential biases due to antibody efficiencies, only regions co-occupied by INTS3 and SSB1 were defined as reliable SOSS targets, whereas regions co-bound by INTS3 and INTS5 were considered to be faithful INTAC targets. A total of 21,619 loci co-bound by SOSS and INTAC comprise 97% of SOSS targets (Fig. 1f), mainly corresponding to promoter and intergenic regions (Fig. 1g and Extended Data Fig. 1h). Heat maps of SOSS–INTAC targets show a comparable occupancy of SSB1, INTS3 and INTS5 (Fig. 1h). Pearson correlation coefficient analysis shows that genomic distributions of SOSS–INTAC subunits are highly correlated with each other, in addition to their positive correlation with active chromatin marks of promoters and enhancers (Fig. 1i). Consistently, widespread binding of SOSS–INTAC at both active promoters and enhancers was observed (Extended Data Fig. 1i). The binding of SOSS–INTAC subunits on chromatin was further verified by ChIP followed by quantitative PCR (ChIP–qPCR) at promoters of example genes (Extended Data Fig. 1j). Together, these results reveal the formation of a stable SOSS–INTAC complex (Fig. 1j) that primarily localizes to promoter and enhancer regions.

Recognition of ssDNA by SOSS–INTAC

We hypothesized that SSB1 contributes to SOSS–INTAC recruitment to promoters due to its potent ssDNA-binding ability and ssDNA being a prominent feature of actively transcribed regions31. To test this idea, we first confirmed that SSB1 preferentially binds to ssDNA but not to double-stranded DNA (dsDNA) or ssRNA on the basis of an electrophoretic mobility shift assay (EMSA) (Extended Data Fig. 2a). Using a kethoxal-assisted single-stranded DNA sequencing (KAS-seq) protocol31, we found that SOSS–INTAC occupancy is positively correlated with ssDNA levels genome-wide, including at promoters and enhancers (Fig. 2a and Extended Data Fig. 2b–d). We next compared the ssDNA levels at promoters with and without SOSS–INTAC binding, which revealed greater enrichment of ssDNA at SOSS–INTAC-bound promoters (Fig. 2b).

Fig. 2: SSB1 facilitates SOSS–INTAC recruitment to chromatin.
figure 2

a, The correlation between ssDNA and SSB1 levels at SOSS–INTAC-bound regions in DLD-1 cells. P values were computed using two-sided t-tests with 95% confidence intervals based on the Pearson’s product moment correlation coefficient. P < 2.2 × 10−16. n = 29,128 peaks. b, ssDNA levels at promoters with or without SOSS–INTAC binding. For the box plots, the centre line indicates the median, the top and bottom hinges indicate the first and third quartiles, respectively, and the whiskers extend to the quartiles ± 1.5 × interquartile range. P values were calculated using two-sided Wilcoxon rank-sum tests. P < 2.2 × 10−16. c,d, EMSA using Cy3-labelled oligo (dT)48 incubated with INTAC alone (left) or with SSB1–INTAC proteins (right) (c), or with SSB1 alone (left) or with SSB1–INTAC proteins (right) (d). Data represent two independent experiments. e, Western blot analysis of whole-cell extracts from CTR (control, NABP1 knockout) and DKO (NABP2/NABP1 double-knockout) DLD-1 cells. Tubulin was used as the loading control. Data represent two independent experiments. f, Growth curves of CTR and DKO DLD-1 cells. Data are mean ± s.d. n = 4 biological replicates. P values were generated using two-way analysis of variance (ANOVA) performed for day 8. g, ChIP–qPCR experiments using SSB1 (red), INTS3 (blue) and INTS5 (purple) antibodies in CTR and DKO cells. Data are mean ± s.d. n = 3 biological replicates. Statistical analysis was performed using two-tailed t-tests. P values are shown at the top of the graphs. h, Representative browser tracks showing ChIP–Rx signals of SSB1 (red), INTS3 (blue) and INTS5 (purple) in CTR and DKO cells. i, ChIP–Rx signals of SSB1, INTS3, INTS5 in CTR or DKO cells. Peaks are centred on transcription start site (TSS) of SOSS–INTAC-bound genes. j, Pol II ChIP–Rx signals on SOSS–INTAC target genes in CTR and DKO cells. Peaks are centred on the TSS and ranked by decreasing occupancy in CTR cells. FC, fold change.

To confirm direct ssDNA-binding ability, we performed EMSA using synthesized oligo (dT)48 incubated with INTAC alone or SSB1–INTAC protein30. INTAC alone has a weak ssDNA-binding affinity, probably mediated by its INTS3 subunit32,33 (Fig. 2c (left)). Notably, adding SSB1 substantially boosts the interaction with the oligo (Fig. 2c (right)), indicating a key role of SSB1 in recognizing ssDNA. Compared with the migration of bands seen with the SSB1–ssDNA complex, supershifted bands were observed after incubation of SSB1 with INTAC, suggesting the co-migration of SSB1–INTAC with ssDNA (Fig. 2d). These results support the conclusion that SSB1 facilitates the recruitment of SOSS–INTAC by recognizing ssDNA.

SSB1 regulates SOSS–INTAC localization

In contrast to the ubiquitous expression of SSB1, its paralogue SSB2 is expressed tissue specifically and could have redundant roles with SSB1 in certain contexts34,35 (Extended Data Fig. 2e,f). To avoid this potential redundancy, we generated NABP1-null cells to be used as a control cell line (hereafter, CTR cells) for later experiments. CTR cells exhibit no defect in cell growth (Extended Data Fig. 2g). As measured by western blotting and ChIP–qPCR, SOSS–INTAC protein stability and occupancy at the tested genes were not affected by the deletion of NABP1 (Extended Data Fig. 2h,i). NABP1/NABP2 double knockout cells (hereafter, DKO cells) were generated by additionally deleting NABP2 in pooled cells to eliminate clonal variations and to minimize long-term culture-induced secondary effects (Fig. 2e). Compared with the CTR cells, DKO cells exhibit growth defects (Fig. 2f) and diminished INTAC occupancy at target genes (Fig. 2g). Induced expression of either SSB1 or SSB2 rescues the growth defects, corroborating the redundancy of these paralogues (Extended Data Fig. 2j).

To determine the genome-wide regulation of INTAC recruitment by SSB1, we performed calibrated INTS3 and INTS5 ChIP–seq analysis with reference exogenous genome as the spike-in (ChIP–Rx) in CTR and DKO cells. Track examples and genome-wide analyses show decreased INTS3 and INTS5 occupancies at promoters after SSB1 loss (Fig. 2h,i and Extended Data Fig. 3a–e). To determine how ssDNA recruits INTAC to chromatin, we generated SSB1 mutants that specifically compromise DNA binding (W55A/F78A) or disrupt the SSB1–INTAC interaction (E97A/F98A)30 (Extended Data Fig. 3f,g). As shown by ChIP–qPCR analysis of example genes, both mutants exhibit reduced recruitment of INTAC, indicating that both the DNA-binding ability of SSB1 and its ability to interact with INTAC are required for the optimal association of INTAC with promoters (Extended Data Fig. 3h).

SOSS–INTAC modulates Pol II occupancy

The INTAC complex is a major regulator of promoter-proximal termination of paused Pol II4,5,6,23,24,25,26,36. To evaluate whether the SOSS module of SOSS–INTAC regulates Pol II pausing, we conducted Pol II ChIP–Rx and observed a widespread increase in Pol II occupancy at promoters of SOSS–INTAC targets in DKO cells compared with in CTR cells (Fig. 2j, Extended Data Fig. 3i and Supplementary Fig. 2a–c). Using Pol II levels to normalize SOSS–INTAC subunit occupancy, SSB1, INTS3 and INTS5 were each markedly reduced in DKO cells, corroborating the notion that SSB1 recruits SOSS–INTAC to chromatin (Supplementary Fig. 2d–f). As previous reports described differential regulation of Pol II progression by Integrator depending on exon number, overall length and the coding or non-coding status of genes24,25,26,37, we grouped genes by these properties; this demonstrated a general accumulation of Pol II at promoters for all gene classes. Pol II occupancy changes in gene bodies varied between classes, with monoexonic, non-coding and shorter genes exhibiting a substantially greater increase in polymerase levels in DKO cells compared with at longer or multiexonic genes (Supplementary Fig. 2g–i), consistent with a loss of Integrator function in DKO cells. Moreover, the accumulation of Pol II at promoters was recapitulated by the depletion of INTS2, supporting the functional connection between SOSS and INTAC (Extended Data Fig. 3j and Supplementary Fig. 2j–l). The pausing index—the ratio of Pol II occupancy at promoters over gene bodies, indicating the extent of pausing—is evidently higher after the loss of SSB1 or INTS2 (Supplementary Fig. 2m,n).

To measure paused Pol II changes after transcription initiation, we next used precision run-on sequencing (PRO–seq) to quantify nascent transcripts at the single-base resolution. Loss of SSB1 induces the accumulation of paused Pol II at promoters (Extended Data Fig. 3k–m and Supplementary Fig. 3a,b), in agreement with the disruption of INTAC or Integrator leading to defects in promoter-proximal termination4,5,6,23,37. Corroborating these findings, the levels of Pol II phosphorylated at serine 5 of its C-terminal domain, representing paused Pol II, were substantially increased in DKO cells (Extended Data Fig. 3n). Assay for transposase-accessible chromatin with sequencing (ATAC–seq) analyses demonstrated increased chromatin accessibility at SOSS–INTAC targets in DKO cells, probably resulting from Pol II accumulation (Extended Data Fig. 3o,p and Supplementary Fig. 3c). Indeed, as shown at example genes, changes in Pol II occupancy and chromatin accessibility were comparable (Extended Data Fig. 3i,q and Supplementary Fig. 3d,e). Thus, SOSS–INTAC prevents the accumulation of paused Pol II and limits chromatin accessibility.

R-loops affect SOSS–INTAC localization

Owing in part to the higher thermodynamic stability of an RNA–DNA duplex compared with dsDNA, R-loops can accumulate at actively transcribed genomic regions, especially at promoters containing the highest levels of Pol II and associated short nascent transcripts8,9,10,38. To investigate whether promoter-associated R-loops modulate SOSS–INTAC recruitment to chromatin, we established a cell line with inducible expression of RNase H1, which degrades the RNA strand of RNA–DNA duplexes and can therefore resolve R-loops (Extended Data Fig. 4a). As shown at example genes, SSB1 levels decrease at promoters after doxycycline (DOX) treatment (Extended Data Fig. 4b,c). R-loop CUT&Tag followed by qPCR confirmed the decrease in R-loops at the corresponding promoter regions after the induction of RNase H1 expression (Extended Data Fig. 4d,e). Furthermore, RNase H1 overexpression induces a genome-wide attenuation of SSB1 occupancy at promoters (Fig. 3a,b and Extended Data Fig. 4f). INTS3 occupancy at SSB1-bound regions is similarly reduced after RNase H1 overexpression (Extended Data Fig. 4g–k). These results indicate that R-loops can be recognized by SSB1, leading to increased SOSS–INTAC at these promoters.

Fig. 3: SOSS–INTAC regulates R-loop levels.
figure 3

a, SSB1 occupancy over 6 kb regions centred on the TSS of SOSS–INTAC target genes in DLD-1 cells with DOX-inducible RNase H1 expression. b, Comparison of SSB1 occupancy at SOSS–INTAC target promoters for DMSO- and DOX-treated cells. For the box plots, the centre line indicates the median, the top and bottom hinges indicate the first and third quartiles, respectively, and the whiskers extend to the quartiles ± 1.5 × interquartile range. P values were calculated using two-sided Wilcoxon rank-sum tests. P < 2.2 × 10−16. n = 10,650 promoters. c, R-loop detection in CTR and DKO cells with DOX-inducible GFP–RNASEH1 expression. Scale bar, 10 μm. d, Quantification of nuclear R-loop signals for c. P values were calculated using two-tailed unpaired t-tests. n = 110 foci from one representative experiment, which was performed twice with similar results. The centre lines indicate the median values. e, R-loop CUT&Tag signals over 6 kb regions centred on the TSS of SOSS–INTAC target genes in CTR and DKO cells. CTR cells were treated with RNase H1 protein during CUT&Tag (lane 4) or incubated with IgG (lane 5) to confirm the specificity of detected R-loop signals. f, Immunofluorescence analysis of R-loop signals in DLD-1 cells with INTS11 or non-targeting (NT) shRNA and overexpression of wild-type (WT) or catalytically dead (E203Q) INTS11 and empty vector control. Scale bar, 10 μm. g, Quantification of the nuclear R-loop signals for f. Statistical analysis was performed using two-tailed unpaired t-tests; P values are shown above the graphs. n = 180 foci from one representative experiment, which was performed twice with similar results. The centre lines indicate the median values. h, Immunostaining of γH2AX signals in CTR and DKO cells with DOX-inducible RNase H1 expression. Scale bar, 10 μm. i, Quantification of the γH2AX focus number in h. Statistical analysis was performed using two-tailed unpaired t-tests; P values are shown above the graphs. n = 90 foci from one representative experiment, which was performed twice with similar results. The centre lines indicate the median values. j, Schematic of the DNA fibre assay. Cells were sequentially pulsed with two different thymidine analogues—IdU and CIdU. k, Representative images of stretched DNA fibres. CTR and DKO cells with DOX-inducible RNase H1 expression were treated with DMSO or DOX as indicated. Red tracks, IdU; green tracks, CIdU. l, Replication fork speed was measured by IdU (red) and CIdU (green) incorporation. P values were determined using two-tailed unpaired t-tests. n = 160 fibres were measured for each group.

SOSS–INTAC attenuates R-loop levels

On the basis of our findings that SSB1-mediated recruitment of SOSS–INTAC controls promoter-proximal termination and chromatin accessibility at promoters, we speculated that SOSS–INTAC could reciprocally influence R-loop levels. To examine this hypothesis, we measured cellular R-loop levels on the basis of immunofluorescence analysis using the S9.6 antibody, which recognizes RNA–DNA hybrids. Notably, a strong elevation in nuclear S9.6 signals was observed in SSB1- or INTS2-depleted cells (Extended Data Fig. 5a–d; DMSO conditions). Importantly, the accumulation of these nuclear signals could be suppressed by DOX-induced overexpression of wild-type RNase H1 (Extended Data Fig. 5a–d; DOX conditions), indicating that the S9.6 antibody is detecting nuclear R-loop increases after loss of SSB1 and INTS2.

As the S9.6 antibody detects dsRNAs in addition to RNA–DNA hybrids, we used a purified GFP-tagged catalytic-dead RNase H1 protein (GFP–dRNASEH1) as the R-loop sensor39,40. Although pretreatment with ssRNA endonuclease RNase T1 and dsRNA endonuclease RNase III greatly eliminates the signals detected by S9.6, it has no notable effect on the GFP–dRNASEH1 signal, suggesting that R-loop measurements made with GFP–dRNASEH1 are unlikely to be confounded by ssRNA and dsRNA binding39,40 (Supplementary Fig. 4). We therefore used GFP–dRNASEH1 to quantify cellular R-loop levels in further studies.

The loss of SSB1 induces the formation of R-loop foci and higher R-loop levels, which are eliminated by DOX-induced expression of wild-type RNase H1 (Fig. 3c,d). Quantitative analysis shows that R-loop levels in DKO cells are substantially higher than in CTR cells (Fig. 3d). R-loop CUT&Tag was used to evaluate genome-wide changes in R-loops, revealing a large-scale induction of R-loops in DKO cells that was eliminated by treatment with RNase H1, further indicating that the measured signals are bona fide R-loops (Fig. 3e and Extended Data Fig. 5e–g). Loss of INTS2 elicits a similar accumulation of cellular R-loops (Extended Data Fig. 5h,i). To determine whether R-loop regulation by SSB1 is mediated through the endonuclease activity of SOSS–INTAC, we depleted INTS11, the catalytic subunit of the endonuclease module, and rescued INTS11 loss with ectopic expression of wild-type or catalytically dead (E203Q) INTS11 (Extended Data Fig. 5j,k). INTS11 depletion alone induces substantial R-loop accumulation (Fig. 3f,g and Extended Data Fig. 5l). This accumulation was rescued by wild-type but not catalytically dead INTS11, and simultaneous INTS11 knockdown and expression of catalytically dead INTS11 gave rise to the greatest R-loop enrichment (Fig. 3f,g and Extended Data Fig. 5l). To corroborate the functional connection between SOSS and INTAC in R-loop regulation, we overexpressed wild-type SSB1 or SSB1(E97A/F98A), the mutant defective in INTAC interaction, in DKO cells. Notably, wild-type SSB1 but not the SSB1(E97A/F98A) mutant prevents R-loop accumulation in DKO cells (Extended Data Fig. 5m). These data reveal a function for SOSS–INTAC in preventing aberrant R-loop accumulation.

We next examined whether RNA exonucleases facilitate R-loop removal after RNA cleavage by SOSS–INTAC. The major 5′ and 3′ exonucleases responsible for RNA degradation in the nucleus are XRN2 and the exosome complex, respectively. We therefore depleted XRN2, two catalytic subunits of the exosome (DIS3 and EXOSC10) and the nuclear exosome-targeting (NEXT) complex MTR4 subunit that unwinds structured RNA substrates for exosomal degradation (Extended Data Fig. 6a,b). As shown by R-loop CUT&Tag–qPCR analysis, individual depletion of XRN2, DIS3 and MTR4 induces a small but significant upregulation of R-loops at promoters in CTR cells. Simultaneous loss of XRN2 and DIS3 leads to greater R-loop accumulation, indicating that both XRN2 and the exosome contribute to R-loop attenuation (Extended Data Fig. 6c (left)). Although the loss of SSB1 in DKO cells leads to upregulation of R-loops, additional disruption of XRN2 and the exosome does not augment this change (Extended Data Fig. 6c (right)). SOSS–INTAC-loss-induced R-loop accumulation could be epistatic to that caused by disrupting XRN2 and the exosome, whereby endonucleolytic cleavage of RNA by SOSS–INTAC could expose the 5′ and 3′ ends for exonucleolytic digestion by XRN2 and the exosome.

We next examined whether the recruitment of XRN2 and the exosome are regulated by SOSS–INTAC. Notably, the promoter occupancy of XRN2, but not exosome or NEXT subunits, is compromised in DKO cells (Extended Data Fig. 6d). Plotting XRN2 occupancy for genes of different classes indicated a highly similar pattern of XRN2 and Pol II (compare Supplementary Figs. 2g–i and  5a–c), as previously reported41. The importance of the endonuclease activity of SOSS–INTAC in these processes is demonstrated by the ability of wild-type but not catalytically dead INTS11 to reverse the changes in Pol II occupancy (Supplementary Fig. 5d,e).

SOSS–INTAC regulates genome stability

Unresolved R-loops can expose ssDNA to damaging agents and induce DNA damage by forming obstacles to replication fork progression, causing transcription–replication conflicts and DNA breaks11,12. We therefore measured γH2AX levels using immunofluorescence and found that the loss of SSB1 in DKO cells stimulates the accumulation of γH2AX, whereas DOX-induced RNase H1 overexpression suppresses γH2AX induction after SSB1 depletion (Fig. 3h,i). γH2AX CUT&Tag analysis further demonstrates elevated γH2AX levels at promoters in DKO cells (Extended Data Fig. 6e), consistent with the accumulation of R-loops at corresponding loci. Knockout of INTS2 induces a comparable change in γH2AX levels at the cellular or genome-wide scale (Extended Data Fig. 6f–h).

Flow cytometry analysis after propidium iodide staining and γH2AX labelling revealed an induction of γH2AX in both G1 and S phases (Extended Data Fig. 6i). We posited that SSB1 loss induces genome instability in part through impeding replication-fork progression. We therefore quantified replication-fork velocity by consecutive pulse labelling with thymidine analogues 5-iodo-2′-deoxyuridine (IdU) and 5-chloro-2′-deoxyuridine (CldU) (Fig. 3j). Disruption of SOSS–INTAC in DKO cells resulted in retarded replication fork progression, which was partially rescued by RNase H1 induction in DOX-treated cells (Fig. 3k,l). These results support that SSB1-mediated SOSS–INTAC recruitment is crucial for restraining R-loop levels and maintaining genome stability.

SOSS–INTAC forms nuclear puncta

The ability of SSB1, a relatively small (22 kDa) protein, to govern the recruitment of SOSS–INTAC, a complex that is around 70 times larger, motivated us to further investigate the biochemical features of SSB1. Comparing the distributions of reconstituted SOSS–INTAC (Extended Data Fig. 1f) with INTAC alone (Extended Data Fig. 7a) after fractionation by gradient centrifugation, we noticed that the association between SOSS and INTAC causes a substantial shift to higher-molecular-mass fractions that cannot be explained by the size of SOSS alone (Fig. 4a), suggesting SOSS-dependent multivalent interactions or oligomerization. E. coli SSB contains an IDR at its C terminus that drives LLPS42. Human SSB1 has an even more disordered C-terminal IDR compared with its E. coli counterpart (Fig. 4b), and the percentage of IDR regions and the disorder intensity of SSB1 are considerably greater compared with other SOSS–INTAC subunits (Extended Data Fig. 7b).

Fig. 4: SSB1 drives the formation of SOSS–INTAC condensates.
figure 4

a, Quantification of purified INTAC (all subunits) and SSB1–INTAC distribution after sucrose density-gradient centrifugation and western blotting. Five subunits were used for quantification (INTS5, INTS6, INTS11, PP2A-A and PP2A-C). b, The domain structure and the intrinsically disordered tendency of E. coli SSB (left) and human SSB1 (right). IUPred assigned scores of disordered tendencies between 0 and 1 to the sequences, and a score of higher than 0.5 indicates disorder. c, Representative images showing the relative locations of endogenous SSB1 and INTAC subunits along with the DAPI signal in DLD-1 cells. Representative curves (right) describe the distribution of relative fluorescence intensities for SSB1 (red) and INTAC subunits (green). Data represent two independent experiments. d,e, GFP–SSB1 (50 μM) was analysed using droplet formation assays with the indicated concentrations of NaCl (d), and the size of the droplets was quantified (e). Each dot represents a droplet. n = 100 foci from one representative experiment, which was performed twice with similar results. The red lines indicate the mean value in each population. f, NaCl concentrations in the GFP–SSB1 solution were changed sequentially as indicated and then examined under a fluorescence microscope. g,h, 1,6-Hex (5%) treatment disrupts droplet formation. GFP–SSB1 (50 μM) was analysed with 37.5 mM NaCl with or without 5% 1,6-Hex (g), and the size of droplets was quantified (h). Each dot represents a droplet. The red lines indicate the mean value in each population. i, Time-lapse imaging of GFP–SSB1 droplets undergoing spontaneous fusions as indicated by the arrows. j, Representative micrographs of GFP–SSB1 droplets before and after photobleaching (top). FRAP quantification of GFP–SSB1 droplets over a period of 100 s (bottom). n = 3 droplets analysed from 1 representative experiment, which was performed 3 times with similar results. k, GFP–SSB1 and Alexa Fluor 568 (AF568)-labelled INTAC (all subunits), either individually or mixed together as indicated, were analysed using a droplet formation assay and then examined under a fluorescence microscope. Scale bars, 5 μm (c, i and j), 20 μm (d, f and g) and 50 μm (k).

To examine the condensation ability of SSB1, we conducted immunofluorescence using an anti-SSB1 antibody and detected nuclear puncta (Fig. 4c). Lacking suitable antibodies for INTAC immunofluorescence, we knocked-in an N-terminal Flag tag at the endogenous loci of two INTAC phosphatase module subunits (INTS5 and INTS8) and two INTAC endonuclease module subunits (INTS4 and INTS11). The immunofluorescence results indicate the presence of INTAC puncta co-localizing with SSB1 nuclear foci (Fig. 4c).

To investigate the interdependency of SSB1 and INTAC for punctum formation, we first depleted SSB1 and SSB2 simultaneously in the cells expressing the Flag–INTAC subunit and performed Flag immunofluorescence analysis. Notably, the loss of SSB1 and SSB2 abolishes punctum formation of INTAC subunits (Extended Data Fig. 7c). However, depletion of INTS11 exerts no noticeable impact on the formation of SSB1 puncta (Extended Data Fig. 7d), indicating that SSB1/2 is the major driver of punctum formation.

SSB1 forms liquid-like condensates

We next examined whether human SSB1 has the ability to form condensates in vitro using protein purified from E. coli. Fluorescence microscopy analysis showed that GFP-tagged SSB1 readily self-associates as micrometre-sized spherical droplets in the absence of crowding reagents (Fig. 4d,e). This droplet formation is sensitive to increased ionic strength, indicating the requirement of electrostatic interactions for SSB1 condensation. Sequentially lowering and increasing salt concentration induces a rapid appearance and disappearance of SSB1 droplets, proving its liquid-like property (Fig. 4f). Moreover, 1,6-hexanediol (1,6-Hex), a compound that perturbs weak multivalent interactions and disassembles structures exhibiting liquid-like properties, hinders droplet formation (Fig. 4g,h). Without the assistance of crowding reagents, the number and size of SSB1 droplets increase gradually when increasing the SSB1 protein concentration (Extended Data Fig. 8a,b). In agreement with the liquid-like property, SSB1 droplets are highly dynamic and readily coalesce into larger ones that are immediately relaxed into a spherical structure (Fig. 4i). Fluorescence signals recover within 2 min after photobleaching in the centre of the droplet (Fig. 4j), consistent with liquid-like condensates.

To examine whether SSB1 can form liquid-like condensates in cells, we used the optoDroplet system fusing SSB1 with mCherry-labelled Arabidopsis photoreceptor cryptochrome 2 (CRY2)43. We found that droplet formation of SSB1, but not the control, was substantially increased after light induction (Extended Data Fig. 8c). Moreover, SSB1 puncta undergo frequent fusion and fission events (Extended Data Fig. 8d,e). The fluorescence signals of foci recover readily after photobleaching (Extended Data Fig. 8f), which is indicative of liquid-like behaviour. On the basis of these findings, we conclude that SSB1 forms liquid-like condensates in vitro and in cells.

SSB1 drives SOSS–INTAC condensation

In contrast to the clearly formed SSB1 droplets (green), no condensates were observed for labelled INTAC (red) alone in the same condition (Fig. 4k and Extended Data Fig. 8g,h), in agreement with predicted disorder intensities (Extended Data Fig. 7b). However, after mixing together, SSB1 and INTAC co-form droplets, suggesting that SSB1 drives the formation of SOSS–INTAC condensates (Fig. 4k and Extended Data Fig. 8g,h). To determine whether INTAC modulates SSB1 condensation formation, we incubated different concentrations of SSB1 with INTAC. Although increasing SSB1 concentrations stimulate INTAC droplet formation, the condensation capacity of SSB1 is at most marginally affected by the presence of INTAC (Extended Data Fig. 8i,j), further indicating that SSB1 drives the formation of SOSS–INTAC condensates.

SSB1 mutations impair condensation

To confirm whether the SSB1 IDR is required for droplet formation, we generated SSB1 lacking the IDR (Fig. 5a and Extended Data Fig. 8k), which did not form droplets alone (Extended Data Fig. 8l,m) or in the context of SOSS–INTAC (Fig. 5b,c) in vitro. To determine the essential amino acids within the IDR that mediate SSB1 droplet formation, we mutated all IDR-enriched residues, except for alanine and proline, to IDR-depleted residues bearing comparably sized side chains, and successfully purified three soluble mutants—SSB1(HY) (all histidine to tyrosine), SSB1(SI) (all serine to isoleucine) and SSB1(RY) (all arginine to tyrosine) (Fig. 5a and Extended Data Fig. 8k,n,o). As shown by fluorescence microscopy, the SSB1(HY) mutation does not affect in vitro droplet formation, whereas SSB1(SI) significantly compromises in vitro droplet formation (Fig. 5b,c and Extended Data Fig. 8l,m). Notably, the SSB1(RY) mutation completely abolishes condensate formation (Fig. 5b,c and Extended Data Fig. 8l,m), highlighting the essentiality of arginine within the C-terminal IDR in mediating the condensation ability of SSB1.

Fig. 5: SOSS–INTAC condensation regulates R-loop levels.
figure 5

a, Schematic of the SSB1 domains and SSB1 mutants. OB-fold, oligonucleotide/oligosaccharide-binding fold. b,c, Fluorescence microscopy analysis of purified GFP–SSB1 mutants mixed with Alexa-Fluor-568-labelled INTAC (all subunits) (b), and quantification of the GFP and Alexa Fluor 568 signal (c). n = 1,500 foci were analysed across two independent experiments. The red lines indicate the mean values. Scale bars, 50 μm (b). ND, not detected. d,e, Schematic of the generation of SSB1-dTAG DLD-1 cells (d) and verification of SSB1 degradation by treatment for 6 h with dTAG (100 nM) (e). f, The R-loop levels at promoters with or without SOSS–INTAC binding measured by R-loop CUT&Tag under the DMSO-treated condition in SSB1-dTAG cells. For the box plots, the centre line indicates the median, the top and bottom hinges indicate the first and third quartiles, respectively, and the whiskers extend to the quartiles ± 1.5 × interquartile range. P values were calculated using two-sided Wilcoxon rank-sum tests. g, R-loop CUT&Tag signals over 6 kb regions centred on the TSS of SOSS–INTAC target genes in SSB1-dTAG cells with dTAG time-course treatment. One sample was treated with RNase H1 protein during CUT&Tag to verify the specificity of R-loop signals. h, Representative browser tracks showing the R-loop signals in SSB1-dTAG cells with time-course dTAG treatment. i, Schematic of the R-loop CUT&Tag–qPCR workflow. j, R-loop CUT&Tag–qPCR analysis of example genes in SSB1-dTAG DLD-1 cells after 24 h treatment of DMSO or dTAG. The RNase H1 control was as shown in h. Data are mean ± s.d. n = 3 biological replicates. Statistical analysis was performed using two-tailed unpaired t-tests; P values are shown above the graphs. k, R-loop CUT&Tag–qPCR analysis of DMSO- or dTAG-treated SSB1-dTAG cells with overexpression of wild-type, mutant SSB1 or empty vector. Data are mean ± s.d. n = 3 biological replicates. Statistical analysis was performed using two-tailed unpaired t-tests; P values are shown above the graphs. l, R-loop CUT&Tag–qPCR analysis of DMSO- or dTAG-treated SSB1-dTAG cells with overexpression of wild-type SSB1 or fusion proteins comprising the N terminus of SSB1 and IDR from TAF15, EWS or YTHDF1. Data are mean ± s.d. n = 3 biological replicates. Statistical analysis was performed using two-tailed unpaired t-tests; P values are shown above the graphs. m, Working model demonstrating the proposed mechanism by which SOSS–INTAC attenuates R-loop accumulation and maintains genome stability. In wild-type cells, the SSB1 subunit of SOSS interacts with ssDNA to recruit SOSS–INTAC to promoters and drives condensate formation. RNA cleavage by SOSS–INTAC condensates permits RNA degradation by a combination of XRN2 and exosome activities, leading to premature promoter-proximal termination by RNA Pol II and R-loop attenuation. Cancer-associated mutations of SSB1 that impair condensation and disrupt SOSS–INTAC recruitment lead to the loss of premature promoter-proximal Pol II termination and aberrant accumulation of R-loops, with potential adverse consequences, such as DNA damage.

The SSB1 IDR contains three potential cancer mutation hotspots at Ser172, His173 and Arg206 (Extended Data Fig. 8p). To elucidate whether these affect SSB1 condensation, we generated two constructs SSB1(S172P/H173L) (Ser172 to proline and His173 to leucine) and SSB1(R206Q) (Arg206 to glutamine) based on cancer-derived mutations (Fig. 5a and Extended Data Fig. 8k). As confirmed by EMSA and co-IP, both mutant proteins retain ssDNA binding (Extended Data Fig. 8q) and INTAC association (Extended Data Fig. 8r). SSB1(S172P/H173L) forms droplets as readily as wild-type SSB1, whereas SSB1(R206Q) exhibits severely impaired condensate formation (Fig. 5b,c and Extended Data Fig. 8l,m). For all of the SSB1 mutant proteins tested, the condensation ability was not affected by the presence of INTAC (Fig. 5b,c and Extended Data Fig. 8l,m), corroborating that SSB1 drives the formation of SOSS–INTAC condensates.

Dynamic regulation of R-loops by SSB1

To investigate the dynamic change of R-loop levels after SSB1 depletion, we introduced the FKBP12F36V degradation tag N-terminally at the endogenous NABP2 locus in CTR cells44 (SSB1-dTAG cells; Fig. 5d). Addition of dTAG-13 (hereafter, dTAG) induces rapid depletion of endogenous SSB1 and induction of R-loop levels in SSB1-dTAG cells (Fig. 5e and Extended Data Fig. 9a–c). γH2AX signals are enhanced substantially after SSB1 depletion, recapitulating the dynamics in R-loop levels (Extended Data Fig. 9d,e). To determine the genomic features of R-loops, we performed CUT&Tag and quantified R-loop levels in SSB1-dTAG cells with dTAG treatment for 6 h and 24 h. Consistent with R-loops facilitating SOSS–INTAC recruitment, SOSS–INTAC-occupied promoters show higher R-loop levels (Fig. 5f). SSB1 degradation induces a pervasive accumulation of R-loops at SOSS–INTAC-bound promoters (Fig. 5g and Extended Data Fig. 9f), as also seen at example genes (Fig. 5h and Extended Data Fig. 9g). RNase H1 treatment eliminates the R-loop CUT&Tag signal (Fig. 5g,h and Extended Data Fig. 9f,g), confirming its specificity. Accumulation of R-loops was verified by R-loop CUT&Tag–qPCR at example genes (Fig. 5i,j), showing consistency with R-loop CUT&Tag–seq.

SSB1 condensation suppresses R-loops

To examine whether SSB1 condensate formation contributes to R-loop regulation, we conducted rescue experiments with wild-type or mutant SSB1 in SSB1-dTAG cells (Extended Data Fig. 9h). Consistent with in vitro results (Fig. 5b,c), punctum formation was abolished with the SSB1(ΔIDR) and SSB1(RY) mutants, and severely impaired with the SSB1(SI) and SSB1(R206Q) mutants in dTAG-treated cells (Extended Data Fig. 9i,j). Testing all of the mutant constructs described above, we found that SSB1(RY) and SSB1(SI) did not fully rescue R-loop levels compared with wild-type SSB1 (Fig. 5k). The cancer-derived mutant SSB1(S172P/H173L) with LLPS ability, but not droplet-impaired SSB1(R206Q) (Extended Data Fig. 9i–k), restricted R-loops to basal levels, as shown at example SOSS–INTAC targets (Fig. 5k). Immunofluorescence analysis of R-loop and γH2AX signals confirmed that SSB1(S172P/H173L), but not SSB1(R206Q), can attenuate cellular R-loop levels and maintain genome stability (Extended Data Fig. 9l–o).

The relationship between R-loop levels and SSB1 mutant status and pausing was revealed by SSB1-depletion-induced Pol II changes being fully rescued by the expression of wild-type SSB1, SSB1(HY) and SSB1(S172P/H173L), but not by the SSB1 ΔIDR, SI, RY or R206Q mutants that have an impaired condensation ability (Extended Data Fig. 10a). Increased pausing index, the ratio of Pol II occupancy at promoters to gene bodies, was observed at longer genes in dTAG-treated cells, and this was reversed by ectopic expression of wild-type SSB1, SSB1(HY) and SSB1(S172P/H173L), but not by ectopic expression of the SSB1 ΔIDR, SI, RY or R206Q mutants (Extended Data Fig. 10b).

To confirm the condensation ability of SSB1 for suppressing R-loop levels, we replaced its IDR with unrelated IDRs capable of forming liquid-like condensates. The chimeric proteins comprise the SSB1 N terminus and the C-terminal IDRs from TAF15, EWS and YTHDF145,46,47. We induced their expression in SSB1-dTAG cells and assayed the R-loop levels (Extended Data Fig. 10c). Notably, all chimeras suppressed R-loop levels, with the IDRs of TAF15 and EWS showing the greatest R-loop-restraining activity (Fig. 5l). These results establish a causal relationship between SSB1 condensation and the attenuation of R-loop levels at SOSS–INTAC targets.

Discussion

Here we identified a stable complex comprising the genome stability regulator SOSS and the transcription regulator INTAC. SOSS–INTAC targets active promoter and enhancer regions, relying in part on SSB1 recognition of ssDNA in the context of R-loops. SOSS–INTAC restrains aberrant accumulation of paused Pol II and prevents excessive chromatin accessibility to limit transcription-associated R-loops and maintain genome stability. SOSS–INTAC condensate formation in cells requires the SSB1 IDR, with residues mediating SOSS–INTAC condensate formation contributing to the suppression of R-loop accumulation to promote transcriptional regulation and genome stability (Fig. 5m).

Given the importance of transcription–replication conflicts for genome stability, efforts devoted to identifying transcriptional regulators involved in this process have identified known transcription initiation and elongation factors, but not transcriptional pausing regulators, despite paused polymerases being a major barrier to replication progression and contributing to genome instability2. Recent studies using rapid disruption of the endonuclease activity of INTAC have revealed pervasive roles of this activity in terminating paused Pol II26,27,48. Thus, the identification in this study of SOSS–INTAC connecting a general regulator of Pol II pausing with genome stability maintenance provides a basis for future investigations of pausing regulation in other contexts beyond transcription, such as replication and DNA damage and repair.

The N terminus of SSB1 recognizes ssDNA, whereas the conserved C-terminal IDR drives liquid-like condensate formation of SOSS–INTAC. We propose that condensation elevates the local concentration of SOSS–INTAC catalytic activity to promote promoter-proximal termination of transcription. Dysregulation of SSB1 is linked to cancer and developmental defects34,35,49,50. Cancer-derived mutations in SSB1 disrupting SOSS–INTAC condensation compromise its role in regulating R-loops and genome stability, which could potentially contribute to oncogenic programs. However, it is important to note that the IDR of SSB1 could possess condensation-independent functions, such that mutations disrupting condensation may also introduce additional impacts yet to be identified. Thus, future studies are warranted to systematically investigate the biophysical properties of SOSS–INTAC and their contributions to transcription, R-loop regulation and genome stability, and the degree to which the condensation ability of SOSS–INTAC contributes to these processes.

Methods

Reagents, materials and cell culture

Detailed information for reagents and materials, including antibodies and cell lines, used in this study is provided in Supplementary Table 1. Human DLD-1 cells were grown in McCoy’s 5A medium (BasalMedia) supplemented with 10% fetal bovine serum (FBS, Yeasen), 1× penicillin–streptomycin (Gibco). HEK293T cells and mouse embryonic fibroblasts (MEFs) were cultured with Dulbecco’s modified Eagle medium (DMEM, BasalMedia) supplemented with 10% FBS and 1× penicillin–streptomycin. HEK Expi293 cells were grown in suspension in serum-free medium. All cells were cultured at 37 °C and 5% CO2 and were negative for mycoplasma contamination.

Genome editing for CRISPR–Cas9 knockout and dTAG endogenous knock-in

NABP1-null single knockout cells (CTR) were generated using the CRISPR–Cas9 system from DLD-1 parental cells. In brief, the sgRNA targeting genomic regions of NABP1 were designed using CHOPCHOP (http://chopchop.cbu.uib.no), cloned into PX458 vector and then mixed with 1 × 106 DLD-1 cells followed by electroporation (Neon). The pool of transfected cells was allowed to recover for 2 days before fluorescence-activated cell sorting of GFP-positive cells. Cells were seeded into 96-well plates by limited dilution at a density of one cell per well. After culturing for 10–14 days, cell clones were picked followed by clonal expansion. Western blotting of SSB2 was used to screen knockout clones. All oligonucleotide information for cloning and qPCR is included in Supplementary Table 2.

NABP2/NABP1 DKO cells were generated by additionally deleting NABP2 in pooled NABP1-null (CTR) cells. sgRNAs targeting NABP2 exon 1 were cloned into lentiCRISPR v2 vector for lentivirus packaging. CTR cells were infected with lentivirus containing NABP2 sgRNAs supplemented with 10 μg ml−1 polybrene (Yeason) for 24 h. The infected cells were selected with 2 μg ml−1 puromycin (Meilunbio) for an extra 48 h. The cells were then switched into growth medium without antibiotics and grown for an additional 24–36 h before being collected for further analysis.

The clones for the dTAG assays were performed according to previously described criteria44. CTR cells were used as parental cells to generate SSB1-dTAG cells. For endogenous knock-in of dTAG cassettes, CTR cells were seeded to 1 × 106 cells per well of the six-well plates the day before transfection to ensure exponential growth. The next day, cells were transfected with PITCh plasmids containing the sgRNAs targeting and cutting the genomic region of NABP2 (PX459-sgSSB1), the dTAG repair template plasmids (pCRISPR-PITChv2-SSB1) as microhomology, and general sgRNAs (sg-PITCh) targeting the upstream of the 5′ and downstream of the 3′ ends of the microhomology region by electroporation. The cell suspension was immediately carefully transferred to 2 ml of pre-equilibrated, warm antibiotic-free DMEM in six-well plates. The cells were allowed to recover for 5 days before starting antibiotic selection of the pools in 10 ml DMEM in 10 cm dishes. Recovered cells were expanded to several 10 cm dishes by limited dilution and cultured with DMEM supplemented with 1 μg ml−1 puromycin. After 10–14 days of selection, the surviving clones were picked and cultured in 96-well plates without antibiotics for 5–7 days. Positive clones were screened by PCR analysis of the integration site followed by verifying the protein degradation efficiency using western blotting. One working clone and up to two backup clones were selected and retained for further experiments.

RNA interference, the generation of stable cell lines and gene-rescue experiments

To generate lentivirus for gene knockdown assays, HEK293T cells were co-transfected with shRNAs targeting genes of interest (or non-targeting shRNA as the control), psPAX2 and pMD2.G with a ratio of 3:2:1 in Opti-MEM medium using the polycation polyethylenimine (PEI) (Sigma-Aldrich) transfection reagent. The culture supernatant containing virus particles was collected at 48 h after transfection and filtered using a 0.45 μm filter. The cells were infected with lentivirus in the presence of 8 mg ml−1 Polybrene (Sigma-Aldrich) for 24 h. The infected cells were treated with 2 mg ml−1 puromycin for an extra 48 h before collection. The knockdown efficiency was examined qPCR with reverse transcription and western blotting.

To generate stable cell lines with the inducible overexpression of RNase H1, DLD-1 cells were initially infected with lentivirus expressing pLVX-Tet3G-rtTA and selected with G418 (Meilunbio, 500 μg ml−1) for 2 weeks. These cells were then infected with virus expressing Flag–RNASEH1 cloned into pLVX-Tet-On vector and cultured in the presence of blasticidin (10 μg ml−1) for an additional 2 weeks. The induction of Flag–RNASEH1 was determined by western blotting using cellular extracts from cells treated with DMSO or DOX for 24 h.

For the RNAi rescue experiments, the cells were simultaneously transduced with shRNAs targeting genes of interest (or non-target shRNAs as the control) and vectors expressing the cDNAs of corresponding genes (or empty vector as the control). At 24 h after infection, antibiotics were administered to select the cells stably expressing the resistance genes from the shRNA and overexpressing vectors for additional 2 days before further analysis. For rescue experiments in SSB1-dTAG cells, the cells were first transduced with vectors expressing wild-type or mutant NABP2 (or empty vector as the control). At 24 h after infection, the cells were cultured under the appropriate antibiotics for an additional 2 days. The cells were then treated dTAG-13 for 12 h before further analysis. Detailed information of shRNAs, qPCR primers and cDNAs used in this study is provided in Supplementary Table 2.

Nuclear extracts and density-gradient sedimentation

HEK Expi293 cells were collected by centrifugation and washed twice with 5 ml of ice-cold phosphate-buffered saline (PBS) and once with 2 ml of ice-cold buffer A (10 mM HEPES pH 7.4, 5 mM MgCl2, 250 mM sucrose, 0.5 mM dithiothreitol (DTT), 1× protease inhibitor). The cell pellets were resuspended with 2 ml of ice-cold buffer A supplemented with 0.1% NP40 and incubated on ice for 15 min followed by centrifugation for 5 min at 4 °C and 1,000g. The nucleus fraction was collected by resuspending the pellet with buffer A (twice the volume of the original cell pellet) and centrifugation. The nuclei were next suspended with 0.75 ml of buffer B (20 mM HEPES pH 7.4, 1.5 mM MgCl2, 20% glycerol, 0.5 mM EDTA, 0.5 mM DTT, 0.42 M NaCl, 1× protease inhibitor) and incubated for 30 min rotation at 4 °C. Finally, the mixture was centrifuged in the Beckman SW40 Ti rotor at 40,000 rpm for 90 min at 4 °C, and the supernatants were saved as the nuclear extract for further density-gradient sedimentation.

The HEK Expis293 nuclear extracts or purified INTAC proteins were layered on top of 4 ml of an 8–40% (v/v) glycerol gradient in buffer containing 20 mM HEPES pH 7.4, 200 mM NaCl, 0.05% CHAPS, 2 mM DTT and centrifuged at 34,000 rpm for 16 h. The samples were collected manually from the top of the gradient with each 200 μl as a fraction and analysed by western blotting.

Co-IP assays

For co-IP assays, DLD-1 cells were collected by scraping followed by washing twice with ice-cold PBS. The cell pellet was suspended with 900 μl of ice-cold lysis buffer (20 mM Tis-HCl pH 8.0, 150 mM NaCl, 1 mM EDTA, 0.5% NP40, 10% glycerol, 1× protease inhibitor) and rotated at 4 °C for 1 h. The lysate was cleared by centrifugation for 20 min at 4 °C and 20,000g. The supernatant was incubated with 2–5 μg of antibody for each IP reaction (including IgG as negative control) followed by 9.5 h of rotation at 4 °C. Protein A/G magnetic beads (Smart Lifesciences, blocked with 1 mg ml−1 BSA for 1 h) were added to the samples and the mixture was rotated for 3 h at 4 °C. After incubation, the samples containing the beads were collected using a magnetic rack and the beads were washed four times with lysis buffer. Finally, the samples were collected by adding 100 μl of 1× SDS loading buffer followed by western blotting or mass spectrometry analysis.

Protein expression and purification

Expression and purification of the INTAC protein complex was performed as described previously4. In brief, the full-length INTS1 to INTS14 open reading frames were separately cloned into a modified pCAG vector and INTS2, INTS3, INTS4 and INTS10 were tagged with N-terminal Flag–4×protein A. Plasmids were cotransfected into HEK Expi293 cells using PEI (Polysciences) to a final concentration of 3 mg l−1. After being cultured at 37 °C for 72 h, cells were collected for lysis and purification. Cell pellets from 16 l of HEK Expi293 cells were resuspended and lysed in lysis buffer containing 50 mM HEPES pH 7.4, 200 mM NaCl, 0.2% CHAPS, 5 mM MgCl2, 5 mM adenosine triphosphate (ATP), 10% glycerol, 2 mM DTT, 1 mM phenylmethylsulfonyl fluoride (PMSF), 1 mg ml−1 aprotinin, 1 mg ml−1 pepstatin and 1 mg ml−1 leupeptin for 30 min and cleared by centrifugation for 30 min at 16,000 rpm to collect the supernatant. After incubating with immunoglobulin G (IgG) resins for overnight, the mixtures were washed with buffer containing 50 mM HEPES pH 7.4, 200 mM NaCl, 0.1% CHAPS, 10% glycerol and 2 mM DTT followed by on-column cleavage for 4 h. The immobilized proteins were then eluted out and concentrated for further purification by density-gradient sedimentation. The concentrated proteins were layered on top of a 4 ml 8–40% (v/v) glycerol gradient in buffer containing 20 mM HEPES pH 7.4, 200 mM NaCl, 0.05% CHAPS, 2 mM DTT and centrifuged at 34,000 rpm for 16 h. The fractions were collected manually from the top of the gradient for each 200 μl and analysed using a 4–12% Bis-Tris gel followed by Coomassie blue staining. Peak fractions corresponding to the INTAC complex were pooled and concentrated to 1 to 2 mg ml−1 accompanied with the removal of glycerol.

For proteins used for the in vitro droplet assay, plasmids encoding proteins tagged with GFP–Strep were transformed and expressed in E. coli BL21 (DE3) cells after induction overnight with 0.25 mM IPTG at 16 °C. The cells were collected by centrifugation at 6,200g for 25 min and then resuspended in 20 ml lysis buffer containing 50 mM Tris-HCl pH 7.5, 500 mM NaCl, 1 mM EDTA, 20 mM BME and 1 mM PMSF and stored at −80 °C for further protein purification.

All of the purification steps were performed at 4 °C to prevent protein degradation. After two rounds of freeze and thaw, the suspensions were lysed by sonication and centrifuged at 11,500 rpm for 1 h. The soluble fractions containing the GFP–Strep fusion proteins were loaded onto the Streptactin Beads 4FF (Smart Lifesciences) for purification. The eluted proteins were then dialysed overnight at 4 °C in 1 l dialysis buffer containing 10 mM Tris-HCl pH 7.5, 150 mM NaCl, 1 mM PMSF and 1 mM BME, and concentrated using Amicon Ultra Centrifugal Filters (Millipore). The protein concentration was measured using the Bradford Protein Quantification Kit (Vazyme) and then flash-frozen in liquid nitrogen and stored at −80 °C.

GST pull-down assay

GST or GST–SSB1 immobilized on the glutathione-Sepharose beads were preblocked with 1% BSA and then incubated with recombinant INIP or INTAC proteins overnight at 4 °C. The next day, the beads were washed extensively with wash buffer containing 50 mM Tris-HCl pH 7.5, 100 mM NaCl, 1 mM EDTA and 0.05% NP-40 and then directly boiled in 40 μl SDS–PAGE sample-loading buffer. The samples were analysed by Coomassie Blue staining and western blotting.

EMSA

The purified SSB1 and INTAC alone or mixed as indicated were incubated with 100 nM Cy3-labelled ssDNA, dsDNA or ssRNA on ice for 30 min in binding buffer containing 20 mM Tris-HCl pH 7.5, 50 mM NaCl, 5 mM MgCl2, 0.2 mM EDTA and 1 mM DTT. The DNA–protein complexes were loaded onto a 6% native polyacrylamide gel in 0.5× TBE buffer and run for 30 min at 150 V in a cold room. After electrophoresis, the gels were scanned using the RGB channel of an Azure C400 instrument.

ChIP–Rx and ChIP–qPCR

The ChIP–Rx experiments were performed as described previously53. In brief, for each IP, 1 × 107 cells were cross-linked with 1% formaldehyde at room temperature for 10 min and consequently quenched with 125 mM glycine for 5 min at room temperature. Cells were scraped and centrifuged with 1,000g for 10 min. The cell pellets were washed twice with ice-cold PBS and resuspended in lysis buffer containing 50 mM HEPES pH 7.4, 150 mM NaCl, 2 mM EDTA, 0.1% Na-deoxycholate, 0.1% SDS, 1× protease inhibitor, 1× phosphatase inhibitor, followed by sonicating (Qsonica) to appropriate fragment (200–700 bp). After sonication, the lysate was centrifuged at maximal speed for 15 min to collect the supernatant and mixed with 20% of lysate from MEFs processed identically as spike-in for normalization.

The chromatin samples were incubated with specific antibodies overnight at 4 °C. After incubation, the protein–DNA complex was immobilized on pre-blocked (BSA, 2 mg ml−1 for 2 h) magnetic protein A/G beads for 3 h at 4 °C. Immobilized, the bound fractions were washed three times with high-salt wash buffer (20 mM HEPES pH 7.4, 500 mM NaCl, 1 mM EDTA, 1.0% NP40, 0.25% Na-deoxycholate, 1× protease inhibitor, 1× phosphatase inhibitor), twice with low-salt wash buffer (20 mM HEPES pH 7.4, 150 mM NaCl, 1 mM EDTA, 0.5% NP40, 0.1% Na-deoxycholate, 1× protease inhibitor, 1× phosphatase inhibitor) and once with Tris-EDTA (TE) buffer supplemented with 50 mM NaCl. Elution and re-cross-linking were performed in elution buffer (50 mM Tris-HCl pH 8.0, 10 mM EDTA, 1% SDS) supplemented with protease K at 65 °C for overnight. The DNA samples were purified using the phenol–chloroform DNA extraction method. The precipitated DNA sample was either analysed by qPCR or subjected to library preparation using the VAHTS Universal Plus DNA Library Prep Kit for Illumina (Vazyme). The library was then sequenced using the NovaSeq 6000 platform (Mingma Technologies).

PRO–seq

PRO–seq library preparation was performed as previously described54,55, and all of the procedures below were carried out on ice. In brief, the cells cultured in 15 cm dishes were collected by washing twice with 5 ml ice-cold PBS and scraping with 5 ml permeabilization buffer (10 mM Tris-HCl pH 8.0, 5% glycerol, 250 mM sucrose, 10 mM KCl, 5 mM MgCl2, 1 mM EGTA, 0.5 mM DTT, 0.1% NP40, 0.05% Tween-20, 1× protease inhibitors (Roche), 4 U ml−1 RNase inhibitor (SUPERaseIN)), followed by incubating on ice for up to 5 min. Permeabilized cells were collected by centrifugation (800g, 4 min, 4 °C) and washed twice with ice-cold cell wash buffer (10 mM Tris-HCl pH 8.0, 5% glycerol, 10 mM KCl, 5 mM MgCl2, 0.5 mM DTT, 4 U ml−1 RNase inhibitor). Washed nuclei were resuspended in freezing buffer (50 mM Tris-HCl pH 8.0, 40% glycerol, 5 mM MgCl2, 1 mM EDTA, 0.5 mM DTT, 4 U ml−1 RNase inhibitor) at a density of 3 × 106 cells per 50 µl and immediately frozen in liquid nitrogen. Cells were stored in −80 °C until use.

A total of 3 million permeabilized cells (mixed with 3 × 105 MEFs as a spike-in) were added to the same volume of 2× nuclear run-on mixture (10 mM Tris-HCl pH 8.0, 300 mM KCl, 1% Sarkosyl (Sigma-Aldrich), 5 mM MgCl2, 1 mM DTT, 40 mM Biotin-11-C/GTP (Perkin Elmer), 0.8 U ml−1 RNase inhibitor) and incubated at 30 °C for 5 min. Nascent RNA was extracted using TRIzol LS (Ambion) followed by ethanol precipitation. Extracted RNA was fragmented by base hydrolysis in 0.25 N NaOH for 10 min on ice and immediately neutralized with 1× volume of 1 M Tris-HCl pH 6.8, followed by passing through a calibrated RNase-free P30 column (Bio-Rad, 732-6251). Fragmented RNA was dissolved in H2O and incubated with 10 pmol of reverse 3′ RNA adapter and treated with T4 RNA ligase (NEB) for 1 h at 25 °C. After 3′ RNA ligation, fragmented nascent RNA was bound to 25 µl of prewashed Streptavidin Magnetic Beads (NEB) in binding buffer (10 mM Tris-HCl pH 7.4, 300 mM NaCl, 0.1% Triton X-100, 1 mM EDTA) for 20 min at 25 °C. The bound beads were washed once with high-salt wash buffer (50 mM Tris-HCl pH 7.4, 2 M NaCl, 0.5% Triton X-100, 1 mM EDTA) and once with low-salt wash buffer (5 mM Tris-HCl pH 7.4, 0.1% Triton X-100, 1 mM EDTA). The on-bead reaction of RNA 5′ hydroxyl repair was performed in PNK mix (1× PNK buffer, 1 mM ATP, 10 U PNK (NEB)) at 37 °C for 30 min. For nascent RNA 5′ de-capping, the RNA products were incubated with RppH mix (1× ThermoPol buffer, 5 U RppH (NEB)) for 1 h at 37 °C. The RNA 5′ adapter ligation was performed using the ligation mix (1× T4 RNA ligase buffer, 1 mM ATP, 15% PEG8000, 10 U T4 RNA ligase) at 25 °C for 1 h. Adapter-ligated nascent RNA was enriched with biotin labelled products by another round of Streptavidin bead binding, once with high-salt wash buffer and once with low-salt wash buffer, followed by TRIzol extraction of the RNA product. The air-dried RNA pellet was resuspended in RT resuspension mix (3 μM RP1, 0.74 mM dNTP mix) and denatured at 65 °C for 5 min and snap-cooled on ice, followed by the addition of 6.5 µl of RT master mix (3× RT buffer, 15.4 mM DTT, 10 U RNase inhibitor) to each sample. Reverse transcription was performed using the 200 U superscript III enzyme (Invitrogen). The reverse-transcription products immediately underwent PreCR treatment, test amplification and full-scale library amplification using the Q5 DNA polymerase (NEB). The libraries were then sequenced using the NovaSeq 6000 platform (Mingma Technologies).

R-loop CUT&Tag

R-loop CUT&Tag was optimized according to a previously published protocol8,56. DLD-1 cells were collected by Accutase (Thermo Fisher Scientific) to avoid overdigestion. For a single R-loop CUT&Tag, half a million cells were typically used to obtain sufficient DNA extraction for library construction. The cells were centrifuged (600g, 3 min) at room temperature, washed twice with 800 μl of wash buffer (20 mM HEPES pH 7.5, 150 mM NaCl, 0.5 mM spermidine, 1× protease inhibitor) and finally resuspended with 100 μl of wash buffer in low-retention PCR tubes. The concanavalin-A-coated magnetic beads (Smart-Lifesciences) were activated in advance and resuspended with the same volume of the binding buffer (20 mM HEPES pH 7.5, 10 mM KCl, 1 mM CaCl2, 1 mM MnCl2). A total of 10 μl of activated concanavalin A beads was added to 5 × 105 cells with incubation for 10 min under gentle rotation. The bead-bound cells were magnetized to remove the liquid with a pipettor and resuspended in 50 μl of antibody buffer (20 mM HEPES pH 7.5, 150 mM NaCl, 0.5 mM spermidine, 1× protease inhibitor, 0.05% digitonin, 0.01% NP-40, 2 mM EDTA). Next, 1 μg of S9.6 (Active Motif) was added to combine the DNA–RNA hybrid by rotating at 4 °C overnight. A total of 10 μg of RNase H1 (Thermo Fisher Scientific) was added with S9.6 to cleave the DNA–RNA hybrid as a negative control. For the IgG control, mouse IgG was used instead. After successive incubation with rabbit anti-mouse IgG (Solarbio, 1:100 dilution) and mouse anti-rabbit IgG (Solarbio, 1:100 dilution) in 100 μl of antibody buffer for 1 h at room temperature, the bead-bound cells were washed three times with dig-wash buffer (antibody buffer without 2 mM EDTA) to remove the unbound antibody.

The pAG-Tn5 adapter complex was mixed in dig-300 buffer (20 mM HEPES pH 7.5, 300 mM NaCl, 0.5 mM spermidine, 1× protease inhibitors, 0.01% digitonin, 0.01% NP-40) to a final concentration of 0.2 μM. The bead-bound cells were resuspended in 100 μl of pAG-Tn5 mix and incubated at room temperature for 1 h followed by removing the supernatant. After adequate washing, the tagmentation reaction was performed in 40 μl of tagmentation buffer (10 mM TAPS-KOH pH 8.3, 10 mM MgCl2, 1% DMF) at 37 °C for 1 h. Next, 1.5 μl of 0.5 M EDTA, 0.5 μl of 10% SDS and 1 μl of 20 mg ml−1 protease K were added to stop the reaction. After incubation for 1 h at 55 °C, DNA purification was performed using VAHTS DNA Clean Beads (Vazyme), and eluted in 10 μl of 0.1% Tween-20. The eluent was mixed with 10 U of Bst 2.0 WarmStart DNA polymerase (NEB) and 1 × Q5 polymerase reaction buffer (NEB) in a 20 μl reaction system. The reaction was completed at 65 °C for 30 min and then at 80 °C for 20 min to inactivate the Bst 2.0 WarmStart DNA polymerase. The purified DNA was amplified by Q5 high-fidelity DNA polymerase (NEB) with a universal i5 primer and a uniquely barcoded i7 primer. The exact PCR cycles were estimated by qPCR before amplification. PCR amplification with 13–14 cycles yielded enough quantity of library for sequencing. After library size-selection with 0.56–0.85 VAHTS DNA Clean Beads, with library sizes ranging from 200 to 700 bp, the products were next either analysed using qPCR or sequenced on the NovaSeq 6000 platform (Mingma Technologies).

KAS–seq

KAS–seq was performed as described previously with minor modifications57. A total of 1 million DLD-1 cells was labelled with 2.5 mM N3-kethoxal for 10 min at 37 °C. The gDNA was isolated using the PureLink genomic DNA mini kit (Thermo Fisher Scientific). The extracted gDNA was biotinylated with 1 mM DBCO-PEG4-biotin (Sigma-Aldrich) through a click cycloaddition reaction. After sonication, the biotinylated gDNA was fragmented into sizes of ~300 bp before mixing the fragments with 10 μl of Dynabeads Myone Streptavidin C1 beads (Thermo Fisher Scientific). After incubation and brief washes, the beads were resuspended in nuclease-free water at 95 °C for 15 min to facilitate the dissolution of N3-kethoxal-modified gDNA fragments. Next, the DNA fragments were repaired with the phi29 DNA polymerase (NEB) and purified using VAHTS DNA Clean Beads. Library preparation was performed using the VAHTS Universal Plus DNA Library Prep Kit for Illumina (Vazymes). The library was then sequenced on the NovaSeq 6000 platform.

Immunofluorescence analysis

DLD-1 cells were seeded on coverslips at least 24 h before the experiment. After washing with PBS, cells were incubated with 4% paraformaldehyde (PFA) for 10 min. After washing three times with PBS, cells were permeabilized with 0.5% Triton X-100 in PBS for 10 min and blocked with 4% BSA in PBS for 30 min. Primary antibodies were dissolved in ice-cold 4% BSA with the dilution ratio recommended by producers, and the cells were then immersed in the primary antibody buffer for overnight incubation at 4 °C. After three washes in PBS, cells were incubated with the appropriate secondary antibodies for 1 h. Next, cells were mounted in ProLong Gold Antifade Mountant with DAPI (Invitrogen) before imaging. For rapid R-loop immunofluorescence, GFP–RNASEH1 was used as the primary sensor, and the protein was purified as previously described39. Cells were incubated with 2 μg of GFP–dRNASEH1 in 4% BSA overnight at 4 °C. After washing three times with PBS, cells were directly mounted before imaging. The presented images were obtained using the Leica TCS SP8 laser-scanning confocal microscopy. Unless otherwise indicated, all procedures were performed at room temperature.

γH2AX FACS assay

Single-cell suspensions of CTR and DKO cells were incubated with 70% ethanol at −20 °C for 2 h. After two washes with PBS, cells were fixed with 4% PFA for 15 min. Next, cells were permeabilized with 0.25% Triton X-100 in PBS for 15 min and blocked with 2% BSA in PBS for 30 min. For intracellular γH2AX staining, 1 × 106 cells were incubated with 1 µg γH2AX antibodies (Thermo Fisher Scientific) overnight at 4 °C, followed by incubation with Alexa-Fluor-488-conjugated secondary antibodies for 30 min at room temperature. After washing three times with PBS, cells were treated with propidium iodide staining buffer (Sangon Biotech) according to the manufacturer’s protocol. Data were acquired using FACSDiva Flow Cytometry Software (BD Biosciences) and analysed using FlowJo (TreeStar).

OptoDroplet assay

Hela cells expressing SSB1–mCherry–CRY2 or empty mCherry–CRY2 vector were imaged using two laser wavelengths (488 nm for mCry2 activation and 560 nm for mCherry imaging). To examine droplet formation, mCherry-positive cells were subjected to repetitive on/off cycles, whereby they were first exposed under a 488 nm laser for 1 s, and then an image was captured for the mCherry signal.

DNA fibre assay

DLD-1 cells were sequentially labelled with 10 mM IdU (Sigma-Aldrich) and 100 mM CldU (Sigma-Aldrich) for 30 min each. After labelling, cells were placed on ice immediately to stop DNA replication and subsequently centrifuged (300g, 5 min at 4 °C). After washing three times in PBS, 1 × 106 cells were placed onto a microscope slide and incubated with the spreading buffer (200 mM Tris-HCl pH 7.5, 0.5% SDS and 50 mM EDTA) for 1 min. The slides were tilted 15° to extend the DNA fibres. After fixation using methanol/acetic acid (3:1), the DNA was denatured using 2.5 M HCl and blocked with 1% BSA for 2 h before staining with primary (rat anti-BrdU for CldU and mouse anti-IdU) and secondary antibodies conjugated with Alexa Fluor 488 or 546. Images were acquired using a confocal microscope (Lecia TCS SP8) and analysed using the ZEN 2.3 SP1 (ZEISS) software. Statistical analysis was performed using Prism 8 (GraphPad software).

Analyses for protein disorder and amino acid sequence features

Disordered regions were identified using IUPred and IUPred3 (http://iupred.elte.hu/). Amino acid composition was analysed using Composition Profiler (http://www.cprofiler.org/cgi-bin/profiler.cgi). The net charge per residue was analysed using CIDER 40 (http://pappulab.wustl.edu/CIDER/analysis/).

In vitro droplet assay

Recombinant proteins were diluted to the indicated salt concentrations with buffer containing 10 mM Tris-HCl pH 7.5 to induce phase separation. A total of 8 μl of phase-separation solution was loaded onto a glass slide, covered with a coverslip and images were acquired using the Zeiss LSM880 microscope. For identifying droplet fusion events, glass slides loaded with protein solutions were inverted on the microscope lens, and images were acquired at 1 s intervals and further analysed using ImageJ. For FRAP assays, droplets containing fluorescent proteins were bleached with the desired laser intensity and 100 post-bleach frames were recorded with a time interval of 1 s. The fluorescence intensity at bleached region was corrected with an unbleached region and normalized to the pre-bleaching fluorescence intensity. For the co-phase separation assay of wild-type or mutant GFP–SSB1 with INTAC, INTAC was labelled using the Alexa Fluor 568 protein labelling kit (Thermo Fisher scientific) according to manufacturer’s protocols. The labelled INTAC proteins were diluted with unlabelled ones to a desired concentration and then mixed with GFP fusion proteins to induce phase separation.

Quantification and statistical analysis

ChIP–Rx analysis

Raw ChIP–Rx reads were trimmed using Trim Galore v.0.6.6 (Babraham Institute) in paired-end mode. Trimmed reads were aligned to human hg19 and mouse mm10 genome assemblies using Bowtie (v.2.4.4)58 with the default parameters. All unmapped reads, low mapping quality reads (MAPQ < 30) and PCR duplicates were removed using SAMtools (v.1.12)59 and the MarkDuplicates function of Picard Tools v.2.25.5 (Broad Institute). Peaks were called using MACS2 (v.2.2.7.1)60 with the option ‘nomodel’ and peak annotation was performed with R package ChIPseeker (v.1.28.3)61.

For quantitative comparison, read counts were normalized to the corresponding total reads aligned to spike-in genome in previous ChIP–Rx studies53,62. However, the number of reads mapped to spike-in genome could be influenced by the actual mixing ratio of chromatin samples before IP, which should also be scaled. To better compare the ChIP–Rx datasets, we derived a new scale factor α for each IP experiment as described in Supplementary Note 1.

Normalized bigwig files were generated with the bamCoverage function from deepTools (v.3.5.1)63 using scale factors calculated according to Supplementary Note 1. Reads mapping to the ENCODE blacklist regions64 were removed using bedTools (v.2.30.0)65. Heat maps (10 bp per bin) and metagene plots were generated using the computeMatrix function followed by the plotHeatmap and plotProfile functions of deepTools (v.3.5.1)63. Spike-in normalized occupancy at per promoter (1 kb upstream and 1 kb downstream of the TSS) was calculated using getCountsByRegions function from R package BRGenomics66, which can get the sum of the signal in normalized bigwig that overlaps defined regions. Pearson correlations of ChIP–Rx samples were calculated using deepTools (v.3.5.1)63 (multiBamSummary followed with plotCorrelation) with the read counts split into 10 kb bins across the genome. The pausing index was defined as the ratio of Pol II occupancy at promoters (from 100 bp upstream to 300 bp downstream of the TSS) to Pol II occupancy over gene bodies (from 300 bp to 2 kb downstream of the TSS). Pol II occupancy was also calculated using getCountsByRegions function from R package BRGenomics.

KAS–seq analysis

Raw reads of KAS–seq were trimmed as described for ChIP–Rx above. Trimmed reads were aligned to the human hg19 and mouse mm10 genomes using Bowtie (v.2.4.4)58 with the option ‘-X 1000’. Removal of low mapping quality reads and duplicated reads, peak calling and annotation were performed in the same manner as described for ChIP–Rx. The scale factor for normalizing ssDNA signals was calculated as 1 over the number of reads mapping to spike-in genome (mm10) per million as previously described. Normalized bigwig files were generated using the bamCoverage function from deepTools (v.3.5.1)63 and reads mapping to the ENCODE blacklist regions64 were removed using bedTools (v.2.30.0)65.

PRO–seq analysis

Raw PRO–seq reads were processed as described for ChIP–Rx above, with reads longer than 15 bp retained. Ribosomal RNA reads were removed using Bowtie (v.2.4.4)58 with ‘--un-conc-gz’. The remaining reads were aligned to human hg19 and mouse mm10 genome assemblies using Bowtie (v.2.4.4)58 with the parameters ‘--local --very-sensitive-local --no-unal --no-mixed --no-discordant’. Removal of low mapping quality reads and duplicated reads and calculation of scale factor were performed in the same manner as described for KAS–seq. Single-base-pair resolution, normalized, stranded read coverage tracks were generated using the bamCoverage function of deepTools (v.3.5.1)63 with the parameters ‘--Offset 1 --samFlagInclude 82’ and ‘--Offset 1 --samFlagInclude 98’ for the forward and reverse strand, respectively. TSSs of sense and antisense transcription were determined using published PRO–Cap data of DLD-l cells and according to a previously published protocol67.

ATAC–seq analysis

After trimming the adapters and low-quality reads as described for ChIP–Rx above, the remaining reads were aligned to human hg19 using Bowtie (v.2.4.4)58 with the parameters ‘-N 1 -L 25 -X 2000 --no-mixed --no-discordant’. For spike-in normalization, the reads were also aligned to the E. coli genome by Bowtie (v.2.4.4)58 with the options ‘--end-to-end --very-sensitive --no-overlap --no-dovetail --no-mixed --no-discordant -I 10 -X 700’. Mitochondrial reads and PCR duplicates were then filtered using SAMtools (v.1.12)59 and Picard Tools (v.2.25.5; Broad Institute). Finally, the reads were shifted to compensate for the offset in tagmentation site relative to the Tn5 binding site using the alignmentSieve function of deepTools (v.3.5.1)63 with the ‘--ATACshift’ option. Read counts were adjusted to total reads aligned to E. coli genome using deepTools (v.3.5.1)63.

CUT&Tag analysis

Adapters and low-quality reads were trimmed as described for ChIP–Rx above and the resulting reads were aligned to human hg19 genome using Bowtie (v.2.4.4)58 with the default parameters. For quantitative comparison, the reads were also aligned to the E. coli genome using Bowtie (v.2.4.4)58 with the options ‘--end-to-end --very-sensitive --no-overlap --no-dovetail --no-mixed --no-discordant -I 10 -X 700’. Duplicated reads were removed with Picard Tools (v.2.25.5; Broad Institute) and the reads were shifted as described for ATAC–seq. Read counts adjusted to total reads were aligned to E. coli genome using deepTools (v.3.5.1)63.

Statistics and reproducibility

Wilcoxon rank-sum tests were used throughout this study unless otherwise specified. Unless otherwise indicated, each experiment was performed with three independent replicates.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.