A transcriptome-wide antitermination mechanism sustaining identity of embryonic stem cells

Eukaryotic gene expression relies on extensive crosstalk between transcription and RNA processing. Changes in this composite regulation network may provide an important means for shaping cell type-specific transcriptomes. Here we show that the RNA-associated protein Srrt/Ars2 sustains embryonic stem cell (ESC) identity by preventing premature termination of numerous transcripts at cryptic cleavage/polyadenylation sites in first introns. Srrt interacts with the nuclear cap-binding complex and facilitates recruitment of the spliceosome component U1 snRNP to cognate intronic positions. At least in some cases, U1 recruited in this manner inhibits downstream cleavage/polyadenylation events through a splicing-independent mechanism called telescripting. We further provide evidence that the naturally high expression of Srrt in ESCs offsets deleterious effects of retrotransposable sequences accumulating in its targets. Our work identifies Srrt as a molecular guardian of the pluripotent cell state.


Supplementary Figure 1. High expression of Srrt in mouse ESCs is required for maintenance of their undifferentiated status
(a) Immunoblot showing that Srrt protein is downregulated in mouse ESCs following a 96-hour incubation in a medium lacking 2i inhibitors and LIF (i.e. compounds required to maintain ESCs in an undifferentiated state 1), as compared to the full medium containing 2i and LIF. Expression levels of an Srrt-interacting partner, nuclear cap-binding protein 1 (Ncbp1/Cbc80), and a housekeeping protein, Gapdh, remain unchanged under these conditions.
(b) Quantifications of immunoblot data in (a) for Srrt (left) and Ncbp1 (right). Data were normalized to Gapdh expression levels, averaged from 3 experiments ±SD, and compared by a two-tailed t-test.
(c-e) ESCs treated with siSrrt or siCtrl for 24 hours were dissociated and re-plated at 1000 cells per well of a 6-well plate to examine the long-term effect of Srrt knockdown. Colonies formed 7 days post plating were stained for alkaline phosphatase (AP) and imaged. (c) Most of the colonies in the siCtrl-treated cultures were dome-shaped and strongly AP-positive, characteristic of undifferentiated ESCs (orange arrowheads). (d) In contrast, the siSrrt cultures were dominated by flat differentiated colonies expressing relatively little AP (cyan arrowheads). (e) Close-ups of the colonies marked by arrowheads in (d). Scale bars: (c, d) 1 mm; (e) 100 µm.
(f) Fisher's exact test confirming that the difference in the undifferentiated/differentiated colony ratio between the siCtrl and siSrrt samples in (c, d) is significant.
(g, h) Flow cytometry analyses showing that a 48-hour treatment with siSrrt results in a detectable decrease in the expression of the ESC-enriched surface markers SSEA1 and Pecam1 in comparison with the siCtrl-treated samples. Two biological replicates were analyzed in each case and the distributions were compared by a two-tailed Wilcoxon rank sum test. Box bounds, the first and the third quartiles; thick lines inside the boxes, the medians. Whiskers extend from the first and the third quartile to the lowest and highest data points or, if there are outliers, to 1.5× the interquartile range. Open circles, outliers.
Source data are provided as a Source Data file.
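As an illustration of the colony-ratio comparison in (f), the following minimal sketch computes a two-sided Fisher's exact test for a 2×2 table using only the Python standard library. The colony counts are hypothetical placeholders and are not taken from the Source Data file; the function name and the "sum all tables no more likely than the observed one" definition of the two-sided p-value are assumptions about how such a test is commonly set up, not a description of the authors' software.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of observing k in the top-left cell
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables that are no more likely than the observed one
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)

# Hypothetical counts (NOT from the paper).
# Rows: siCtrl, siSrrt; columns: undifferentiated, differentiated colonies.
p_value = fisher_exact_two_sided(40, 10, 12, 38)
print(f"two-sided p = {p_value:.3g}")
```

The same function applies to the other 2×2 enrichment tests described in these legends, provided the four cell counts are available.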

Supplementary Figure 2. Srrt knockdown in ESCs induces large-scale changes in the ESC transcriptome
(a) Venn diagrams showing significant overlaps between genes regulated by siSrrt and those changing their expression in mouse ESCs undergoing spontaneous differentiation 2 with FC≥1.5 and FDR<0.05. Only genes passing minimum expression cutoffs in both RNA-seq datasets are shown in this comparison.
(b) Heat maps showing that Srrt knockdown does not change the expression of many general and naïve pluripotency factors [3][4][5]. However, it clearly shifts the gene expression pattern towards a more differentiated state by downregulating some ESC-enriched markers (Nr0b1, Pecam1 and Zic2) and upregulating a subset of postimplantation and gastrulation markers and a number of transcripts characteristic of terminally differentiated cells. Data shown are from replicated RNA-seq experiments with significantly regulated genes typeset in bold.
(c) RT-qPCR validation of siSrrt-upregulated somatic cell-specific markers selected from the corresponding heatmap in (b). These examples include Calb2 (calbindin 2/calretinin expressed in a subset of cortical interneurons and retinal neurons), Cbln1 (cerebellin 1, a protein enriched in postsynaptic structures of Purkinje cells and associated with depression), Col4a1 (a type IV collagen found in epithelial cell basement membranes), Ctgf (a growth factor secreted by vascular endothelial cells), Des (muscle-specific intermediate filament desmin), Flnc (filamin C, a muscle-specific actin cross-linker), Krt8 (a type II keratin expressed in epithelial cells), Rims1 (a regulator of synaptic vesicle exocytosis associated with cone-rod dystrophy 7 and retinitis pigmentosa), Scn3b (sodium voltage-gated channel beta subunit 3 expressed in neurons and muscle cells), and Shank1 (a scaffold protein required for synapse development and function, and associated with diabetic encephalopathy and autism spectrum disorders) (https://www.genecards.org/). The data are averaged from 3 experiments ±SD and compared by a two-tailed t-test. Expression levels in siCtrl-treated samples are set to 1.
(d) Volcano plot showing that many genes become either down- or upregulated in response to Srrt knockdown, but the number of downregulated genes noticeably exceeds the number of upregulated ones when using the most stringent cutoffs (FC≥2 and FDR<1E-50).
Source data are provided as a Source Data file.

(c) Fisher's exact tests confirming the relationship between increased RIC in the first intron and gene downregulation. The plots show incidence of first introns with significantly increased RIC (FC≥1.5 and FDR<0.01) among all genes qualifying for the analysis and genes regulated ≥1.5-fold with FDR cutoffs ranging from 0.05 to 1E-50. Note that first introns with increased RIC are significantly enriched among downregulated genes and this effect becomes more prevalent as the cutoff stringency is increased. On the other hand, introns with increased RIC are depleted among upregulated genes. The lack of statistical significance for genes upregulated with FDR<1E-50 is due to the fact that relatively few upregulated genes pass this stringent cutoff (see Supplementary Fig. 2d).
(d) Metaplots showing that the increase in RNA-Seq coverage is visibly skewed towards the 5' end of first introns for genes characterized by increased RIC (FC≥1.5 and FDR<0.01) and reduced expression (FC≥1.5 and FDR<0.05) (red line), but not for other gene categories (yellow and green lines). Also note a prominent drop-off of the red line in the second exon consistent with possible termination of transcription in the first intron.

Supplementary
(b) PAS hexamers AATAAA and ATTAAA precede siSrrt-activated CSs in first introns (fold upregulation ≥2, FDR<0.05) with frequency comparable to that observed for CSs in 3'UTRs of the same genes.
(c) Fisher's exact test for Fig. 2c showing that siSrrt-upregulated CSs are significantly overrepresented in first introns of downregulated genes.
(d) Scatter plot showing that siSrrt-induced activation of CSs in first introns strongly correlates with reduced activity of CSs in the corresponding 3'UTRs (lower right quadrant). Red dots, genes where relative CS efficiency changes in both first introns and 3'UTRs (FDR<0.05); gray dots, other genes. Numbers of significant data points in each quadrant are shown in red.
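The PAS hexamer comparison in panel (b) of this legend can be sketched as a simple sequence scan. The example below is illustrative only: the 40-nt upstream window, the function names and the toy sequences are assumptions introduced here, and the legend does not state which window or tool was actually used.

```python
PAS_HEXAMERS = ("AATAAA", "ATTAAA")

def has_upstream_pas(seq, cs_index, window=40):
    """True if a canonical PAS hexamer lies within `window` nt upstream of a cleavage site.

    seq      -- sequence in transcript orientation (5' to 3')
    cs_index -- 0-based index of the first nucleotide downstream of the cleavage site
    window   -- assumed search window; not specified in the legend
    """
    start = max(0, cs_index - window)
    upstream = seq[start:cs_index].upper()
    return any(hexamer in upstream for hexamer in PAS_HEXAMERS)

def pas_frequency(sites, window=40):
    """Fraction of (sequence, cs_index) pairs preceded by a PAS hexamer."""
    if not sites:
        return 0.0
    hits = sum(has_upstream_pas(seq, idx, window) for seq, idx in sites)
    return hits / len(sites)

# Toy sequences (hypothetical, for illustration only)
toy_sites = [
    ("GGGGAATAAAGGGGGGGGGGCA", 22),  # contains AATAAA upstream of the site
    ("GGGGGGGGGGGGGGGGGGGGCA", 22),  # no PAS hexamer
    ("ccattaaaggggggggggca", 20),    # lower-case ATTAAA is also detected
]
print(f"PAS frequency: {pas_frequency(toy_sites):.2f}")
```

Running the same function on intronic and 3'UTR cleavage sites of the same genes would give the two frequencies being compared.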

Supplementary Figure 5. Examples of genes downregulated by siSrrt through activation of CSs in first introns
(a) Complete plots for the 5'- and 3'-proximal close-ups in Fig. 2d showing read-per-million (rpm)-normalized RNA-Seq coverage in gray and rpm-normalized 3'RNA-Seq data in red. Note that activation of CSs in first introns coincides with a reduction in downstream RNA-Seq coverage and 3'RNA-Seq signal intensity in the 3'UTRs. RT-qPCR primers used to analyze intronic CS readthrough efficiency in Fig. 2e

(b) The three most potent siRNAs (siSrrt#1, siSrrt#2 and siSrrt#3) were assayed for their ability to antagonize iCS readthrough in the Ammecr1, Cdyl2 and Dcaf6 genes as explained in Supplementary Fig. 5c.
(c) Gene expression changes brought about by individual Srrt-specific siRNAs are similar to those induced by the siSrrt mixture, with siSrrt#1 and siSrrt#2 generally showing the strongest performance.
(d) Immunoblots showing that even the best Srrt-specific siRNA, siSrrt#1, does not alter the expression levels of the pluripotency factors Pou5f1/Oct4, Sox2 and Nanog.
(e) Quantitation of the band intensities in (d) normalized to Gapdh and siCtrl. The RT-qPCR (a-c) and the immunoblot quantification data (e) were averaged from 3 experiments ±SD and compared by a two-tailed t-test.
Source data are provided as a Source Data file.
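The normalization described in (e), with band intensities expressed relative to Gapdh and to siCtrl and then averaged over 3 experiments ±SD, can be sketched as follows. All intensity values are hypothetical, and dividing each experiment by its own siCtrl value (rather than by the mean siCtrl value) is an assumption made for this example.

```python
from statistics import mean, stdev

def normalize_to_control(target, loading, ctrl_key="siCtrl"):
    """Return {condition: (mean, SD)} of loading-normalized intensities relative to control.

    target, loading -- dicts mapping condition -> list of per-experiment band
                       intensities, paired by experiment index
    ctrl_key        -- condition used as the reference (set to 1 in each experiment)
    """
    # Step 1: normalize the target band to the loading control (e.g. Gapdh)
    ratios = {
        cond: [t / l for t, l in zip(target[cond], loading[cond])]
        for cond in target
    }
    # Step 2: express every condition relative to the control of the same experiment
    ctrl = ratios[ctrl_key]
    summary = {}
    for cond, values in ratios.items():
        relative = [v / c for v, c in zip(values, ctrl)]
        summary[cond] = (mean(relative), stdev(relative))
    return summary

# Hypothetical band intensities from 3 experiments (NOT from the paper)
target_band = {"siCtrl": [100.0, 120.0, 90.0], "siSrrt#1": [50.0, 66.0, 40.5]}
gapdh_band = {"siCtrl": [200.0, 240.0, 180.0], "siSrrt#1": [200.0, 220.0, 180.0]}

for cond, (m, sd) in normalize_to_control(target_band, gapdh_band).items():
    print(f"{cond}: {m:.2f} ± {sd:.2f}")
```

With per-experiment normalization the control is exactly 1 with zero spread, so any statistical comparison then rests on the variability of the treated samples alone.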

Supplementary Figure 7. Ammecr1 is an important Srrt target showing expected expression dynamics during neuronal differentiation of mouse ESCs
(a-c) Longitudinal changes in Srrt and Ammecr1 expression in a previously published RNA-seq analysis of a mouse ESC differentiation time course 6 resolving the ESC, NSC and radial glial cell (RGC) stages, as well as 5 progressive stages of glutamatergic neuronal maturation (GN1-GN5). The overall expression of (a) Srrt and (b) Ammecr1 decreases during the course of differentiation. (c) Conversely, exon 1-normalized RNA-seq coverage for the region of the Ammecr1 first intron between the 5'ss and the Srrt-regulated iCS shows an increasing trend. Black circles, experimental data. Red dashed lines, trend curves fitted using (a, b) the decr or (c) the incr routines of the cgam R package 7. The routines were selected based on the sign of the Kendall's rank correlation statistics (τ) indicated at the top of each graph along with the corresponding p-values. The R² goodness-of-fit statistics are shown in red.
(d) RNA-seq coverage plots for the 5'-proximal region of Ammecr1 showing that the relative abundance of iCS-terminated transcripts tends to increase during neuronal differentiation 6 .
(e, f) RT-qPCR data averaged from 3 experiments ±SD and compared by a two-tailed t-test suggest that siRNA knockdown of Ammecr1 (siAmmecr1) partially recapitulates gene expression effects induced by siSrrt.
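The rule used in (a-c) to choose between a decreasing and an increasing trend curve, based on the sign of Kendall's τ, can be illustrated with a short standard-library sketch. It computes τ-a, which coincides with the more commonly reported τ-b when there are no ties, and it only returns the name of the routine mentioned in the legend; it does not call the cgam R package. The expression values are hypothetical.

```python
def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    score = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            # +1 for a concordant pair, -1 for a discordant pair, 0 for a tie
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)

def pick_trend(x, y):
    """Name of the monotone routine implied by the sign of tau ('incr' or 'decr')."""
    return "incr" if kendall_tau(x, y) > 0 else "decr"

# Eight differentiation stages in order: ESC, NSC, RGC, GN1-GN5
stage_index = list(range(8))
# Hypothetical expression values (NOT from the paper)
expression = [95.0, 80.0, 82.0, 60.0, 55.0, 40.0, 42.0, 30.0]

tau = kendall_tau(stage_index, expression)
print(f"tau = {tau:.2f}, routine = {pick_trend(stage_index, expression)}")
```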
Source data are provided as a Source Data file.

Supplementary Figure 8. The exosome complex does not play a major part in Srrt-mediated readthrough of iCSs in mouse ESCs
(d) Mouse ESCs were transfected with siCtrl, siSrrt, an siRNA against Exosc3 (siExosc3) or a mixture of siRNAs against catalytic subunits of the exosome, Dis3 and Exosc10 (siDis3+siExosc10). Knockdown efficiencies of the corresponding targets were analyzed 48 hours later by RT-qPCR.
(e) RT-qPCR analyses of gene expression effects of siSrrt, siExosc3 and siDis3+siExosc10. Note robust accumulation of 5'-proximal Ammecr1 transcripts and downregulation of the full-length Ammecr1 mRNA in response to siSrrt but not exosome-specific siRNAs. Conversely, exosome-specific knockdowns lead to stronger upregulation of TSS-proximal upstream antisense transcripts (uaKlf3 and uaTcea1) compared to siSrrt. Data in (d, e) were averaged from 3 experiments ±SD and compared by a two-tailed t-test.
Source data are provided as a Source Data file.

Supplementary Figure 9. Dicer-dependent small RNAs do not play a major part in Srrt-mediated readthrough of iCSs in mouse ESCs
(a) Gene expression changes induced by Srrt knockdown (siSrrt v siCtrl) do not correlate with those triggered by knockout of a key microRNA biogenesis factor, Dicer, in mouse ESCs 9 (Dicer1 KO v WT).
(b) Dicer1 KO does not generally alter the expression of the Srrt-dependent genes (first intron CS upregulation ≥2-fold, FDR<0.05 and gene upregulation ≥1.5-fold, FDR<0.05). Violin plot outlines, kernel density estimates of probability densities; open circles, the medians; bounds of the black boxes, the first and the third quartiles. Whiskers extend from the first and the third quartile to the lowest and highest data points or, if there are outliers, to 1.5× the interquartile range.
(c) RNA-seq coverage plots for Ammecr1 showing that Dicer1 KO fails to recapitulate the premature termination effect induced by siSrrt.
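The box and whisker convention used in (b) and in the other box plots of this document can be made concrete with a small sketch. It assumes the common Tukey rule, in which whiskers stop at the most extreme data points lying within 1.5× the interquartile range of the box, and it assumes the "inclusive" quartile definition of Python's statistics module; the plotting software used by the authors may compute quartiles slightly differently. The data are hypothetical.

```python
from statistics import quantiles

def box_stats(values, k=1.5):
    """Quartiles, whisker ends and outliers under the Tukey 1.5×IQR rule."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    # Points inside the fences determine where the whiskers end
    inliers = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inliers),
        "whisker_high": max(inliers),
        "outliers": outliers,
    }

# Hypothetical values with one obvious outlier (NOT from the paper)
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats)
```

When no point falls outside the fences, the whiskers simply reach the lowest and highest data points, which matches the first case described in the legends.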

Supplementary Figure 10. Repression of iCSs by Srrt depends on its interaction with CBC
(a) Changes in gene expression induced by Ncbp1 knockdown in mouse ESCs correlate (Pearson's r=0.85, p=0) with gene expression effects of siSrrt. Red dots, genes regulated in response to both siSrrt and siNcbp1 (FC≥1.5 and FDR<0.05). Gray dots, the rest of the genes.
(b) Fisher's exact test for Fig. 4c showing that siNcbp1-upregulated CSs are significantly over-represented in first introns of downregulated genes.
(c) Scatter plot showing that siNcbp1-induced upregulation of CSs in first introns strongly correlates with downregulation of CSs in the corresponding 3'UTRs (lower right quadrant). Red dots, genes where relative CS efficiency changes in both first introns and 3'UTRs (FDR<0.05); gray dots, other genes. Numbers of significant data points in each quadrant in (a, c) are shown in red.
(d) Quantification of the Srrt and Ncbp1 immunoblots in Fig. 4d averaged from 3 experiments ±SD and compared by a two-tailed t-test.
(e) Co-immunoprecipitation experiment indicating that Srrt interacts with Ncbp1 in mouse ESCs in a nucleic acid-independent manner. Proteins were pulled down with or without benzonase using either Srrt-specific or non-immune antibodies and analyzed by immunoblotting with Srrt, Ncbp1 or Gapdh-specific antibodies.

(b-d) U1 RAP-Seq metaplots showing expected bias of U1-binding sequences towards the 5' end of (b) all and (c) first introns containing Srrt-regulated iCSs in both the siCtrl and the siSrrt datasets. (d) Inspection of the iCS-adjacent region suggests that the U1 occupancy immediately upstream of the iCSs is somewhat higher than downstream of these sites in the siCtrl samples and that the siSrrt treatment reduces U1 binding in the iCS-proximal region.
(e) Input-normalized RAP-Seq coverage profile and Piranha clusters (U1-1 and U1-2) for the first intron of Ncbp2, a control gene not regulated by siSrrt. Primers used in the RT-qPCR validation experiment in Fig. 5e are shown at the bottom.
Source data are provided as a Source Data file.

Supplementary Figure 12. Srrt knockdown in ESCs has no detectable effect on U1 snRNP biogenesis
(a) RT-qPCR assays indicating that siSrrt has no major effect on the abundance of mature U1 snRNA or its 3'-extended precursors. We used primer pairs targeting two different genomic variants of U1.
(b) Northern blot confirming that Srrt knockdown has no effect on U1 snRNA expression. Three biological replicates (Exp1, Exp2 and Exp3) were analysed side by side. The U1 snRNA Northern signal and the positions of the 5.8S and 5S rRNAs deduced from the methylene blue-stained membrane are marked on the right. Methylene blue-stained 5.8S rRNA bands are also shown at the bottom to provide a lane loading control.
(c) Immunoblot analysis and (d) its quantification showing that siSrrt does not alter the abundance of the U1 snRNP-specific proteins Snrpa/U1-A and Snrnp70/U1-70K. Data in (a, d) are averaged from 3 experiments ±SD and compared by a two-tailed t-test.
(e) Fisher's exact test showing that iCSs activated by treating ESCs with a U1-specific antisense morpholino oligonucleotide (amoU1) v scrambled control (amoCtrl) 8 lack the strong bias towards the first introns observed for Srrt knockdown (siSrrt v siCtrl).
(f) 2P-Seq/3'RNA-seq data showing that amoU1 upregulates the iCS in the first intron of Ammecr1 but differs from siSrrt by additionally activating iCSs in non-first introns. To facilitate comparison, the morpholino and the siRNA data were normalized to the height of the Ammecr1 iCS peak in the amoU1 and the siSrrt samples, respectively. Positions of regulated iCSs are marked by arrowheads.
Source data are provided as a Source Data file.

Supplementary Figure 13. Examples of Srrt-regulated iCSs conserved in evolution
Multiple sequence alignments showing considerable interspecies conservation of the Srrt-regulated iCS region in the Ammecr1, Cdyl2 and Dcaf6 genes. Sequences were downloaded from the Multiz track of the UCSC Genome Browser (https://genome.ucsc.edu) and the iCS-proximal regions were realigned using Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo). Invariant positions are marked by asterisks.

Supplementary Figure 14. Relationship between Srrt regulation and retrotransposition
(a) Members of the B2 family are enriched amongst iCS-associated SINE repeats.
(b) The overall RTE density is significantly higher in Srrt-regulated first introns than in non-regulated first or non-first introns.
(c) Predicted strengths of splice donor sites (5'ss) in Srrt-regulated and non-regulated first introns are statistically indistinguishable. In (b, c), box bounds, the first and the third quartiles; thick black lines, the medians. Whiskers extend from the first and the third quartile to the lowest and highest data points or, if there are outliers, to 1.5× the interquartile range. Outliers are not shown.