CLK-dependent exon recognition and conjoined gene formation revealed with a novel small molecule inhibitor

CDC-like kinase phosphorylation of serine/arginine-rich proteins is central to RNA splicing reactions. Yet, the genomic network of CDC-like kinase-dependent RNA processing events remains poorly defined. Here, we explore the connectivity of genomic CDC-like kinase splicing functions by applying graduated, short-exposure, pharmacological CDC-like kinase inhibition using a novel small molecule (T3) with very high potency, selectivity, and cell-based stability. Using RNA-Seq, we define CDC-like kinase-responsive alternative splicing events, the large majority of which monotonically increase or decrease with increasing CDC-like kinase inhibition. We show that distinct RNA-binding motifs are associated with T3 response in skipped exons. Unexpectedly, we observe dose-dependent conjoined gene transcription, which is associated with motif enrichment in the last and second exons of upstream and downstream partners, respectively. siRNA knockdown of CLK2-associated genes significantly increases conjoined gene formation. Collectively, our results reveal an unexpected role for CDC-like kinase in conjoined gene formation, via regulation of 3′-end processing and associated splicing factors.


Supplementary Figure 10: HCT116 and 184hTERT cells were treated with T3 or KH-CB19 for 6 h at the indicated concentrations. Quantitative RT-PCR analyses were performed for the expression of S6K canonical isoform mRNA (exons 6-7), exon 7 skipped isoform (exons 6-8), and conjoined genes as shown in the figure.
The data represent means ± SD from three independent analyses.
Supplementary Figure 11: Venn diagrams illustrating the number of unique overlapping and dataset-specific differentially spliced MISO events.
Supplementary Figure 14: Biological process enrichment maps calculated with the Cytoscape enrichment map tool. Biological process enrichment map for differentially spliced genes in the (a) HCT116 stranded and (b) 184hTERT stranded RNA-Seq datasets. Each node represents a GO biological process gene set. Node cores are coloured red when that gene set is enriched among genes differentially spliced in the 0.05 µM sample, and the outer ring is coloured red when enriched in the 1.0-5.0 µM samples. Edge thickness indicates the level of overlap between two gene sets, considering the set of differentially spliced genes in the 0.05 µM (green edges) or 1.0-5.0 µM (blue edges) samples.
Supplementary Figure 19: Gene expression responses to T3 from FPKM analysis of RNA-seq libraries. (a) Standardized gene expression profiles from the HCT116 unstranded RNA-Seq dataset. Genes have been clustered using WGCNA [2] based on FPKM profiles. (b) Biological process enrichment map for differentially expressed genes in the HCT116 unstranded RNA-Seq dataset. Each node represents a GO biological process gene set. Red nodes represent biological processes enriched among up-regulated genes, likewise blue for down-regulated genes. Node cores are coloured blue when that gene set is enriched among genes in cluster 1, red for cluster 2. The outer ring is coloured blue when that gene set is enriched among genes in cluster 3. Edge thickness indicates the level of overlap between two gene sets, considering the set of up- or down-regulated genes.

Supplementary Figure 35: Bar chart showing the top biological processes by fold enrichment, as annotated by GeneOntology, for CLK2 IP-MS (panel title: Top GO Biological Processes by Fold Enrichment; axis label: −log10(adjusted P-value)). Fold enrichment was obtained from GeneOntology (http://www.pantherdb.org/) [3] using the statistical overrepresentation test on the interactors from the IP-MS experiment, and is shown against the statistical significance of the test, represented by the Bonferroni-corrected P-value (−log10 scale). The fraction at the top of each bar is the number of genes in the dataset annotated to the biological process over the total number of genes annotated to that category in the human reference. The graph is filtered by the score of the functional enrichment testing [3] for each biological process (indicated in white font on the respective bars; cut-off, fold enrichment > 23.5) but sorted by statistical significance.
Additional assay(s): A radioactive kinase activity assay was followed by first screening to remove false positives in the Kinase Glo assay. An ATP competitive enzyme assay and a CLK1 kinase assay were conducted to profile mechanism of action and selectivity.
Confirmation of hit purity and structure: Purity of compounds was checked by LC/MS.
Global RNA-seq strongly shows the expected canonical splicing event pattern for CLK function in exon recognition as the major event type, i.e. skipped exons and retained introns. This is the main biological point of the paper.

Table columns: Sample; siRNA targets; Level of knockdown (CLK1, CLK2, CLK3, CLK4)

Supplementary Note 1
In the past, several inhibitors have been developed to target CLK proteins, each with limited interactions with the different CLK isoforms and with inhibitory effects achieved at relatively high compound concentrations [4,5]. TG003, a permeable benzothiazole compound [5], does not inhibit CLK3 and shows cross-reactivity with casein kinase (CK1δ and CK1ε), dual-specificity tyrosine phosphorylation-regulated kinase (DYRK1B), yeast Sps1/Ste20-related kinase (YSK4) and proviral insertion site in Moloney Murine Leukemia Virus (PIM) kinase isoforms [6]. Recently, KH-CB19 (a dichloroindolyl enaminonitrile) has been described [4]; although more potent than TG003, it still has sub-optimal potency, selectivity and physicochemical properties.
To arrive at compound T3 (Supplementary Fig. 1a) we investigated the SAR surrounding a reference scaffold compound identified by a high-throughput screen (HTS; see Supplementary Tables 1, 2 for the summary conditions of the screen). Compound 1 (Cpd-1) (Supplementary Fig. 1b) was identified by HTS of our library and has been previously described [1]. Our effort was focused on improving CLK2 inhibitory activity. Introduction of a pyridine ring at the 6-position (R2) of the imidazopyridine core boosted inhibitory activity (Supplementary Fig. 1b), likely because the nitrogen atom of the pyridine may tightly bind Lys193 (compound 2). At R1, a piperazine moiety was tolerated (compound 3) and N-alkylation increased activity. T3 and compound 4 showed significant CLK2 inhibition (IC50: 15 nM). T3 was selected as a tool compound because of its high solubility in water. The ATP-competitive nature of the compound is revealed by the shift in CLK2 inhibition curves under different concentrations of ATP (Supplementary Fig. 1c) and by the structural docking model for T3 (Supplementary Fig. 2).
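The ATP-dependent shift in inhibition curves mentioned above is the behaviour expected of a competitive inhibitor under the Cheng-Prusoff relation, IC50 = Ki × (1 + [ATP]/Km). The following is a minimal sketch of that relation only; the Ki and Km values are hypothetical placeholders chosen for illustration and are not measurements from this study.

```python
def apparent_ic50(ki_nm: float, atp_um: float, km_atp_um: float) -> float:
    """Apparent IC50 (nM) of an ATP-competitive inhibitor (Cheng-Prusoff).

    ki_nm      : inhibitor dissociation constant in nM (hypothetical here)
    atp_um     : ATP concentration in the assay, in µM
    km_atp_um  : Michaelis constant of the kinase for ATP, in µM (hypothetical here)
    """
    if km_atp_um <= 0:
        raise ValueError("Km for ATP must be positive")
    return ki_nm * (1.0 + atp_um / km_atp_um)


if __name__ == "__main__":
    # Hypothetical constants, for illustration only.
    ki_nm, km_atp_um = 5.0, 10.0
    for atp_um in (1.0, 10.0, 100.0, 1000.0):
        ic50 = apparent_ic50(ki_nm, atp_um, km_atp_um)
        print(f"[ATP] = {atp_um:7.1f} uM -> apparent IC50 = {ic50:8.1f} nM")
```

Under this relation the apparent IC50 rises with ATP concentration, which is the rightward shift of the curves that identifies an ATP-competitive mechanism.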
For the docking model, T3 was docked to the crystal structure of CLK2 (PDB ID: 3NR9) using the Induced Fit Docking protocol (Schrödinger, LLC, New York). The crystal structure was protonated and minimized using the Protein Preparation Wizard, and the ligand and water molecules were then removed. The T3 structure was prepared using LigPrep (Schrödinger, LLC, New York). The docking study was performed with two hydrogen bond constraints aimed at interactions with Leu246. The pose with the top-ranked IFDScore was selected.
The structural features of T3 result in high specificity and potency of this molecule for CLK family inhibition. The specificity of T3 for CLK family members was further confirmed by testing other dual-specificity kinases such as DYRK1A and DYRK1B in the kinase enzymatic assay, which showed 200-300 times weaker inhibition compared with CLK inhibition (Fig. 3a). Because of the very high potency of T3 against CLK protein kinases, a higher ATP concentration of 1 mM was used in the kinase assay to obtain measurable inhibition in comparison with the previously published data. To further confirm specificity, the inhibition spectrum of this compound was also measured across an available panel of 71 kinases involved in multiple signaling pathways critical for proliferation and cell homeostasis. To directly measure the inhibition spectrum of T3 on CLK kinase activities in comparison with KH-CB19, a LANCE® Ultra kinase enzymatic assay was used. Percent inhibition of activity for each kinase was measured at two concentrations of T3 (100 nM and 1000 nM), in duplicate, in an ATP competitive assay at 100 µM, whereas CLK, DYRK and SRPK kinase activity was measured in a 10-point assay, with two of the reference points reported (1110 and 123 nM). No additional kinases beyond the CMGC family were inhibited from this panel (Supplementary Data 1), further confirming the selectivity of the T3 inhibitor (Fig. 3a).
Several features of T3 likely contribute to its CLK inhibitory properties. The hinge binder of T3 is an aminoimidazo[1,2-a]pyridine moiety, which allows improved binding to the catalytic subunit of the CLK proteins (Supplementary Fig. 2). Additionally, the intermolecular hydrogen bond between the main-chain carbonyl of Leu246 and the amide NH allows for abundant protein interaction. The protein-inhibitor interaction is further stabilized by the bond between Lys193 and the nitrogen atom of the terminal pyridine. The solvent-exposed piperazine moiety and hydrophobic phenyl ring could potentially be the reason for the improved potency and selectivity (Fig. 3b) in a head-to-head comparison with KH-CB19 (the most potent previously described CLK small molecule inhibitor) under the same assay conditions.

Supplementary Note 2
The T3 compound reduces CLK phosphorylation activity. Therefore, we hypothesized that artificially reducing CLK expression may have similar effects on RNA splicing. To test this notion, we knocked down CLK expression by transfecting CLK siRNA into HCT116 cells and sequencing the resulting transcriptomes with RNA-Seq (Supplementary Table 4).
We compared each CLK knockdown library and the reagent-only library to the NT3 siRNA control, and produced a list of AS events found to be differentially spliced in any of the CLK siRNA libraries but not in the reagent-only library. We then compared this event list to lists of differentially spliced events from the T3-treated HCT116 datasets (Supplementary Fig. 11b).
In total, 1580 unique AS events were differentially spliced in any of the CLK knockdown libraries. Of these events, 875 (55%) were found in at least one of the two T3-treated HCT116 AS event lists, confirming that many of the effects of T3 treatment are due to loss of CLK phosphorylation activity as opposed to off-target effects. Almost half of the events resulting from CLK knockdown were not found to be differentially spliced in the T3-treated datasets, which can be partially explained by differences in biological response to depleting CLK RNA versus inhibiting CLK phosphorylation activity, and by possible off-target effects of RNAi.
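The event-list comparison described in this note amounts to a few set operations. The sketch below illustrates that logic under stated assumptions: the function names, the dictionary layout and the toy event identifiers are hypothetical, and the code is not the pipeline used to produce the reported counts.

```python
def clk_specific_events(knockdown_lists, reagent_only):
    """Events differentially spliced in any CLK siRNA library but not in
    the reagent-only library (both given relative to the control)."""
    in_any_knockdown = set().union(*knockdown_lists.values())
    return in_any_knockdown - set(reagent_only)


def overlap_with_t3(clk_events, t3_lists):
    """Events shared with at least one T3-treated event list, and the
    fraction of CLK knockdown events they represent."""
    in_any_t3 = set().union(*t3_lists)
    shared = clk_events & in_any_t3
    fraction = len(shared) / len(clk_events) if clk_events else 0.0
    return shared, fraction


if __name__ == "__main__":
    # Toy identifiers, for illustration only.
    knockdowns = {"siCLK1": {"e1", "e2"}, "siCLK2": {"e2", "e3", "e4"}}
    reagent = {"e4"}
    t3 = [{"e1", "e9"}, {"e3"}]
    events = clk_specific_events(knockdowns, reagent)
    shared, frac = overlap_with_t3(events, t3)
    print(sorted(events), sorted(shared), round(frac, 2))
    # The reported proportion, recomputed from the counts in the text:
    print(f"{875 / 1580:.1%}")
```

Recomputing 875/1580 gives about 55%, matching the proportion quoted above.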
Genes differentially spliced in both T3 treated cells and cells transfected with CLK siRNA are likely to be specifically affected by loss of CLK activity. Biological processes likely to be affected by splicing changes in this common set of genes were identified by constructing a gene interaction network with the ReactomeFI Cytoscape plugin [7]. Functional enrichment analysis was then performed using the genes in the network (Table 13). Biological processes enriched among genes differentially spliced in both T3 treated and CLK siRNA transfected cells included "gene expression", "mitotic cell cycle", "chromatin modification", and "nuclear mRNA splicing, via spliceosome". CLK activity likely plays a crucial role in these biological processes in particular.

Supplementary Note 3
The SR proteins, a class of RNA-binding proteins (RBPs), are the canonical effectors of CLK function in splicing. To assess the consequence of CLK inhibition by T3 in a cellular assay (MDA-MB-468, 3 hours exposure to T3), SR protein phosphorylation was measured by western blotting using an anti-pan-phospho-SR antibody, which selectively recognizes phosphorylated variants of multiple classical members of the SR family. Treatment of non-stimulated HCT116 cells with T3 led to a dramatic reduction in the basal level of phosphorylated SRSF4 (SRp75), SRSF6 (SRp55), and SRSF1 (SF2) in a concentration-dependent manner (Supplementary Fig. 5). The reduced phosphorylation of SR proteins is clearly evident at nanomolar concentrations of T3 (log -7.4 M, approximately 40 nM). In contrast, and consistent with the differences in the IC50 values, KH-CB19 showed a lesser decrease in the basal phosphorylation level of these SR proteins, evident only at the highest experimental concentration (log -5.0 M = 10 µM). These data indicate a significantly greater inhibitory effect of T3 on CLK activity compared to KH-CB19, which has a different chemical scaffold and lower potency.
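Concentrations in this note are quoted as log10 molar values. A small helper makes the conversion to everyday units explicit; the function name is illustrative only.

```python
def log_molar_to_nanomolar(log10_molar: float) -> float:
    """Convert a log10 molar concentration to nanomolar (1 M = 1e9 nM)."""
    return (10.0 ** log10_molar) * 1e9


if __name__ == "__main__":
    for log_m in (-7.4, -5.0):
        nm = log_molar_to_nanomolar(log_m)
        print(f"log {log_m} M = {nm:.1f} nM = {nm / 1000:.3f} uM")
```

With this conversion, log -7.4 M corresponds to roughly 40 nM and log -5.0 M to 10 µM.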

Supplementary Note 4
Next, we examined the effectiveness of CLK inhibition by T3 on alternative splicing of a few target mRNAs in comparison with KH-CB19. Alternatively spliced transcripts of S6K have previously been shown to be expressed in response to the modulation of CLK activity [1]. Therefore, to assess the potency of T3 treatment, we quantified the abundance of the novel transcripts of S6K, as well as three novel conjoined gene events identified in our RNA-seq datasets, in a real-time quantitative RT-PCR (qRT-PCR) assay. We designed primer sets spanning the identified event junctions and measured their expression level in HCT116 and 184-hTERT cell lines treated with increasing concentrations of both T3 and KH-CB19 (Supplementary Fig. 10). The alternatively spliced isoform of S6K was evident only in T3-treated cell lines, while this transcript was undetectable with KH-CB19 treatment in both HCT116 and 184-hTERT lines. Additionally, the relative abundance of all three conjoined gene events examined in this assay was significantly higher in T3-treated cell lines. While the abundance of the MRPS10-GUCA1B transcript was increased in response to both T3 and KH-CB19, indicating some effects of both inhibitors on the spliceosome machinery, the other two conjoined gene transcript variants tested in this assay were detected only with T3 treatment in both HCT116 and 184-hTERT cell lines (Supplementary Fig. 10). These data further confirm that T3 is a more efficacious inhibitor, with a more profound effect on the alternative splicing machinery, than KH-CB19.
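The note does not state how relative abundance was derived from the qRT-PCR readings. One widely used convention is the 2^-ΔΔCt method, sketched below purely as an assumption; the function name, the choice of reference gene and the Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_treated: float, ct_ref_treated: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Relative abundance of a target transcript by the 2^-ddCt convention.

    Each Ct is normalised to a reference transcript in the same sample,
    then the treated sample is compared with the control sample.
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_treated - delta_control)


if __name__ == "__main__":
    # Hypothetical Ct values: the junction-spanning amplicon appears two
    # cycles earlier (relative to the reference) after treatment.
    print(fold_change_ddct(24.0, 18.0, 26.0, 18.0))
```

A transcript that is undetectable in one condition has no Ct value, so a comparison of this kind applies only where the amplicon is detected in both treated and control samples.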

Supplementary Note 5
We validated the detection of CG events with two orthogonal methods, first using genome-wide PacBio long read sequencing [8] and second by targeted PCR-sequencing of selected events.
The proportion of CG events with PacBio read support was 117/205 (57%) in HCT116 stranded libraries and 344/988 (35%) in HCT116 unstranded libraries. PacBio sequencing suffers from low throughput and lower sensitivity compared to RNA-Seq, which may result in a large number of false negative validations. Therefore, a set of 52 conjoined gene events was selected from the short-read RNA-seq libraries for targeted sequencing (Supplementary methods) in HCT116 and 184-hTERT RNA. A total of 37 of 52 (71.2%) CG isoforms were validated by targeted sequencing (Supplementary Data 23). Interestingly, 5 apparently new CG isoforms were detected in the validation dataset. Upon inspection, 4 were found to be alternative isoforms of other CGs selected for validation (2 of which were previously detected in the RNA-Seq datasets); the other is similar to another validation input isoform except that it involves a paralog of the upstream gene. This CG isoform is likely due to reads misaligned to the paralog gene. Considering CG parent genes only, and ignoring specific splice sites, 40 (76.9%) of the CG events targeted for validation were confirmed as present.
Of the 52 CGs selected for validation, 40 were found only in the HCT116 RNA-Seq CG lists, 2 were found in only the 184-hTERT RNA-Seq CG list, and 10 were found in both cell types. All but one detected CG in the targeted sequencing dataset were found in both HCT116 and 184-hTERT cell types. One CG, which was not chosen for validation, was found in just HCT116 cells.
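The validation proportions quoted in this note can be recomputed directly from the counts given in the text. The snippet below is a simple arithmetic check; the dictionary keys are descriptive labels introduced here.

```python
def percent(numerator: int, denominator: int, digits: int = 1) -> float:
    """Percentage rounded to the given number of decimal places."""
    return round(100.0 * numerator / denominator, digits)


# (supported, total) counts as stated in the text.
counts = {
    "PacBio support, HCT116 stranded": (117, 205),
    "PacBio support, HCT116 unstranded": (344, 988),
    "Targeted sequencing, isoform level": (37, 52),
    "Targeted sequencing, parent-gene level": (40, 52),
}

if __name__ == "__main__":
    for label, (hit, total) in counts.items():
        print(f"{label}: {hit}/{total} = {percent(hit, total)}%")
    # Origin of the 52 selected CGs: HCT116 only, 184-hTERT only, both.
    print(40 + 2 + 10)
```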

Supplementary Note 6
Since transcription and splicing are coupled, we also sought to determine how CLK inhibition affects overall transcription in the genome. To address this question we elucidated gene expression trends in a manner analogous to ∆AS events, by clustering FPKM ranked gene expression (Supplementary Information). This resulted in 6 clusters for the unstranded HCT116 RNA-Seq dataset and 5 clusters each for the stranded HCT116 RNA-Seq and 184-hTERT datasets (Supplementary Data 17, 18). All three datasets exhibit similar FPKM profile clusters (Supplementary Fig. 19a). The dominant effect is monotonic T3-induced gene down-regulation, as seen in the largest cluster, cluster 1, and in cluster 3 (Supplementary Fig. 19a, 72% and 11% of clustered genes), with a smaller proportion of genes up-regulated (clusters 2 and 4, 11% and 5% of clustered genes). A small number of genes exhibit non-monotonic responses (clusters 5 and 6), similar to a minority pattern observed for AS events, implying that these genes may be affected by the secondary consequences of blocking exon recognition with T3.
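To make the distinction between monotonic and non-monotonic dose responses concrete, the following sketch labels a single expression profile across increasing T3 concentrations. It is an illustration of the labels only, not the clustering procedure used in the study; the function name, tolerance parameter and example profiles are hypothetical.

```python
def classify_profile(values, tol=0.0):
    """Label a dose-ordered expression profile.

    values : expression values ordered by increasing inhibitor dose
    tol    : changes with absolute size <= tol are treated as no change
    """
    steps = [b - a for a, b in zip(values, values[1:])]
    rising = any(step > tol for step in steps)
    falling = any(step < -tol for step in steps)
    if rising and falling:
        return "non-monotonic"
    if rising:
        return "monotonic increase"
    if falling:
        return "monotonic decrease"
    return "flat"


if __name__ == "__main__":
    # Hypothetical FPKM profiles across four increasing doses.
    print(classify_profile([10.0, 8.0, 5.0, 2.0]))
    print(classify_profile([1.0, 2.0, 2.0, 4.0]))
    print(classify_profile([1.0, 5.0, 2.0, 1.0]))
```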
To determine the biological processes represented in FPKM gene clusters, we performed functional enrichment analysis separately for each set of clustered genes (Supplementary Fig. 19b, Supplementary Data 19). For the HCT116 datasets, only analysis of clusters 1-3 resulted in a list of enriched biological processes; for the 184-hTERT dataset, only clusters 1-4 produced significantly enriched biological process terms (FDR < 0.05). In both HCT116 and 184-hTERT cells, treatment causes the down-regulation of RNA splicing and RNA processing genes. Additionally, genes involved in cell cycle regulation and spindle assembly checkpoints, as well as DNA repair-related genes (e.g. BRCA1), were significantly down-regulated. Down-regulation of cell cycle regulators upon T3 treatment suggests that CLK inhibition may disrupt cell cycle regulation, and is consistent with the observation of cell cycle arrest and DNA damage response signals with T3 treatment. RNA splicing is inhibited during mitosis [9] and appears to involve the dephosphorylation of SRSF10 proteins [10]. In addition, down-regulation of SRSF3 induces G1 cell cycle arrest in HCT116 colon cancer cells [11]. Splicing repression via CLK inhibition may thus have a similar effect. Up-regulated genes were far fewer than down-regulated genes and thus affected fewer biological processes. Histone genes were found to be enriched only among the statistically significant up-regulated gene expression clusters in all three datasets.
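Functional enrichment of a gene cluster is commonly assessed with an over-representation test. The sketch below shows a one-sided hypergeometric test and a fold enrichment for a single gene set; it is a generic illustration under that assumption, not the specific tool used here, and it omits the multiple-testing correction (e.g. FDR) that a real analysis requires. All names and numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): probability of seeing at least k annotated genes in a
    cluster of n genes, drawn from N background genes of which K carry
    the annotation."""
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """Observed fraction of annotated genes in the cluster divided by the
    fraction expected from the background."""
    return (k / n) / (K / N)


if __name__ == "__main__":
    # Hypothetical example: 12 of 100 cluster genes carry a term that
    # annotates 300 of 20000 background genes.
    print(fold_enrichment(12, 100, 300, 20000))
    print(hypergeom_upper_tail(12, 100, 300, 20000))
```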

Supplementary Note 7
CGs apparently occurring exclusively in 184-hTERT cells may be partially explained by cell-type specific gene expression profiles (e.g. low gene expression in HCT116 vs. 184-hTERT at a given locus). To investigate this possibility, we calculated FPKM values for genes involved in 184-hTERT-specific CGs for each of the HCT116 and 184-hTERT datasets. The FPKM distributions of 184-hTERT-specific CG partner genes reveal a pattern of higher expression in 184-hTERT samples (Supplementary Fig. 27). Therefore, the presence of a large number of 184-hTERT-specific CGs may be at least partially explained by reduced expression of participating genes in HCT116 cells.
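A comparison of this kind can be summarised with a per-gene log2 ratio of FPKM values between the two cell lines. The sketch below assumes paired FPKM values for the same genes and a pseudocount of 1 to avoid division by zero; both choices, the function name and the example numbers are illustrative assumptions rather than the analysis behind Supplementary Fig. 27.

```python
import math
import statistics


def median_log2_ratio(fpkm_a, fpkm_b, pseudocount=1.0):
    """Median of log2((a + pseudocount) / (b + pseudocount)) over paired
    FPKM values for the same genes in two samples."""
    if len(fpkm_a) != len(fpkm_b):
        raise ValueError("FPKM lists must be paired gene by gene")
    ratios = [
        math.log2((a + pseudocount) / (b + pseudocount))
        for a, b in zip(fpkm_a, fpkm_b)
    ]
    return statistics.median(ratios)


if __name__ == "__main__":
    # Hypothetical partner-gene FPKMs: first list 184-hTERT, second HCT116.
    htert = [3.0, 7.0, 15.0]
    hct116 = [1.0, 1.0, 1.0]
    print(median_log2_ratio(htert, hct116))
```

A positive median indicates higher expression in the first sample, which is the direction reported above for partner genes of 184-hTERT-specific CGs.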

Supplementary Note 8
NMD (nonsense-mediated mRNA decay) is an inherent cytoplasmic cellular surveillance mechanism triggered in response to the formation of transcripts with premature translation termination codons (PTCs) [12]. To evaluate the effect of T3 on NMD, we transfected a previously described [13] NMD reporter plasmid into HeLa cells 48 hours before T3 treatment. Inhibition of CLK activity by T3 had no effect on the NMD pathway based on the luciferase reporter assay. Only a minimal increase in luciferase activity was observed at the 50.0 µM concentration (Supplementary Fig. 8). Inhibition of NMD regulators such as UPF1 and SMG1 with siRNA is reported as a positive control for this NMD assay.