Intron retention is a hallmark and spliceosome represents a therapeutic vulnerability in aggressive prostate cancer

The role of dysregulation of mRNA alternative splicing (AS) in the development and progression of solid tumors remains to be defined. Here we describe the first comprehensive AS landscape in the spectrum of human prostate cancer (PCa) evolution. We find that the severity of splicing dysregulation correlates with disease progression and establish intron retention as a hallmark of PCa stemness and aggressiveness. Systematic interrogation of 274 splicing-regulatory genes (SRGs) uncovers prevalent genomic copy number variations (CNVs), leading to mis-expression of ~68% of SRGs during PCa development and progression. Consequently, many SRGs are prognostic. Surprisingly, androgen receptor controls a splicing program distinct from its transcriptional regulation. The spliceosome modulator, E7107, reverses cancer aggressiveness and inhibits castration-resistant PCa (CRPC) in xenograft and autochthonous PCa models. Altogether, our studies establish aberrant AS landscape caused by dysregulated SRGs as a hallmark of PCa aggressiveness and the spliceosome as a therapeutic vulnerability for CRPC.


Supplementary Note 1: Global splicing dysregulation impacts PCa biology
Our systematic splicing mapping analysis revealed an association of increasing DSEs with the degree of PCa malignancy and progression (Fig. 1). We next explored the potential impact of AS dysregulation on PCa biology (Supplementary Fig. 2 and 3). By overlapping the splicing-affected genes (SAGs) and differentially expressed genes (DEGs), we observed only 9~20% of 'overlapped' genes ( Supplementary   Fig. 2a), suggesting that the majority of AS events minimally changed the bulk gene expression but may functionally tune transcriptomes 1,2 . Gene ontology (GO) analysis (http://metascape.org) indicated that the SAGs were enriched for many cancer-associated functional categories with both convergence (e.g., splicing, cell cycle and proliferation, cytoskeleton) and specificity identified at each PCa stage (Supplementary Fig. 2b-d). For instance, GO terms linked to 'muscle and ion transport', 'lipid metabolism', and 'cell polarity' were pri-PCa specific ( Supplementary Fig. 2b) whereas GO terms 'DNA damage', 'immunity', and 'nuclear pore' were enriched in CRPC ( Supplementary Fig. 2c), consistent with recent reports 3 . Interestingly, and as expected, GO terms 'SCs and development' and 'neuron and cell projection' were greatly enriched in CRPC-NE ( Supplementary Fig. 2d), in line with its stem-like and neural-like properties.
We further evaluated the potential functional consequences of AS dysregulation on PCa transcriptome by identifying transcript-level expression profiles using an isoform-specific alignment algorithm 4 . As shown in Supplementary Fig. 3a, PCa at different stages exhibited distinct splice isoform signatures. For instance, the widely studied ARv7 was only slightly upregulated in pri-PCa but, together with several other AR variants, was dramatically overexpressed in CRPC-Ad (but not in CRPC-NE due to loss of AR expression in NE tumors) ( Supplementary Fig. 3b). CD44, a cancer stem cell (CSC) marker, plays versatile roles in metastasis with CD44-standard (CD44s; CD44-201) suppressing and CD44 variants (CD44v) promoting cancer cell colonization 5 . Consistently, we observed a shift from no change in pri-PCa (vs. N) to a specific dysregulation of CD44 isoforms in mCRPC, with CD44s being downregulated in both CRPC-Ad and CRPC-NE and CD44v dramatically upregulated in CRPC-Ad ( Supplementary Fig.   3c). The splicing program driving CRPC-NE emergence is scantly explored. Recently, an SE event leading to unique upregulation of a MEAF6 isoform containing exon 6 (i.e., MEAF6-204), but not the bulk mRNA, was reported in CRPC-NE 6 . We observed similar results ( Supplementary Fig. 3d), thus validating our approaches.
In addition to these well-annotated genes in impacting PCa biology and modulating PCa phenotypes, many other differentially expressed isoforms (DEIs) may also play important roles in regulating stemness and tumor progression, although the majority of them lacked functional characterization in cancers. In pri-PCa (vs. N), a protein-coding isoform of ADAM2 (ENST00000265708.8) was significantly upregulated (FC=16.86). Despite a lack of direct study of ADAM2 in PCa, information on ADAM family generally supported a positive role of ADAM proteins in promoting cell adhesion, migration and invasion 7 . We also observed 4 distinct protein-coding isoforms of SERPINA5 (ENST00000556775.5, ENST00000554866.5, ENST00000329597.11, ENST00000554276.1) markedly down-regulated in pri-PCa. Loss of SERPINA5 expression has been reported in advanced-stage serous ovarian tumor 8 . In CRPC (vs. pri-PCa), the level of a transcript (ENST00000430799.7) encoding the long isoform of ARID1A protein (ARID1A-L) was dramatically increased (FC>23602). ARID1A belongs to the ATP-dependent chromatin remodeling BAF complex and the ARID1A-L is critical for Ewing sarcoma (ES) maintenance and for oncogenic transformation 9 . Of interest, in CRPC-NE (vs. CRPC-Ad), multiple DLK1 splice variants (ENST00000341267.8, ENST00000331224.10, ENST00000556051.1; Supplementary Fig. 3e) were found significantly upregulated. These isoforms have been reported to be expressed at early mouse embryogenesis 10 and DLK1 + cells in human prostate have been shown to localize, preferentially, to the basal cell layer close to the urethra and exhibit primitive SC properties 11 , suggesting a functional link of DLK isoforms with stemness. Experimentally, by using SYT7 as an example ( Supplementary Fig. 4), we demonstrated that knocking down clinically relevant isoforms of SYT7 in both PC3 and DU145 cells modulated PCa cell biology. It should be noted that many of studies in this project focused on PC3 cells because this model resembles CRPC both transcriptionally 12 and genomically (Supplementary Fig. 13a; below).
Collectively, these analyses indicate that splicing abnormalities impact many genes enriched in cancer-related pathways, and isoform switching of key genes may contribute to PCa aggressiveness and progression.

Positive correlation of IR with stemness
The aggressive PCa subtypes ( Fig. 1g-1j), prostatic basal/stem cells (Fig. 1k) and several other SC systems ( Supplementary Fig. 5a-c) all displayed a higher level of IR. To further reveal the potential role of IR in stemness, we focused on IR-affected genes in CRPC-Ad (vs. pri-PCa), as more IR events were identified in this comparison. We excluded the CRPC-NE from comparison because both CRPC-Ad and CRPC-NE are CRPC subtypes. We have shown that basal/stem-specific AS profile is enriched in CRPC-Ad ( Supplementary Fig. 1f). Overlapping of CRPC-and basal/stem-IR genes revealed 53 shared genes ( Supplementary Fig. 5d), among which HMGN2 is highly expressed in ESCs and plays a key role in the establishment and maintenance of cell identity 13 . Overexpression of NRBP2 has been shown to increase the stemness of hepatocellular carcinoma cells via AKT signaling 14 . Comparison of IR genes in CRPC-Ad vs. hESC (vs. differentiated Fibroblasts) demonstrated that 32.8% of CRPC-IR genes also displayed IR events in hESC, directly suggesting a functional link of IR with stemness ( Supplementary Fig. 5e). CENPT, encoding a centromere protein, is one of these IR-affected genes, and a defect in CENPT causes a syndrome of severe growth failure 15 . KLF4, recently reported to be overexpressed in murine prostate SCs, regulates their homeostasis and also controls the self-renewal of tumor-initiating cells 16 . Similarly, overlapping of IR genes in CRPC-Ad and CD4 + T cells indicated that nearly 50% of the CRPC-IR genes also harbored IR in stem-like resting CD4 + T cells ( Supplementary Fig. 5f). Among them was ING4, which was previously proposed as a tumor suppressor in PCa with ability to inhibit stemness and promote differentiation of basal epithelial cells into luminal cells 17 . IR-affected genes in different PCa stages show limited overlapping ( Supplementary Fig. 5g), suggesting context specificity. This is expected as the cancer transcriptome changes dramatically along with progression. Among the shared 8 genes ( Supplementary   Fig. 5g), CCDC28B is associated with Bardet-Biedl syndrome (BBS) and participates in the development of cilia. As a developmentally regulated gene, DRG2 is known to regulate cell growth and differentiation of stem cells (information extracted from NCBI-Gene). An IR event of AP1G2 was also found in human CD34 + progenitor cells isolated from the myelodysplastic syndromes (MDS) patients 18 . NME4 is recently reported to behave as an oncogene promoting non-small cell lung cancer stemness and progression 19 .
These discussions suggest that IR represents a conserved phenomenon associated with development and stemness with many IR-affected genes shared by different stem-like and cancer cell entities.

Potential functional role of IR
The average ΔPSI values for upregulated IR in different PCa stages were moderately low (ranging from 0.14 to 0.18) ( Supplementary Fig. 6g), consistent with recent reports 20,21 . It is noteworthy, though, that these IR events were indeed expressed in clinical samples as visualized by RNA-seq reads mapping to intron regions ( Supplementary Fig. 6g, bottom). To further establish the potential functional relevance of IR in PCa biology, we analyzed the lengths of the retained introns and observed that ~30% of retained introns were multiples of 3 in length cross all conditions ( Supplementary Fig. 6h, upper), suggesting that these introns might have the potential to encode peptides without disrupting the coding capability of the parent genes. We also analyzed the features of the intron sequences by using ORF-Predictor 22 to predict whether the intron harbors a peptide-coding region (based on the presence of a start codon 'ATG' in a DNA sequence minimal of >75nt in length). The results indicated that about 50~80% of IRs could potentially encode micro-peptides ( Supplementary Fig. 6h, lower). IR has been reported to serve as a source of neoepitopes in melanoma 23 .  Fig. 5 and 6), and that AR regulates a splicing program, but not IR specifically, distinct from its transcriptional regulation, suggesting IR as a PCa-regulating mechanism independent of AR axis. In fact, we have observed a generally negative association of AR activity with IR level in multiple clinical datasets ( Fig. 2 and 3). Together, our results establish IR as a common mechanism of cellular stemness, as supported by studies in mouse ESCs 30 . The IR prevalent in PCa is not associated strongly with cis-genomic features but seems to be regulated by trans-regulatory mechanisms involving the combinatorial effects of multiple SRGs. In support, candidate RBPs modulate not only the IR but also other splicing types as well (Fig. 2f). Alternatively, besides altered spliceosome activity, IR might also be modulated by other molecular alterations. For example, loss-of-function mutations in SETD2 (a H3K36 methyltransferase) and subsequent loss of H3K36 trimethylation at target exons are associated with increased IR in renal cancers 31 . Our work expands the view of molecular complexity underlying, and justifies further exploration on the role of IR in, PCa etiology and progression.
There are many ways by which RNA splicing can be dysregulated in cancer. Previously, recurrent point mutations in core spliceosome genes (e.g., SF3B1, U2AF1, SRSF2, ZRSR2) have been reported to drive splicing dysregulation in hematological cancers 32 . Our genomic analyses of SRGs reveal CNVs as the main driver of AS alterations in PCa, which alter the expression of affected SRGs and illustrate cancer type-specific differences in mechanisms of splicing dysregulation. Remarkably, the majority of the top altered SRGs are located in regions containing either TS genes or oncogenes ( Supplementary Fig. 10a), and have not been highlighted in previous large-scale DNA sequencing studies. This raises an interesting question of whether these alterations are just passenger mutations or causally contribute to PCa pathogenesis. Our experimental data supports the latter, as knocking down two amplified, clinically relevant SRGs twists splicing landscape and inhibits PCa cell behavior (Fig. 7). The involvement of these genes in other types of cancer has been reported 2,5,24 . Splicing dysregulation has been recently proposed as a 'driver' of transformation independently of oncogenic processes 33 . Therefore, these mutated SRGs may bear some of the 'driver' properties, and it would be interesting, in future studies, to dissect whether deletion or amplification of CNV-associated SRGs with or without collateral alterations in RB1 or MYC loci, or vice versa, could change cancer phenotypes.
Another potential mechanism that may cause splicing abnormality is the mutations in splice sites 32 .
However, mutations in splice sites constitute the minority of all somatic mutations (as low as ~0.6%) in     The FDR for GSEA is the estimated probability that a gene set with a given NES (normalized enrichment score) represents a false positive finding, and an FDR < 0.25 is considered to be statistically significant. A3, alternative 3'-splice sites; A5, alternative 5'-splice sites; MX, mutually exclusive exons; SE, exon skipping; IR, intron retention.   (a) Overlap between differentially expressed genes (DEGs) and genes showing significant SAGs (i.e., splicing-affected genes) identified in the indicated pair comparisons. The number in parentheses denotes the percentage of overlapped genes proportional to all aberrantly spliced genes. Note increased overlap in DSEs and DEGs in CRPC vs. pri-PCa (19.8%) compared to pri-PCa vs. N (9.1%). (b-d) GO (Gene Ontogeny) analysis of aberrantly spliced genes (i.e., splicing-affected genes or SAGs) reveals an impact of global splicing alterations on PCa biology by regulating genes associated with cancer pathways. Genes showing DSEs in pri-PCa (vs. normal tissues) (b), CRPC (vs. pri-PCa) (c), and CRPC-NE (vs. CRPC-Ad) (d) were used as input for Metascape analysis. Due to a large number of MX events detected in CRPC and a limit of maximal 3,000 genes that could be used for Metascape, we combined genes with A3, A5, SE and IR first and then used genes with top ranked MX to make up a list of 3,000 genes. Terms with p-value <0.01, minimum count 3, and enrichment factor >1.5 (the ratio between observed count and the count expected by chance) were considered significant. The terms with similar descriptions and functions were grouped into different functional categories, and top categories were schematically projected on the network of enriched terms. Categories specific to each PCa stage were colored. Results indicate that cancer-stage-specific SAGs are enriched in GO terms that are functionally connected into well-linked interaction networks, suggesting that splicing dysregulation has a broad impact on PCa biology.   siES1 or SE3 denotes knockdown by siRNAs against alternative exon 1 or 3.
All P values were calculated using two-tailed unpaired Student's t-test.
Source data are provided as a Source Data File.  (a-c) IR is a hallmark of undifferentiated stem/progenitor cells. Shown are bar graphs comparing the level of IR in human ESC or iPS cells relative to the respective fibroblasts (eFibroblasts, ESC-derived fibroblasts; iFibroblasts, iPS cell derived fibroblasts) (GSE73211, a), in spermatocytes vs. spermatids (GSE95138, b), and in resting vs. activated CD4 T cells (SRP058500, c).    (a and b) IR code. Shown are splice site strength (a) and GC content and intron length (b) analyses of the introns spliced either aberrantly (Up and Down) or constitutively (Control). Asterisks indicate *p<0.05 and **p<0.001 using one-sided Mann-Whitney U tests after Bonferroni correction. (c) Genes with IR tend to be expressed at higher levels. The log2(RPKM+0.00001) values were used to compare the expression of genes (number indicated) with upregulated IR events during PCa development and progression. Genes exhibiting both up-and down-regulated IR events were removed. In the plots, the center lines represent median values, box edges are 75th and 25th percentiles, and whiskers denote the maximum and minimum values, respectively. Significance was calculated using two-tailed paired Student's t-test.    . Results indicate that the pattern of luciferase activities across indicated contexts was not affected by insertion of introns from either non-AR target gene (e.g., β-globin) or AR target gene (e.g., PSA), suggesting that modulation of AR activity does not regulate IR specifically. ORF, open reading frame. Results (mean ± SD) were representative data of 3 independent experiments with 4 technical repeats for each experiment. The p values were calculated using twotailed unpaired Student's t-test test.

AR MYC
Although not all amplified genes were overexpressed significantly in many cases due to a small sample size, the trend of increased expression was observed. In the plots, the center white dots represent median values, box edges are 75th and 25th percentiles, and whiskers denote the maximum and minimum values, respectively. The significance was calculated by two-tailed Student's t-test.  ELAVL3  A1CF  IGF2BP3  CELF3  SRRM4  KHDRBS2  HNRNPH1P1  PCBP3  CELF4  ELAVL4  CPEB4  RBM28  QKI  IGF2BP2  PTBP1  NOVA2  PTBP2  MBNL3  PTBP3  YBX2  RBM38  SAMD4A  MSI1  SRRM2  SNRNP70  ANKHD1  DDX23  SART1  TIA1  HNRNPH1  EFTUD2  RBM5  MBNL2  SRSF2  NOVA1  SNRPA  SART3  PRPF3  TRA2B  PRCC  CELF5  DHX16  SF3A2  SF3B3  RBM10  CWC25  SRSF4  MBNL1  SF1  SF3A1  HNRNPM  SNIP1  CNOT4  U2SURP  CXorf56  PPWD1  WDR77  HNRNPK  PPIE  TARDBP  SNRPE  LSM5  BCAS2  ENOX1  G3BP2  HNRNPH2  FAM32A  HNRNPA1  WBP4  ELAVL2  CTNNBL1  ZCRB1  HNRNPL  SMU1  HNRNPC  HNRNPA1L2  HNRNPA1P10  HNRNPUL2  HNRNPA3  SNU13  SRSF8  PABPC3  LIN28A  HNRNPA3P6  SNRPD1  SAP18  HNRNPA1P33 RBFOX3 HNRNPA1P48     (a-d) In vivo effects of E7107 treatment on tumor growth (left) and body weight of tumor-bearing mice (right) in the indicated CRPC models. The pink lines in the right panels indicate the treatment regimens (see Fig. 9b-e). For a and b, n=4 for each group. For c, n=5 and 6 for vehicle and treatment group, respectively. For d, n=5 for each group. (e) E7107 treatment promoted more differentiated morphologies in the CRPC models. Shown are hematoxylin and eosin (H&E) staining in endpoint tumors as above. All tumors (n indicated above) were examined. Boxed regions are enlarged. V, vehicle; E, E7107. (f) E7107 treatment reshapes the AS landscape in treated CRPC models. Shown are Sashimi plots of the indicated AS events (with mapped reads from two replicates of each group) and RT-PCR validation in indicated models treated with E7107. The RNAs derived from all tumors (n indicated above) were used for RT-PCR with 3 (left panel) and 2 (right panel) samples from each group loaded in the gel.