Broad misappropriation of developmental splicing profile by cancer in multiple organs

Singh, Arashdeep; Rajeevan, Arati; Gopalan, Vishaka; Agrawal, Piyush; Day, Chi-Ping; Hannenhalli, Sridhar

doi:10.1038/s41467-022-35322-1

Download PDF

Article
Open access
Published: 12 December 2022

Broad misappropriation of developmental splicing profile by cancer in multiple organs

Nature Communications volume 13, Article number: 7664 (2022) Cite this article

4249 Accesses
3 Citations
7 Altmetric
Metrics details

Subjects

Abstract

Oncogenesis mimics key aspects of embryonic development. However, the underlying mechanisms are incompletely understood. Here, we demonstrate that the splicing events specifically active during human organogenesis, are broadly reactivated in the organ-specific tumor. Such events are associated with key oncogenic processes and predict proliferation rates in cancer cell lines as well as patient survival. Such events preferentially target nitrosylation and transmembrane-region domains, whose coordinated splicing in multiple genes respectively affect intracellular transport and N-linked glycosylation. We infer critical splicing factors potentially regulating embryonic splicing events and show that such factors are potential oncogenic drivers and are upregulated specifically in malignant cells. Multiple complementary analyses point to MYC and FOXM1 as potential transcriptional regulators of critical splicing factors in brain and liver. Our study provides a comprehensive demonstration of a splicing-mediated link between development and cancer, and suggest anti-cancer targets including splicing events, and their upstream splicing and transcriptional regulators.

Non-canonical functions of spliceosome components in cancer progression

Article Open access 02 February 2023

Olga M. Ivanova, Ksenia S. Anufrieva, … Victoria O. Shender

Alternative splicing and cancer: a systematic review

Article Open access 24 February 2021

Yuanjiao Zhang, Jinjun Qian, … Ye Yang

RNA splicing dysregulation and the hallmarks of cancer

Article 10 January 2023

Robert K. Bradley & Olga Anczuków

Introduction

Cancer onset and progression results in the dedifferentiation and gradual loss of lineage-specific phenotypes and echoes multiple facets of early embryonic development including rapid proliferation, epithelial-mesenchymal transition (EMT), cellular migration, and angiogenesis. The mechanistic details of these cancer-associated changes in cellular function and physiology, termed as ‘hallmarks of cancer’¹, are not completely understood. Past studies have shown that a core set of transcription factors (TFs) and signaling pathways, which maintain pluripotency in embryonic stem cells (ESCs) and orchestrate normal embryonic development, are reactivated in cancer and thus underlie physiological reversal in cancer progression^2,3,4,5. For instance, the core pluripotency markers OCT3/4 and SOX2, are important biomarkers of several cancers^6,7,8. Likewise, the Myc module of ESCs gets reactivated in mouse models of mixed-lineage leukemias and is a predictor of patient outcome in many human cancers⁵. Consistent with these anecdotes, a universal signature of stemness accurately predicts the tumor infiltration by leukocytes and response to immunotherapy⁹. In addition to TFs, various signaling pathways involved in embryonic development, such as Wnt, Notch, and Hippo, also get reactivated in cancer and their associated genes accumulate oncogenic mutations^3,10,11.

In addition to gene expression, alternative splicing (AS), wherein multiple isoforms of the same gene are expressed, affects >95% of the multi-exonic genes in humans^12,13 and underlies diverse biological processes such as stemness, differentiation, development, and ageing^14,15,16,17. A plethora of gene-centric studies have demonstrated the critical role that AS plays in cancer¹⁸. For instance, long and short isoforms of Bcl-x protein have anti-apoptotic and pro-apoptotic roles respectively^19,20. Several members of the receptor tyrosine kinase family express multiple isoforms enhancing the proliferative or metastatic ability of cancer cells. For example, the FGFR2 isoform, FGFR2III-b, is mainly expressed in epithelial cells while FGFR2III-c is expressed in mesenchymal cells²¹. This isoform switching is involved in epithelial-mesenchymal transition (EMT)²² and is linked to invasiveness and metastasis of colorectal^23,24 and breast cancers²⁵. Likewise, alternatively spliced isoforms of genes such as P63, Cyclin D1, CD44, HRAS, RAC1, and PKM can modulate proliferative, apoptotic, metabolic, and invasive properties of cancer cells^18,26,27. Recent comparative transcriptomic analyses across multiple organs showed the prevalence and cross-species conservation of alternative splicing events during development²⁸. Despite the established importance of AS in development and cancer, as well as broad phenomenological links between development and cancer, an unbiased and comprehensive investigation of the links between development and cancer AS events in a tissue-specific fashion is still lacking and can have major implications on our broader mechanistic understanding of oncogenesis and cancer therapies.

In this work, leveraging the human developmental transcriptome across multiple time points in three organs²⁹ as well as the transcriptomic data of the corresponding cancer from The Cancer Genome Atlas (TCGA) (https://www.cancer.gov/tcga), we chart the landscape of embryonic splicing events that are reactivated in the organ-specific cancer, and investigate their upstream regulators and downstream functional implications. Focusing on the most common type of AS event type, namely, exon skip events, we show that embryonic AS events associate with key oncogenic processes such as rapid proliferation, migration, and angiogenesis, and are significantly reactivated in tumors. The reactivation of embryonic AS events predicts the patient’s survival and is associated with the proliferation rate in cancer cell lines. Among ‘embryonic positive’ (EP) and ‘embryonic negative’ (EN) exons, the nitrosylation domain (ND), transmembrane-region domain (TRD), and WD40 domain are significantly enriched in all three tissues. Detailed molecular and functional analysis reveals that NDs and TRDs respectively affect retrograde cellular transport by coordinately regulating the activity of Arf and Ras family GTPases and N-linked glycosylation by regulating the transmembrane localization of oligosaccharyl transferase subunits. We further train a splicing regulatory model based on the developmental gene expression data of splicing factors which accurately predicts the inclusion of embryonic AS events in cancer patients and identifies critical splicing factors (CSFs) potentially regulating embryonic AS events. The identified CSFs are upregulated in cancer, often accompanied by copy number amplifications. Leveraging tumor single cell RNA-seq data, we show that the CSFs are specifically activated in the malignant epithelial cells, further supporting their role in malignancy. Based on multiple complementary approaches, we identify key transcription factors (TFs) predicted to regulate the identified CSFs, including MYC and FOXM1 in the brain and liver, respectively, and can be targeted using known FDA-approved drugs. Overall, our work establishes, through multi-modal data integration, reversal to developmental AS in cancer, and suggests therapeutic avenues directly targeting the regulators of such a reversal.

Results

Identification of exons associated with human fetal development

To identify the AS events associated with fetal development, we implemented a two-step approach where we first identified fetal development associated pathways, and then obtained the AS events correlated with those pathways (Fig. 1a); the rationale and advantages of this approach are discussed in the Methods section and Supplementary Note 1. Based on organ-specific transcriptomic data across multiple stages (Supplementary Data 1) of pre- and post-natal development²⁹, we first estimated the activity for each of the 332 KEGG pathways³⁰, quantified as the median expression of the pathway genes, in each sample, independently in brain, liver and kidney tissues. Principal component analysis (PCA) of the pathway activity clearly separates the pre- and post-natal stages along the first principal component (Supplementary Fig. 1a). Clustering of pathways in the PCA space (“Methods”) revealed two mutually exclusive sets of pre- or post-natal pathways which were correspondingly assigned as ‘embryonic positive’ or ‘embryonic negative’ (Fig. 1b).

**Fig. 1: Detection of AS events relevant to development of organs.**

As expected, genes constituting embryonic positive pathways are enriched in several gene ontology (GO) terms related to the processes which are crucial for embryonic development such as EMT, extracellular matrix (ECM) remodeling, cellular proliferation, and angiogenesis, providing additional validation of our approach used to detect embryonic pathways (Fig. 1c, Supplementary Data 2). Next, we used PEGASAS³¹ to identify alternative exons whose sample-specific inclusion is significantly correlated with the activity of embryonic positive pathways across developmental timepoints (“Methods”). We defined an exon as embryonic positive (EP) or embryonic negative (EN) based on the fraction of embryonic positive pathways whose activities are respectively significantly positively or negatively correlated with the exon’s inclusion level (Fig. 1d, e, “Methods”). We thus identified on average ~2000 EP as well as EN exon skip events in each tissue (Supplementary Data 3); as expected, EP and EN exons exhibit opposite inclusion patterns in the pre- and postnatal stages (Fig. 1e, and Supplementary Fig. 1b).

We found that the EP and the EN exon inclusion levels are broadly uncorrelated with the expression of their host genes, suggesting that these AS events vary independent of their host gene’s expression (Fig. 1d, Supplementary Fig. 1c). This independence is further supported by our observation that ~20-30% of the host genes of EP/EN exons in fact contain both EP and EN events (Supplementary Fig. 1d). Moreover, in almost all cases (>99%) when an exon’s inclusion correlates with an embryonic positive pathway, the exon’s host gene is not member of that pathway. Collectively, these data suggest that AS provides an additional regulatory layer to gene expression programs for controlling developmental pathways.

The host genes of EP and EN exons are significantly enriched in tissue specific processes in the case of brain and liver (Supplementary Fig. 1e, Supplementary Data 4). For example, GO terms for neuronal activities, such as synapse organization, dendrite development, neuron death, cell polarity, regulation of neurotransmitters, are enriched in the host genes of EP/EN exons specifically in brain. Likewise, liver EP/EN exons are involved in the regulation of many key metabolic processes as well as regulation of cell junctions and cytokinesis. EP/EN exons in all three tissues are enriched for autophagy, consistent with the emerging role of AS in the regulation of autophagy³². Overall, we identify numerous exons that, independent of the expression of their host gene, are preferentially utilized during fetal development and repressed postnatally, and strongly associate with key developmental and oncogenic processes.

Embryonic AS events are recapitulated in cancer and are associated with cancer stage and patient survival

We next assessed the extent to which the organ-specific EP events are recapitulated in the corresponding cancer types. First, we found that in all three organs the genome-wide profile of the AS events clearly distinguishes tumor samples from their non-malignant counterparts in TCGA (Supplementary Fig. 2a), as observed previously^31,33. Next, we identified the cancer-associated AS events in each organ by comparing the splicing profiles in tumors with healthy GTEx counterparts (“Methods”, Fig. 1a) and assessed their overlap with organ-specific EP and EN events. In all three organs we found that the EP events are significantly enriched among the AS events frequently increased in cancer, while the EN events are enriched among the AS events frequently decreased in cancer (Fig. 2a). These enrichment values correspond to the reactivation of almost 50% of the embryonic events in brain, 20% in kidney and 15% in liver, implying that several hundred (in liver and kidney) to thousands (in brain) of alternative splicing events in cancers revert back to their embryonic counterparts (Supplementary Fig. 2b, c). The observed enrichment may simply be because EP events have lower inclusion level in healthy postnatal tissues as shown in Fig. 1e and are therefore more likely to increase in cancer (analogously EN events might be more likely to decrease). We ruled out this potential confounder by randomly sampling alternatively spliced exons with low (psi < 0.3) and high (psi > 0.7) inclusion level in healthy GTEx samples of liver and testing their enrichment among the events frequently increased and decreased in liver cancer, respectively (nominal false positive rate <0.01; Supplementary Fig. 2d; Methods). Additionally, removing the exons with ~0 inclusion in healthy GTEx tissues did not affect the enrichment of EP events among the cancer-specific events (Supplementary Fig. 2e). Further, the ΔPSI values for between pre- and post-natal stages were strongly correlated with the ΔPSI values between TCGA and GTEx, in brain and liver, hinting at the broad and global similarity in the pattens of alternative splicing during embryonic development and cancer (Supplementary Fig. 2f). Using an alternative approach to quantify cancer-specific events or filtering EP events based on stringent ΔPSI criteria (prenatal – postnatal > 0.2) did not affect the significance of embryonic splicing in cancer (Supplementary Fig. 2g and Supplementary note 1). We observe an even greater enrichment of EP and EN events in advanced tumors compared with early-stage tumors (“Methods”; Supplementary Fig. 2h, I), linking embryonic splicing to not only oncogenesis but also to cancer progression. Furthermore, in all three organs, the EP (respectively EN) inclusion levels across samples are positively (respectively, negatively) correlated with cancer hallmark signature gene set scores (Fig. 2b), indicating a possible direct link between oncogenic processes and embryonic splicing. Unlike other signatures, apoptosis and DNA damage gene sets, whose activity is known to inversely correlate with tumor aggressiveness¹, are negatively correlated with EP events.

**Fig. 2: Embryonic splicing events in cancer.**

Next, we directly assessed whether the EP and EN inclusion level is associated with patient survival using Cox regression (“Methods”). In all three tissues, EP inclusion had significantly higher (and positive) hazard ratios and EN inclusion had respectively lower (and negative) hazard ratios compared to the rest of the exons (Fig. 2c); we ensured that the observed trends were not confounded by the expression level of the host genes (Supplementary Fig. 2j).

Since early embryonic development shares several molecular programs across organs²⁹, we derived a set of EP events (197 events) common to all three organs and assessed its association with survival across 20 cancer types (Fig. 2D). We further hypothesized that the shared set of EP events were more likely to result in worse prognosis across multiple cancer types, and found that indeed, a greater fraction of EP events resulted in poor prognosis of in multiple cancer types (Fig. 2e, single-tailed p value <0.05), further underscoring the embryonic roots of splicing changes in cancer.

Alternatively spliced transmembrane-region and nitrosylation domain may regulate N-linked glycosylation and retrograde cellular transport during development and cancer

To get insights into the functions potentially affected by dynamic inclusion of EP and EN exons, we performed molecular functional enrichment analysis of the genes containing the EP and EN events. In all organs, we observed a significant enrichment of Ras GTPase binding, cell adhesion, and cytoskeleton binding classes such as cadherin, actin, and microtubules (Supplementary Fig. 3a). Brain and Liver EP/EN genes were additionally enriched for dynactin and clathrin binding (Supplementary Fig. 3a). These processes promote tumorigenesis by modulating the cytoskeleton and cellular transport during the proliferation and migration of cancer cells^34,35. A more detailed discussion is provided below in the “Discussion” section.

To gain further insights into the molecular role of EP and EN exons and investigate their link with oncogenesis, we identified protein domains from PFAM database³⁶ enriched among the EP/EN exons (“Methods”). Three domains— transmembrane-region domain (TRD), nitrosylation domain (ND), and WD40—are enriched among EP and EN exons in all three organs (Fig. 3a), leading us to speculate their potential role in some of the functions performed by the host genes of EP and EN exons. To explore this potential link, we identified the gene subsets whose EP/EN exons contained these domains (total 6 gene subsets per tissue: 3 domains × 2 EP/EN gene sets) and performed molecular function enrichment analysis for each subset (Fig. 3b). As expected, enriched molecular functions in a gene set could be unambiguously attributed to the corresponding domain. For instance, gene subsets of WD40 domains were enriched for ubiquitin binding, consistent with the established role of WD40 as binding interfaces for ubiquitin proteins³⁷. Likewise, the genes containing the transmembrane region domain were indeed enriched for various kinds of transmembrane transporters (Fig. 3b). Further, the assessment of overlap among the host genes of EP and EN exons harboring these domains across tissues indicates that the observed enrichment of protein domains is not driven by the same set of genes but instead, multiple host genes of EP and EN exons coordinately splice in and out these domains across tissues (Fig. 3b). To probe the interplay between these enriched molecular functions and biological processes affected by dynamic inclusion of these domains, we performed biological processes enrichment analysis on the same gene sets and assessed the overlap of genes having a specific enriched molecular function with those having a specific enriched biological process.

**Fig. 3: Functional assessment of EP and EN exons.**

The observed correspondence between molecular function and biological processes among the host genes of EP and EN exons is well supported. For instance, in brain, host genes of EN exons with a transmembrane domain and encoding various types of transporters (molecular function) are predominantly involved in cross-membrane transport (biological process) (Fig. 3c).

Moreover, in brain and liver EP exons, the molecular function oligosaccharyl transferase activity significantly overlapped with biological processes related to N-glycosylation of proteins (Fig. 3c, and Supplementary Fig. 3c), a modification which typically takes place in the phospholipid bilayer of ER and Golgi bodies through the multi-subunit oligosaccharyl transferase complex (OST). We observed that four subunits of OST showed a coordinated reduction in the inclusion of TRD from pre- to post-natal stages, which increased again in cancer patients in brain (Fig. 3e), with TUSC3 and RPN2 undergoing greatest change. This suggests that modulation of transmembrane localization of OST through alternative splicing of TRD during embryogenesis might directly impact the process of N-glycosylation. Notably, N-glycosylation of several proteins have been implicated in cellular proliferation and migration by modulating the cell-matrix interactions³⁸. Therefore, increased inclusion of TRD among the subunits of OST might help the cancers (Fig. 3e) to upregulate the increased demand for N-glycosylation. To the best of our knowledge, the role of alternatively spliced TRDs among the subunits of OST complex in regulation of N-glycosylation has not been reported so far. To support this conclusion that removal of TRD can affect the function of OST by affecting its localization, we highlight the example of an integrin gene, ITGA2B, which contains an EN exon encoding TRD (Supplementary Data 7) in developing liver. Past research has shown that ITGA2B is alternatively spliced in melanoma, prostate cancer, and leukemia producing a truncated isoform lacking the transmembrane and cytoplasmic domain^39,40. This truncated isoform, instead of integrating into the plasma membrane, is secreted into the extracellular matrix, unscrewing the adhesion, and promoting the migration of cells. Our analysis suggests that a similar mechanism is used in the case of OST complex, where the removal of TRDs would result in its dissociation from ER membrane, impeding the process of N-glycosylation of proteins.

Similar analysis for ND revealed that host genes of EN exons containing this domain in the brain were significantly enriched for the molecular functions related to GTPase activity and its regulators (Fig. 3b). Previous studies have implicated the role of nitrosylation modification in the upregulation of GTPase activity^41,42. Our result thus suggests the role for alternatively spliced ND in the modulation GTPase activity during embryogenesis and cancer. In fact, few of the genes containing nitrosylation domain among brain EN exons, such as RAB6A and RAB6B, are GTPases belonging to RAS oncogene family, hinting at autoregulation of their GTPase activity through dynamic inclusion and exclusion of nitrosylation domain. Interestingly, one of the small GTPases, the RHOA, was previously shown to be inactivated through alternative splicing in diffuse-type gastric carcinoma cells⁴³. We found that the exon involved in this splicing event (3^rd exon) indeed encoded a ND and was embryonic negative (EN) in liver and kidney. This supports the broader role of the alternatively spliced ND in regulating the activity of the various small GTPases and cellular transport during development and cancer (Fig. 3d).

As for transmembrane domain, we obtained the genes having a ND among the EN exons in brain and identified the correspondence between the enriched molecular functions and biological processes (Fig. 3d). We observed that genes having GTPase activity were involved in Rab protein signal transduction and retrograde vesicle transport from endosomes to Golgi bodies to endoplasmic reticulum (Fig. 3d), the processes where GTPases are known to play a critical role^44,45. Among the GTPases having a ND in their EN exons, ARL1 gene had the greatest change in the inclusion of ND from pre-natal to post-natal stages and then in cancer (Fig. 3f). Our analysis thus suggests the underappreciated role of alternatively spliced ND in the regulation of the cytoplasmic transport by modulating the activity of GTPases. Additionally, some of the genes containing a ND among brain EN exons were enriched for the molecular functions related to BH-domain binding, death-domain binding and MAP-kinase signaling, which corresponded to the processes related to intrinsic apoptotic signaling pathways (Fig. 3d), potentially implicating exclusion of ND in modulating apoptosis⁴⁶.

Overall, our results implicate recapitulation of embryonic alternative splicing patterns of transmembrane and nitrosylation domains in several key oncogenic processes.

Splicing regulatory model of EP events reveals key splicing factors dysregulated in cancer

Splicing factors (SF) control the choice and inclusion level of alternatively spliced exons⁴⁷. To identify potential SFs regulating embryonic splicing, we trained a partial least squares regression (PLSR) model to predict the median inclusion level of EP events based on the expression levels of 442 annotated SFs (“Methods”, Supplementary Data 5). In each organ, trained solely on the developmental data, our model predicted the median inclusion level of EP events in independent tumor samples (TCGA) as well as normal samples (GTEx) with a high accuracy (average correlation between predicted and observed EP levels ~0.88 for TCGA and 0.84 for GTEx; Fig. 4a and Supplementary Fig. 4a). Further, the predicted EP inclusion values can distinguish GTEx normal samples from their corresponding TCGA cancer samples with a high accuracy in brain and medium accuracy in liver and kidney (Supplementary Fig. 4b), underscoring that the model can predict the cancer-associated changes in the EP splicing.

Next, we obtained the list of splicing factors that were significant positive predictors of median EP splicing during embryonic development based on their regression coefficients in the PLSR model (“Methods”) and termed those as critical splicing factors (CSFs, Supplementary Data 5). As expected, CSFs in each organ had higher expression during the prenatal stage of development and underwent significant upregulation in their corresponding cancer (Fig. 4b). Though our focus is only the positive regulators of EP splicing as those are upregulated in cancers relative to normal tissues, we confirmed that splicing factors with negative regression coefficients in the PLSR model undergo downregulation in cancers relative to the normal tissues and are potential negative regulators of the EP events (Supplementary Fig. 4c). Further, the deletion of orthologous genes of brain CSFs results in defective nervous system development in mice, and CSFs from all three tissues are much more likely to result in preweaning lethality as compared to the other splicing factors (Supplementary Fig. 4d, Supplementary Data 6), further supporting the developmental role of CSFs.

We observe that cancer patients with higher expression of CSFs and correspondingly higher inclusion level of EP events have a significantly greater number of copy number amplifications in CSFs (Fig. 4c). In addition, a gain in CSF expression is significantly associated with worse patient survival in cancer (Fig. 4d).

To assess whether CSFs play a causal role in regulating EP events, we tested if the EP inclusion level is decreased in tumor samples bearing nonsense (inactivating) mutations in CSFs. We first identified all SFs whose mutant samples have lower and higher EP inclusion than the wildtype samples and found that potentially causal SFs (i.e., SFs whose mutant samples have lower EP inclusion relative to WT samples, Methods) have significantly higher (and positive) regression coefficients as compared to the other SFs in the PLSR model of EP splicing (Fig. 4e), establishing a potentially causal role of CSFs in regulation of EP events. We ensured that our results are not confounded by SFs expression differences between the mutant and wildtype samples (“Methods”).

We further ascertained that PLSR can identify the causal factors underlying the inclusion of EP events by using shRNA knock-down followed by RNA-seq data for RNA binding proteins in HepG2 (liver cancer) cell line from ENCODE database⁴⁸. Following an identical procedure as above, we learned the CSFs critical for the inclusion of liver-specific EP events in HepG2 cell line. We observed that knocking out these CSFs is much more likely than other splicing factors to decrease the inclusion of EP events (“Methods”; Fig. 4f), providing a strong support for the causal role of CSFs in EP splicing.

Some of the CSFs identified in developing human tissues are known drivers of various solid and hematological malignancies. For instance, CDC5L and PCBP2 (CSFs in brain) are reported to promote the growth of gliomas^49,50 and bladder cancers⁵¹. Additionally, SF3B1 (a CSF in kidney and liver) and U2AF2 (CSF in liver) are frequent drivers of lung and pancreatic adenocarcinomas^52,53.

Besides the aforementioned examples, the pooled set of CSFs from all three tissues identified in our work was significantly enriched for 119 RNA binding proteins which were previously identified as the driver genes in one or more cancer types⁵⁴ (Fig. 4g, Methods). Further, a greater fraction of CSFs in brain, liver, and kidney had mutational hotspots in their corresponding cancers as compared to non-critical splicing factors (Supplementary Fig. 4f), further underscoring the role of CSFs in promoting malignancy.

Overall, these results reveal potentially causal SFs underlying the EP events and link the induction of such SFs, potentially via copy number amplification, to cancer. In the TCGA cancer samples, a median of 47%, 32%, and 16% of the CSFs were respectively upregulated (fold-change > 1.5) in brain, liver, and kidney cancers as compared to normal samples (Supplementary Fig. 4g). Considering this along with the mutational and shRNA analysis presented above, it appears that, although the deletion of a single CSF could have a small (albeit significant) effect on the inclusion level of a subset of EP exons, the broad reprogramming of splicing observed in cancers is achieved by activation of several CSFs, possibly driven by upstream transcription factors as we investigate in the next sections.

Embryonic splicing events are associated with proliferation rates in cancer cell lines

Our results above (Figs. 1b, 2a) suggest that increased inclusion of EP events in tumors might be involved in mediating oncogenic processes such as rapid proliferation, EMT, and angiogenesis. Leveraging the DepMap database (https://depmap.org/portal/) that includes RNA-seq data and proliferation rates in multiple cancer cell lines, we find that in liver and brain, there is a negative (respectively positive) association between the doubling time and the median EP (respectively EN) inclusion levels across cell lines derived from the organ-specific cancer type (Supplementary Fig. 5a), hinting at a possible link between EP/EN usage and proliferation rate of cancer cell lines.

To further consolidate this link, we calculated the proportion of EP and EN exons among all the splicing events that were strongly correlated with the doubling time of cancer cell lines (“Methods”). We observed that the brain and liver EP and EN events were strongly enriched among the exons which were respectively negatively (PCC < −0.5) and positively (PCC > 0.5) correlated with the doubling time of their corresponding cell lines in CCLE data (Fig. 5a, Supplementary Fig. 5a). This enrichment implies that exons linked with proliferation rates of cancer cell lines were more likely to be embryonic in nature. The lack of association between embryonic events and doubling times of cancer cell lines for kidney could be the result of heterogeneity as discussed below.

**Fig. 5: Embryonic splicing events and their regulators in cancer cell lines.**

Further, splicing regulatory models learned from the developmental data could accurately predict the EP event inclusion in the corresponding cell lines (Fig. 5b). Collectively, these observations further validate the links between CSFs and proliferation, mediated by EP events. Given the links between CSF activity and proliferation, we expect that inactivation (by CRISPR or RNAi) of the CSFs will have an adverse effect on the proliferation rates of the cell lines. Indeed, we found that in liver cancer-derived cell lines, the more critical a SF (based on PLSR coefficient), the greater was the dependency of the cell line on that SF (negative dependency scores, Supplementary Fig. 5b), supporting a functional role for CSFs; however, we did not see this trend in brain and kidney, as discussed below. Further supporting the role of CSFs in malignant transformation, we found that in the single-cell transcriptome of liver and brain (“Methods”) tumor microenvironment, CSFs were specifically expressed in the malignant but not in non-malignant cells (Fig. 5c). Collectively, these observations link the role of CSFs in tumor cells with cellular proliferation rates through regulation of specific AS events, which might serve as potential therapeutic targets.

CSFs are potentially regulated by MYC, FOX, and BRD family transcription factors

Next, we investigated potential upstream transcriptional regulators of CSFs, as targeting them may have a broader effect on CSFs, with the resulting changes in EP inclusion potentially improving patient prognosis. We applied four criteria to identify high-confidence upstream transcriptional regulators of CSFs (Fig. 6a). First, as an initial filtering step, we utilized a large collection of ChIP-seq datasets across multiple cell lines curated in the TFEA.ChIP database⁵⁵ and shortlisted TFs whose binding was significantly enriched within the promoter regions of CSF as compared to non-critical splicing factors (nCSFs) of EP events (first column in Fig. 6b; “Methods”). Next, we used the KnockTF database⁵⁶, which details transcriptome changes upon TF deletion, to calculate the enrichment of CSFs relative to nCSFs among the downregulated targets following TF deletion and retained significant hits (second column in Fig. 6b, Methods). A major limitation of KnockTF is low coverage of TFs. We therefore applied two additional computational approaches to filter the TFs shortlisted based on TFEA.ChIP.

**Fig. 6: Potential TFs regulating CSFs in three organs.**

First, for each factor shortlisted based on TFEA.ChIP, we inferred its in-silico targets using the ARACNe software tool⁵⁷ and selected TFs whose in-silico targets were more significantly enriched for CSFs relative to nCSFs (third column in Fig. 6b, Methods). Secondly, among the list of ChIP-seq filtered regulators, we identified TFs whose expression was more strongly correlated with CSFs as compared to the nCSFs in cancer transcriptomic data (fourth column in Fig. 6b, Methods). Overall, we retain in each organ, the TFs that (after the ChIP-seq-based filtering) either qualified the experimental KnockTF-based criterion or both of the computational filters. Collectively, these results implicate MYC, FOX (specifically FOXM1), and BRD family of TFs in the regulation of EP events through upregulation of CSFs and may represent master regulators of broad splicing changes associated with development and cancer. Such master regulators and key CSFs of EP splicing could be plausible druggable targets (Supplementary Methods & Supplementary Table 1) to halt cancer progression by impeding the processes mediated by EP splicing events.

Discussion

The availability of transcriptomic datasets of tumors from TCGA and PCAWG consortia have facilitated the genome-wide analysis of alternative splicing changes in cancer elucidating their prognostic value⁵⁸, genetic basis^59,60, and the discovery of tumor neoantigens generated by alternative splicing⁶¹. However, none of these studies analyzed the broader developmental context of splicing changes in cancer. Leveraging recently available temporal developmental transcriptomic data in three human organs, in this work, we have shown that the genome-wide splicing landscape of cancers significantly reverts to the early embryonic developmental stage of their tissue of origin, strongly implicating developmental splicing events in oncogenesis and tumor progression.

Similar to gene co-expression modules, inclusion of multiple exons across genes is coordinated to affect specific cellular functions during differentiation^62,63, cell state transition⁶⁴, apoptosis⁶⁵, and hormonal induction⁶⁶. Our results suggest that coordinated programs of EP and EN splicing events are involved in embryonic processes, such as cellular proliferation, apoptosis, EMT, migration, and that cancers seem to misappropriate these coordinated exon inclusion events to revert to an embryonic-like state. Evolutionary comparisons in past have shown that alternative splicing results in neo-functionalization and increases proteome complexity of genes^67,68 which is often driven by the divergence of exonic structure of genes. Changes in exonic structure of genes have been observed in cancers as well via mutations that create splice sites⁶⁹. Therefore, we speculate that alternative splicing can promote carcinogenesis through two distinct routes, either through the re-activation of multiple aspects of the embryonic physiology, or by fueling the functional novelties and proteome complexity driven by creation of new splicing events, or a combination thereof.

Further, EP and EN events ascertained based on developmental context alone are significantly prognostic in the corresponding cancers in TCGA. For instance, the inclusion level of EP and EN events respectively predicted worse and better survival of cancer patients, underscoring the value in studying fetal development to better understand cancer mechanisms. Moreover, the enrichment of the EP exons among the splicing events which had a negative correlation with doubling time (equivalently, positive correlation with proliferation rate) of brain and liver cancer cell lines provides an independent functional validation for the role of these splicing events in mediating cellular proliferation, which is relevant to both development and cancer. However, these associations do not hold true for the case of kidney cancer cell lines. While cell lines are standard choice to model several diseases, they do not entirely capture the in vivo complexity. In our analysis, although we derived the EP and EN exons from the developing human embryos and yet, rapidly proliferating CCLE cancer cell lines indeed have higher usage of EP exons and lower usage of EN exons in brain and liver, suggesting a conserved cell-intrinsic links between splicing and proliferation.

Previous research has shown that AS can affect cytoskeleton, enzymatic properties, and membrane localization of proteins⁷⁰. Here we observe that the molecular functions related to cytoskeleton binding and regulation of GTPase activity and cellular transport were highly enriched among EP and EN exons across all three organs we studied (Supplementary Fig. 3a). These molecular functions are central to cellular proliferation through regulation of cell cycle^{71,72,73,74,75} and cellular migration^76,77,78 and, consequently, have emerged as important players in cancer progression and metastasis^35,79.

The analysis of protein domains enriched among EP and EN events further suggested their functional coordination in regulating diverse cellular processes such as proliferation, migration, neuronal physiology, and stress resistance. For instance, proliferation and migration of cells relies on alterations in the cytoskeleton, extracellular matrix, and cell adhesion, which are modulated by N-glycosylation of proteins like actin, cadherins and integrins³⁸. Our observation that subunits of OST (including TUSC3 and RPN2) undergo coordinated splicing of their TRDs among EP events suggests the role of this splicing in the regulation of N-glycosylation during organogenesis (Fig. 3c). Further TRDs in vesicle trafficking (PAQR3, PRAF2, SEC22) and mitochondria (ABCB6, ABCB8, and SDHC), WD40 in E3 ligase involved in protein degradation (DTL, FBXW9, WDR48), and NDs in GTPases for membrane signaling (ARF4, TESK1) were coordinately spliced among the brain EP exons. This suggests the functional coordination in energy metabolism and protein synthesis/processing during neuronal development is, in part, mediated via alternative splicing. In accordance, knocking down CSFs that regulate brain EP events result in the defects in the nervous system development in mice (Supplementary Fig. 4d), supporting the essential role of coordinately spliced EP events in organ development.

The EP and EN-mediated functional coordination is further illustrated in the case of coordinately spliced protein domains among the EN events in brain (Fig. 7b). Coordinately spliced ND in the GTPases involved in vesicle trafficking (ARL1 and RAB family genes), TRD in the endoplasmic reticulum-associated proteins involved in the ceramide/inositol synthesis (CDITP, CERS2, KDSR, etc.) and synaptic proteins involved in neuronal signaling (DAGLB, KCNN2, MCTP1, etc.) suggests the coordination in post-natal neuronal function such as setting up and firing rapid action potentials (Fig. 7b, right panel) and loss thereof in cancer (Fig. 7b and Supplementary Fig. 6b).

**Fig. 7: Coordinated splicing in brain development and cancer.**

Moreover, for a vast majority of the EP domains, their inclusion level, which is higher in pre-natal stages, switches back to pre-natal stages in cancer (Fig. 7a and Supplementary Fig. 6a). This suggests that host genes containing these domains drive cancer progression and EP exons of these genes can be potential therapeutic anti-cancer targets.

We note that most of the protein domains are enriched among EN exons (Fig. 3a), implying that a relatively larger fraction of annotated domains is involved in processes that are active postnatally. A general bias in functional roles of alternatively spliced domains to be involved in development-related functions has been noted previously⁸⁰ but the differential functional underpinnings of this observation relative to EP and EN are currently unclear and will require further investigation.

Many of the EP/EN events are previously reported and experimentally validated to be alternatively spliced in various diseases including cancer (Supplementary Data 8), For instance, APAF1 gene encodes an apoptotic protein and hosts an EN exon encoding WD40 domain in developing brain. Interestingly, a previous report has shown that APAF1 is alternatively spliced in prostate cancer cell lines, producing a shorter isoform called APAF1-ALT lacking WD40 domain⁸¹. Moreover, this shorter isoform impeded the induction of DNA-damage-induced apoptosis in cells, thereby allowing cells to acquire DNA-damage-induced resistance against treatment. Thus, the change in the apoptotic roles of APAF1 via alternatively spliced WD40 domain appears to be general mechanisms employed during embryogenesis as well as cancer. Additionally, the gene FLVCR1 encodes a heme transporter and hosts an EN exon encoding TRD domain in brain. Previous work has shown that various alternatively spliced isoforms of this gene lacking the TRD are expressed in the case of Diamond Blackfan anemia (DBA). Importantly, the patients with DBA have an elevated risk of neoplastic growth⁸². This example implies that the regulation of iron metabolism by controlling its transport by alternatively spliced FLVCR1 gene could be a crucial mechanisms to regulate iron levels in developing human brains⁸³ as well as cancers. The truncated isoform of integrin ITGA2B lacking the transmembrane domain is another example, which was previously reported to be secreted into the ECM in various cancers, breaking adhesion and facilitating cell migration⁴⁰.

Further, many of the tetraspanins, which are scaffolding proteins present at the membrane of the cell, and mediate various cellular functions such as proliferation, adhesion and signaling⁸⁴ contained an EP or EN exon encoding TRDs across the tissues (Supplementary Data 7). Alternatively spliced TRD in these proteins are reported to generate isoforms having alterations in the tetraspanin-enriched microdomain functions, which includes cell signaling and cell adhesion^84,85. These examples (and Supplementary Data 8) support that the EP and EN events can indeed change the function of proteins and contribute to the broad functional convergence observed between embryogenesis and cancer.

Several single-cell RNA-seq studies in the recent past have noted a general similarity in development and cancer^86,87,88. Therefore, our results indicate that these similarities are hinged upon a much broader and coordinated reprogramming of splicing in cancer cells back to their embryonic counterparts.

Critical splicing factors, which were inferred to regulate the inclusion of EP events based solely on the embryonic developmental data, are upregulated in cancer, and confer poor prognosis to the patients. Furthermore, inactivating mutations of critical splicing factors in cancer patients and shRNA knock-down in HepG2 cell line result in the decreased level of embryonic splicing, strongly supporting their causal role in regulating embryonic splicing events. The causal role of CSFs also supported by their significantly higher, and experimentally quantified, dependency scores in the DepMap dataset in liver cancer-derived cell lines. However, we did not see this trend in the brain and kidney, which may be attributed to divergent physiology and regulatory networks in cell lines as compared to tumors in the context of the tumor microenvironment. Although we observed that significant numbers of CSFs were drivers in multiple cancer types (Fig. 4g and Supplementary Fig. 4e), we did not observe a progressive increase in the mutation load (defined as total no. of mis-sense mutations per sample) among CSFs in the late-stage cancers as compared to the early stages (Supplementary Fig. 4h). This suggests that the mutational perturbation of CSFs and the corresponding change in splicing profile is involved in tumor initiation; however, it is not clear if different sets of CSFs are involved in initiation and progression of cancer and will require longitudinal data. Notably, consistent with the previous reports implicating the role of alternative splicing in the regulation of splicing factors⁸⁹, we too observed that a significant fraction of splicing factors hosted an EP and EN events, with CSFs having relatively higher proportion of EP and EN events (Supplementary Fig. 4i). Therefore, we speculate that the action of CSFs in promoting malignancy is mediated through their specific isoforms requiring in depth investigation in future.

Together, these observations highlight the inferred critical splicing factors as potential therapeutic targets against cancer progression.

Further we found that CSFs in each developing organ, as well as the corresponding cancer, were likely regulated by FOX (FOXM1), MYC, and BRD family of transcriptional regulators. Regulation of splicing factors and splicing events by MYC has been previously noted^31,90. A recent report shows that MYC-driven splicing factors regulate ~4000 splicing events across cancers⁹¹. Consistent with our findings, FOX and the MYC family of regulators control growth, proliferation, and survival of cells in multiple contexts during embryogenesis as well as cancer^92,93. Our work extends the previous studies by showing the regulation of splicing factors and functionally coordinated embryonic splicing events by MYC, BRD, FOX family of TFs in the developmental context, thus providing further mechanistic links between development and cancer. These observations hints that the embryonic reversal of cancer splicing drives cancer in conjunction with much broader transcription and epigenetic reprogramming mediated via perturbations in various master regulators (such as MYC and FOXM1) as well as critical splicing factors.

Although gene regulation is best studied experimentally using gene knockouts followed by RNA-seq experiments to reconstruct transcriptome-wide gene regulatory networks⁹⁴, such datasets do not always exist for desired transcription factors in every cell line/model system in humans. In our analysis presented in Fig. 6, we have used KnockTF, which is one such database, along with three other computational filters to identify the key master regulators of CSFs (Fig. 6b, 2^nd column). Our results suggest that the broad changes in the expressed isoforms of key genes driven by the upregulation of CSFs is likely a major mechanism by which these TFs exert their physiological effects. Therefore, targeting the upstream regulators of CSFs might result in broader changes in genome-wide splicing and improve the survival rates of patients. But such an approach is likely to suffer from unintended side effects owing to the lack of specificity and pleiotropic nature of transcription factors. Therefore, direct targeting of EP exons, through recently developed CRISPR-based techniques^95,96, as opposed to their upstream regulators, might result in specific lethality in the tumor cells. In the future, transcriptomic experiments following the deletion of CSFs or their upstream regulators would be necessary to establish the proposed mechanistic links and explore their therapeutic potential.

Collectively, our multi-pronged investigation not just conceptually enhances the understanding of broad functional roles and regulation of alternative splicing in the context of development and cancer, but also suggests putative cancer therapeutic targets. Our work also provides a framework to study the cellular mechanisms implicated in development and cancer using other molecular modalities such as miRNA and lncRNA activities, DNA methylation and histone modification profiles, alternative promoter, and poly-A usage.

Methods

Datasets and quantification of exon inclusion

For brain, liver, and kidney, uniformly processed RNA-seq data for tumors from TCGA (https://www.cancer.gov/tcga) and normal samples from GTEx⁹⁷ were downloaded from the UCSC-Xena browser (data version V7). We used UCSC-Xena browser^98,99 as it hosts the datasets from UCSC toil RNA-seq recompute compendium¹⁰⁰ which were normalized for multiple computational as well as within cohort batch effects. The UCSCXenaTools library in R⁹⁸ was used to download transcript-level TPM values computed using Kallisto¹⁰¹; the details of data integration and processing can be obtained from UCSC-Xena browser (https://xenabrowser.net/). In total, we obtained the expression levels of 197,046 transcripts across all samples. The number of samples obtained are brain cancer – Lower grade glioma (LGG): 523; Glioblastoma GBM: 172, normal brain – brain cerebellum: 118; brain cortex: 107, liver cancer – Liver hepatocellular carcinoma (LIHC): 369, normal liver –110, kidney cancer – Kidney renal papillary cell carcinoma (KIRP): 321; Kidney renal cell carcinoma (KIRC) 595, normal kidney – 27. For developmental data²⁹, we obtained the raw reads from the array express using the accession number E-MTAB-6814 and computed the transcript level TPM values using Kallisto¹⁰¹ and the transcriptome index based on Gencode version v23 (https://www.gencodegenes.org/human/release_23.html) annotations, the same version which was used by UCSC-Xena. We used pseudoalignments based approach using Kallisto software to process RNA-seq datasets as it is much faster than classical alignment^101,102, and estimated TPMs showed very high concordance with RT-PCR-based measurements^103,104. The data includes multiple pre-natal and post-natal time points in each organ (Supplementary Data 1). To quantify the inclusion level of exons in each sample, we calculated the ‘percent-spliced-in’ (PSI) value for each exon, which ranges from 0-1 (i.e. from fully excluded to fully included), using SUPPA-2¹⁰⁵. We choose SUPPA2 as it enabled us to directly use the elegant datasets from UCSC Toil RNA-seq recompute compendium¹⁰⁰ hosted at toilhub of UCSC-Xena browser⁹⁸, ensuring uniform processing and normalization of batch effects, Additionally SUPPA2 is much faster than most other tools and requires lesser storage space as it can use pre-computed TPM values¹⁰⁵. Further, we validated our main conclusion, namely, reversal of splicing events in cancer to pre-natal state of the corresponding tissue, using an entirely different pipeline – STAR 2 pass alignment¹⁰⁶ followed by rMATs¹⁰⁷ (Supplementary note 2). Transcript-level TPMs were converted to gene-level TPMs and subsequently quantile normalized as needed for the follow-up analyses. All the scripts used for downloading and processing the RNA-seq datasets are available in at https://github.com/hannenhalli-lab/AltSplDevCancer.

Developmental splicing events

To identify splicing events deemed to be involved in embryonic development, we adapted a previously published strategy called PEGASAS³¹. PEGASAS identifies the alternative splicing events that correlate with the activity of a specific biological pathway. In this study, we identified developmental exons via a three-step process as follows.

Step 1: We scored the activity of each of the 332 KEGG pathways³⁰ at each time point during development using the median of log-transformed expression of its constituent genes, resulting in a 332 × N activity matrix, where N is the number of developmental time points that were sampled for each tissue and are given in Supplementary Data 1. Clustering this activity matrix reveals two broad clusters (Supplementary Fig. 1a)—one active pre-natally and the other active post-natally.

Step 2: We applied an additional smoothing procedure in PCA space where our goal was to quantify each pathway’s tendency to be preferentially oriented towards a specific developmental timepoint. In 5-dimensional PC space (first 5 PCs explain ~65% of variance), each timepoint occupies a unique coordinate based on the PC scores. In this space, similarly, each pathway corresponds to 5-dimensional vector of the pathway’s loading in each of the 5 PCs. We quantify the preferential orientation of a pathway toward a specific timepoint as cosine similarity between the loading vector and the location of the time point in the 5-dimensional space. This procedure yields a smoothed 332 × N matrix clearly segregating 332 pathways into two broad groups based on their preferential activity during pre- or post-natal stages of development (Fig. 1a). The grouped pathways were correspondingly called embryonic positive and embryonic negative pathways.

Step 3: Next, we used an approach similar to PEGASAS³¹ and computed the cross-sample Pearson’s correlation coefficient (PCC) between the PSI value of each exon and pathway activity score in Step 1. For each exon, we selected the significantly positively or negatively correlated KEGG pathways correcting for 332 tests performed for each exon based on the Benjamini-Hochberg FDR threshold of 0.05. We call an exon embryonic positive (EP) if it is significantly correlated with at least 10% of the embryonic positive pathways vs. at most 5% of the embryonic negative pathways. Analogous criteria were applied to define embryonic negative (EN) exons.

The PEGASAS-based approach is superior in detecting the splicing events relevant to embryonic development of tissues compared to simply performing differential splicing between pre- and post-natal stages of development because (i) the sample size of the developmental dataset is insufficient for a robust differential inclusion analysis, (ii) an individual exon’s inclusion can be highly variable within pre- and post-natal stages, which can confound the identification of embryonic splicing events using differential analysis, (iii) since the PEGASAS approach is anchored on robustly identified embryonic positive and negative pathways, instead of relying only on an individual event’s temporal dynamics, it is likely more robust to noise. In Supplementary note 1, we provide a detailed discussion of relative advantages of PEGASAS approach compared with the conventional differential inclusion analysis.

Cancer-specific splicing events

For each exon skipping event identified by SUPPA2, we performed a tumor-normal comparison of its PSI value to identify the splicing events which were differentially included in tumors. Owing to the transcriptomic heterogeneity across tumors, a standard differential splicing analysis, which assesses the significance of difference in the median PSI values of cancer and normal samples, will not detect exons mis-spliced in a small number of tumors, which can nevertheless be biologically significant⁶¹. Therefore, we selected the events which were at least 2 standard deviations away from the mean of their distribution in the corresponding GTEx normal samples in a consistent direction (i.e., increased or decreased) in at least 15% of the cancer patients. (Fig. 1a). Correspondingly, such events were termed as frequently increased or decreased in cancer. We focused only on exon skipping events as those are better annotated in transcriptional databases and are easier to interpret functionally.

Comparison of cancer and developmental splicing and functional enrichment analysis

To assess if cancer recapitulates embryonic splicing events, we assessed the significance of overlap between cancer and developmental splicing events using Fisher’s exact test and adjusted the P-value using Benjamini-Hochberg’s FDR method. Functional enrichment analysis was performed using the clusterProfiler library in R and the p-values of the resulting significant terms were adjusted with Benjamini-Hochberg’s method. For plotting, the resulting GO terms were simplified based on their semantic similarity using the ‘simplify’ function from clusterProfiler in R (similarity threshold of 0.7).

Protein domain enrichment in EP and EN exons

We downloaded the transcriptomic coordinates of all the PFAM domains that were mapped to the reference genome (hg38) from the prot2hg database (http://www.prot2hg.com)¹⁰⁸. Since any given domain can be incorporated either fully or partially in multiple transcripts, the downloaded file was preprocessed to remove redundancy of genomic coordinates resulting from the same domain mapping to multiple transcripts by using bedtools merge¹⁰⁹. We then intersected the preprocessed genomic coordinates of protein domains to the unique and non-overlapping set of EP and EN exons as well as the rest of the alternatively spliced skip exons (called background exons) using bedtools intersect in each tissue. To identify the domains enriched in EP and EN events, we computed the frequency of occurrence of each domain in EP, EN, and background exons and performed a Fishers’ test of enrichment in each tissue. The resulting p-values from the Fishers’ test were corrected for multiple testing by using Benjamini-Hochberg’s method and the domains with an odds ratio > 1 and corrected p-value <0.1 were considered enriched among EP or EN exons in each tissue.

Survival analysis

We used clinical data from TCGA to model the overall survival of cancer patients using the inclusion level (PSI value) of each exon as a predictor variable and age as a covariate in the cox regression. We used the R library “survival” for this analysis and the resulting p-values were adjusted for multiple testing using Benjamini-Hochberg’s method. The distribution of the resulting hazard ratios was compared between embryonic positive, negative and the rest of the splicing events.

Model for regulation of embryonic splicing

To dissect the potential regulators of embryonic splicing events, we built upon a commonly used notion that differential expression of splicing factors could lead to the differential splicing of the exons¹¹⁰. For this, we identified 442 proteins which have the term ‘splicing’ in their GO definition from the Amigo database¹¹¹. We then used a partial least square regression (PLSR) analysis to model the inclusion of EP events using the gene expression of splicing factors in the developmental data. PLSR outperforms multiple linear regression when dealing with multicollinearity among the predictor variables or when the predictor matrix is non-singular¹¹².

For gene expression matrix X of 442 features (SFs) across N developmental timepoints (n × 442) and response matrix Y of median EP splicing across n timepoints (n × 1), the PLSR transforms X and Y as per the following relations:

$${{{{{\bf{X}}}}}}={{{{{\bf{T}}}}}}{{{{{{\bf{P}}}}}}}^{T}+{{{{{\bf{E}}}}}}$$

(1)

$${{{{{\bf{Y}}}}}}={{{{{\bf{U}}}}}}{{{{{{\bf{Q}}}}}}}^{T}+{{{{{\bf{F}}}}}}$$

(2)

where T and U are the N × r matrices of the extracted latent vectors and P (p × r) and Q (1 × r) are the loadings of X and Y. E (n × p) and F (1 × p) are the residuals. In the PLSR algorithm, T and U are constrained to have a maximum covariance as per following relation:

$${{{{{\bf{U}}}}}}={{{{{\bf{T}}}}}}{{{{{\bf{B}}}}}}+{{{{{\bf{H}}}}}}$$

(3)

where B (r × r) is a diagonal matrix of regression coefficients and H is a matrix of residuals.

Splicing factors with positive regression coefficients and a significant p-value (p < 0.05 after FDR correction) were considered critical regulators of EP events (CSF) PLSR was implemented using ‘pls’ package in R¹¹².

Mutation analysis of splicing factors

To assess the causal role of CSFs in the regulation of EP events, we obtained level 2 mutation data from TCGA cohorts of brain, liver, and kidney cancers (https://portal.gdc.cancer.gov/) using ‘maftools’ in R¹¹³ and identified the tumors which had nonsense or truncating mutations for these factors. For each mutated factor in each cancer type, we compared the median inclusion level of EP events in the mutant samples against the background set of samples that were not mutated for any of the splicing factors. Thus, the factors were classified into ‘increased’ or ‘decreased’ categories depending upon at least 5% increase or decrease in the median EP inclusion level. To account for the potential confounding effect of the differential expression of splicing factors between samples, we identified, for each mutant sample, a set of 10 non-mutant samples with similar splicing factor expression. Specifically, for each mutant sample, we identified 10 non-mutant samples with the shortest Euclidean distance to the mutant sample in terms of the gene expression of all splicing factors. For robustness, we discarded the splicing factors for which the background set of patients had a high variability (standard deviation > 0.1) in the median EP splicing across the 10 samples (Supplementary Fig. 4e).

Transcriptional regulators of splicing factors

To identify the potential transcriptional regulators of critical splicing factors (Fig. 6a), in each organ independently, we divided the splicing factors into two classes: namely, a foreground set comprising of the top 100 critical splicing factors, and a background set comprising the remaining splicing factors (nCSFs). To assess whether a TF was more likely to regulate CSFs compared to nCSFs, we used four complementary approaches (Fig. 6a). In the first step, we used the TFEA.ChIP library in R, which uses publicly available genome-wide binding datasets from ChIP-seq experiments⁵⁵. TFEA.ChIP used a Fisher’s test to assess if a specific TF’s binding is significantly enriched in the promoter regions (i.e., within 1 kb upstream of the transcription start site) of the CSFs relative to nCSFs (step 1 in Fig. 6a). TFs with an odds ratio > 2 and an FDR of 0.05 were considered putative regulators of CSFs. This first step was used as a strict filter for a TF to be further considered. To validate the ChIP-seq-based findings with the gene knockout/knockdown studies, we used the KnockTF database⁵⁶, which is a compendium of publicly available genome-wide transcriptional profiling following the deletion of TFs across multiple cell lines (step 2 in Fig. 6a). In this step, we obtained all the genes which were marked as downregulated based on a robust statistical analysis in the KnockTF database⁵⁶ following the deletion of a transcription factor and again assessed if CSFs were enriched as compared to nCSFs among the downregulated targets using a Fisher’s test. TFs with an FDR of <0.25 and a positive odds ratio in any of the cell lines were considered putative experimentally derived regulators of CSFs. Furthermore, because KnockTF has a poor coverage of TFs, we did not use this as a strict filter and instead used two additional computational approaches to infer the potential TFs: (i) We built a gene regulatory network for TFs shortlisted by ChIP-seq using the developmental time course data for relevant tissues and the ARACNe software⁵⁷ and assessed if the CSFs were enriched relative to nCSFs among the in silico derived targets of each TF using a Fisher’s test (step 3a in Fig. 6a). TFs with an odds ratio > 2 and an FDR < 0.2 were considered potential in silico derived regulators of CSFs. (ii) In parallel, we assessed the correlation of ChIP-seq shortlisted factors with CSFs and nCSFs in relevant cancer types (step 3b in Fig. 6a). The factors, with a correlation difference > 0.2 between CSFs and nCSFs were considered putative regulators. The ChIP-seq shortlisted factors, which either passed the KnockTF test OR passed both computational tests, were proposed as regulators of CSFs. In all the applicable cases, p-values were adjusted for multiple comparisons using the Benjamini-Hochberg procedure in R.

Analysis of shRNA data for HepG2 cell line

To investigate the effect of knocking down of CSF on the inclusion level of EP events, we used an shRNA knockdown data for RNA binding proteins in HepG2 (a liver cancer) cell line from ENCODE database⁴⁸. The dataset consisted of RNA-seq experiments following the knockdown of 223 RNA binding proteins, each with two biological replicates, and controls which were shared between different targets. The raw sequencing reads for the knockdown as well control experiments (26 controls with two replicates each) were downloaded and processed to quantify transcript/gene expression using Kallisto and exon inclusion using SUPPA2. Following a similar procedure as before (i.e., EP events in human tissues), the gene expression and splicing quantification in the control set of cell lines were used to train a PLSR model and learn the critical splicing factors of liver EP events in HepG2 cell line. We considered only those splicing factors in which shRNA knockdown resulted in at-least 50% reduction in their expression. For each RNA binding protein considered in this analysis, we calculated the proportion of EP events whose inclusion was consistently decreased across two biological replicates (ΔPSI < −0.1 after shRNA knockdown relative to the controls) and plotted the distribution of this proportions in critical and remaining splicing factors in HepG2 cell line.

CNV analysis

For each cancer type we obtained the level 4 CNV data from TCGA, which contained sample-specific information about the CNV profile of each gene (1 being CNV amplification, 0 being no CNVs, −1 being CNV deletion). To assess the CNVs of CSFs in each cancer type, we divided all samples into three quartiles based on the gene expression of each CSF. For each group of samples obtained in this way, we calculated the average CNV value for each CSF and compared these values for all CSFs between the quartiles using a Wilcoxon test.

Single cell validation

For single cell validation of prioritized transcription and splicing factors, we obtained GBM single-cell SMART-seq datasets from 20 adult GBM tumors¹¹⁴ from the Broad Institute Single Cell Portal (https://singlecell.broadinstitute.org/single_cell; Accession: SCP393). We also obtained normal brain single-cell SMART-seq and RNA-seq data and the annotations of cells from multiple cortical areas of the human brain from the Allen Brain atlas (2019 SMART-seq release, https://portal.brain-map.org/atlases-and-data/rnaseq)¹¹⁵. Oligodendrocytes, astrocytes, and oligodendrocyte progenitor cells were used as a normal reference to compute log-fold changes between malignant and normal cells. For liver cancer, LIHC single-cell RNA-seq data is 10X data sourced from a previous study¹¹⁶ and the read count matrices and annotations were downloaded from the GEO database (GSE125449). For healthy liver, read count matrices were obtained from the HumanLiver package¹¹⁷ (https://github.com/BaderLab/HumanLiver). Hepatocyte clusters (Hep 1−6) and cholangiocytes were used as a normal reference to compute log-fold changes between malignant and normal cells.

The activity of CSFs at the single-cell level was scored as a gene set using AUCell¹¹⁸, and the resulting activity scores were z-scored across all cells separately in each tissue. We used the batch ID of the samples as a covariate in this analysis to account for sequencing differences due to differing batches¹¹⁹. In each case, the cell type annotations and their uniform manifold approximation and projection (UMAP) coordinates were also downloaded from the respective source indicate above.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The public RNA-seq datasets for human cancers were generated by TCGA consortium (https://www.cancer.gov/tcga) and are publicly available from the ‘toilhub’ of UCSC-Xena browser¹⁰⁰ (UCSC-Xena-TCGA). The public RNA-seq datasets for healthy human tissues were generated by GTEx consortium and publicly available from the ‘toilhub’ of the UCSC-Xena browser¹⁰⁰ (UCSC-Xena-GTEx). The public mutation calls and copy number amplifications from whole exome sequencing data of human cancers are publicly available from TCGA genomics data commons portal (https://portal.gdc.cancer.gov/)¹²⁰. The public RNA-seq datasets spanning multiple stages during human organogenesis are publicly available and downloaded from array express (E-MTAB-6814)²⁹. The public clinical and survival data of cancer patients is publicly available and downloaded from Pan-Cancer Atlas initiative (TCGA-clinical)¹²¹. The public mapping of PFAM domains to hg38 assembly was performed by a previous study and the mapping coordinates are publicly available to download from the prot2hg database (http://www.prot2hg.com)¹⁰⁸. The public data for frequently mutated splicing factors with a significant evidence for their cancer driver gene activity is publicly available and downloaded from Table S1 of Seiler et al.⁵⁴. The public RNA-seq datasets for shRNA knockdown of splicing factors and corresponding controls for HepG2 cell line were downloaded from ENCODE database (ENCODE-shRNA-HepG2)⁴⁸. The public data for doubling time, RNA-seq, and genome-wide dependency score for cancer cell lines are publicly available and downloaded from the DepMap portal release 22Q2 (DepMaP)¹²². The public single-cell RNA-seq data for glioblastoma patients is publicly available and downloaded from the Single Cell Portal of the Broad Institute under the accession code SCP393 (sc-GBM)¹¹⁴. The public single-cell RNA-seq data for healthy brain samples is publicly available and downloaded from Allen Brain Atlas (sc-Brain)¹¹⁵. The public single-cell RNA-seq data for liver cancer is publicly available and downloaded from GEO database under the accession code GSE125449 (sc-LIHC)¹¹⁶. Single-cell RNA-seq data for healthy liver is publicly available and imported with HumanLiver package in R (sc-Liver)¹¹⁷. The public data for differentially expressed genes following the deletion of TFs across multiple cell lines is publicly available and downloaded from KnockTF database (KnockTF)⁵⁶. The public ChIP-seq datasets for the genome-wide binding of TFs across multiple model systems is publicly available and downloaded from the GitHub repository of the TFEA.ChIP library in R⁵⁵ (TFEA.ChIP). The public data for human phenotype ontology terms is publicly available and downloaded from The Jackson laboratory (HPO)¹²³. Gene and transcript coordinates for hg38 assembly were downloaded from Gencode (Gencode V23)¹²⁴. The remaining data generated in this study are provided with this paper as supplementary files and source data file. Source data are provided with this paper.

Code availability

All the codes used in collection, processing and analysis of datasets are available are deposited at GitHub (https://github.com/hannenhalli-lab/AltSplDevCancer/) and the corresponding DOI is as follows: https://doi.org/10.5281/zenodo.7325464¹²⁵.

References

Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell. 144, 646–674 (2011).
Article CAS Google Scholar
Ben-Porath, I. et al. An embryonic stem cell-like gene expression signature in poorly differentiated aggressive human tumors. Nat. Genet. 40, 499–507 (2008).
Article CAS Google Scholar
Kelleher, F., Fennelly, D. & Rafferty, M. Common critical pathways in embryogenesis and cancer. Acta Oncol. 45, 375–388 (2006).
Article CAS Google Scholar
Patra, S. K. Roles of OCT4 in pathways of embryonic development and cancer progression. Mech. Ageing Dev. 189, 111286 (2020).
Article CAS Google Scholar
Kim, J. et al. A Myc network accounts for similarities between embryonic stem and cancer cell transcription programs. Cell 143, 313–324 (2010).
Article CAS Google Scholar
Gidekel, S., Pizov, G., Bergman, Y. & Pikarsky, E. Oct-3/4 is a dose-dependent oncogenic fate determinant. Cancer Cell 4, 361–370 (2003).
Article CAS Google Scholar
Rodriguez-Pinilla, S. M. et al. Sox2: a possible driver of the basal-like phenotype in sporadic breast cancer. Mod. Pathol. 20, 474–481 (2007).
Article CAS Google Scholar
Li, X. L. et al. Expression of the SRY-related HMG box protein SOX2 in human gastric carcinoma. Int. J. Oncol. 24, 257–263 (2004).
CAS Google Scholar
Malta, T. M. et al. Machine learning identifies stemness features associated with oncogenic dedifferentiation. Cell 173, 338–354 (2018).
Article CAS Google Scholar
Dreesen, O. & Brivanlou, A. H. Signaling pathways in cancer and embryonic stem cells. Stem Cell Rev. 3, 7–17 (2007).
Article CAS Google Scholar
Sanchez-Vega, F. et al. Oncogenic signaling pathways in the cancer genome atlas. Cell 173, 321–337 (2018).
Article CAS Google Scholar
Pan, Q., Shai, O., Lee, L. J., Frey, B. J. & Blencowe, B. J. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat. Genet. 40, 1413–1415 (2008).
Article CAS Google Scholar
Wang, E. T. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476 (2008).
Article ADS CAS Google Scholar
Agosto, L. M. & Lynch, K. W. Alternative pre-mRNA splicing switch controls hESC pluripotency and differentiation. Genes Dev. 32, 17–18 (2018).
Article Google Scholar
Baralle, F. E. & Giudice, J. Alternative splicing as a regulator of development and tissue identity. Nat. Rev. Mol. Cell Biol. 18, 437–451 (2017).
Article CAS Google Scholar
Han, H. et al. MBNL proteins repress ES-cell-specific alternative splicing and reprogramming. Nature 498, 241–245 (2013).
Article ADS CAS Google Scholar
Salomonis, N. et al. Alternative splicing in the differentiation of human embryonic stem cells into cardiac precursors. PLoS Comput. Biol. 5, e1000553 (2009).
Article Google Scholar
Bonnal, S. C., López-Oreja, I. & Valcárcel, J. Roles and mechanisms of alternative splicing in cancer — implications for care. Nat. Rev. Clin. Oncol. 17, 457–474 (2020).
Article Google Scholar
Takehara, T., Liu, X., Fujimoto, J., Friedman, S. L. & Takahashi, H. Expression and role of Bcl-xL in human hepatocellular carcinomas. Hepatology 34, 55–61 (2001).
Article CAS Google Scholar
Xerri, L. et al. Predominant expression of the long isoform of Bcl-x (Bcl-xL) in human lymphomas. Br. J. Haematol. 92, 900–906 (1996).
Article CAS Google Scholar
Yan, G., Fukabori, Y., McBride, G., Nikolaropolous, S. & McKeehan, W. L. Exon switching and activation of stromal and embryonic fibroblast growth factor (FGF)-FGF receptor genes in prostate epithelial cells accompany stromal independence and malignancy. Mol. Cell. Biol. 13, 4513–4522 (1993).
CAS Google Scholar
Warzecha, C. C., Sato, T. K., Nabet, B., Hogenesch, J. B. & Carstens, R. P. ESRP1 and ESRP2 are epithelial cell-type-specific regulators of FGFR2 splicing. Mol. Cell 33, 591–506 (2009).
Article CAS Google Scholar
Matsuda, Y., Hagio, M., Seya, T. & Ishiwata, T. Fibroblast growth factor receptor 2 IIIc as a therapeutic target for colorectal cancer cells. Mol. Cancer Ther. 11, 2010–2020 (2012).
Article CAS Google Scholar
Sonvilla, G. et al. Fibroblast growth factor receptor 3-IIIc mediates colorectal cancer growth and migration. Br. J. Cancer 102, 1145–1156 (2010).
Article CAS Google Scholar
Cha, J. Y., Lambert, Q. T., Reuther, G. W. & Der, C. J. Involvement of fibroblast growth factor receptor 2 isoform switching in mammary oncogenesis. Mol. Cancer Res. 6, 435–445 (2008).
Article CAS Google Scholar
Pokorná, Z., Vysloužil, J., Hrabal, V., Vojtěšek, B. & Coates, P. J. The foggy world(s) of p63 isoform regulation in normal cells and cancer. J. Pathol. 254, 454–473 (2021).
Article Google Scholar
David, C. J. & Manley, J. L. Alternative pre-mRNA splicing regulation in cancer: Pathways and programs unhinged. Genes Dev. 24, 2343–2364 (2010).
Article CAS Google Scholar
Mazin, P. V., Khaitovich, P., Cardoso-Moreira, M. & Kaessmann, H. Alternative splicing during mammalian organ development. Nat. Genet. 53, 925–934 (2021).
Article CAS Google Scholar
Cardoso-Moreira, M. et al. Gene expression across mammalian organ development. Nature. 571, 505–509 (2019).
Article CAS Google Scholar
Kanehisa, M., Goto, S., Sato, Y., Furumichi, M. & Tanabe, M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 40, D109–D114 (2012).
Article CAS Google Scholar
Phillips, J. W. et al. Pathway-guided analysis identifies Myc-dependent alternative pre-mRNA splicing in aggressive prostate cancers. Proc. Natl Acad. Sci. USA 117, 5269–5279 (2020).
Article ADS CAS Google Scholar
Paronetto, M. P., Passacantilli, I. & Sette, C. Alternative splicing and cell survival: from tissue homeostasis to disease. Cell Death Differ. 23, 1919–1929 (2016).
Article CAS Google Scholar
Tsai, Y. S., Dominguez, D., Gomez, S. M. & Wang, Z. Transcriptome-wide identification and study of cancer-specific splicing events across multiple tumors. Oncotarget 6, 6825–6839 (2015).
Article Google Scholar
Boudhraa, Z., Carmona, E., Provencher, D. & Mes-Masson, A. M. Ran GTPase: A Key Player in Tumor Progression and Metastasis. Front. Cell Dev. Biol. 8, 345 (2020).
Article Google Scholar
Brayford, S., Schevzov, G., Vos, J. & Gunning, P. The role of the actin cytoskeleton in cancer and its potential use as a therapeutic target. in The Cytoskeleton in Health and Disease. https://doi.org/10.1007/978-1-4939-2904-7_16 (2015).
El-Gebali, S. et al. The Pfam protein families database in 2019. Nucleic Acids Res. 47, D427–D432 (2019).
Article CAS Google Scholar
Pashkova, N. et al. WD40 repeat propellers define a ubiquitin-binding domain that regulates turnover of F box proteins. Mol. Cell 40, 433–443 (2010).
Article CAS Google Scholar
Pinho, S. S. & Reis, C. A. Glycosylation in cancer: Mechanisms and clinical implications. Nat. Rev. Cancer 15, 540–555 (2015).
Article CAS Google Scholar
Trikha, M., Cai, Y., Grignon, D. & Honn, K. V. Identification of a novel truncated αIIb integrin. Cancer Res. 58, 4771–4775 (1998).
CAS Google Scholar
Jin, R., Trikha, M., Cai, Y., Grignon, D. & Honn, K. V. A naturally occurring truncated β3 integrin in tumor cells: native anti-integrin involved in tumor cell motility. Cancer Biol. Ther. 6, 1559–1568 (2007).
Article CAS Google Scholar
Lin, L. et al. RhoA inactivation by S-nitrosylation regulates vascular smooth muscle contractive signaling. Nitric Oxide 74, 56–64 (2018).
Article CAS Google Scholar
Raines, K. W., Bonini, M. G. & Campbell, S. L. Nitric oxide cell signaling: S-nitrosation of Ras superfamily GTPases. Cardiovasc. Res. 75, 229–239 (2007).
Article CAS Google Scholar
Miyamoto, S. et al. Aberrant alternative splicing of RHOA is associated with loss of its expression and activity in diffuse-type gastric carcinoma cells. Biochem. Biophys. Res. Commun. 495, 1942–1947 (2018).
Article CAS Google Scholar
Chi, X., Wang, S., Huang, Y., Stamnes, M. & Chen, J. L. Roles of Rho GTPases in intracellular transport and cellular transformation. Int. J. Mol. Sci. 14, 7089–7108 (2013).
Article CAS Google Scholar
Suda, Y., Kurokawa, K. & Nakano, A. Regulation of ER-Golgi transport dynamics by GTPases in budding yeast. Front. Cell Dev. Biol. 5, 122 (2018).
Article Google Scholar
Iyer, A. K. V., Azad, N., Wang, L. & Rojanasakul, Y. Role of S-nitrosylation in apoptosis resistance and carcinogenesis. Nitric Oxide 19, 146–151 (2008).
Article CAS Google Scholar
Blanchette, M., Green, R. E., Brenner, S. E. & Rio, D. C. Global analysis of positive and negative pre-mRNA splicing regulators in Drosophila. Genes Dev. 19, 1306–1314 (2005).
Article CAS Google Scholar
Van Nostrand, E. L. et al. A large-scale binding and functional map of human RNA-binding proteins. Nature 583, (711–719 (2020).
Chen, W. et al. Expression of CDC5L is associated with tumor progression in gliomas. Tumor Biol. 37, 4093–4103 (2016).
Article CAS Google Scholar
Han, W. et al. RNA-binding protein PCBP2 modulates glioma growth by regulating FHL3. J. Clin. Investig. 123, 2103–2118 (2013).
Article CAS Google Scholar
Zhang, Z. et al. Depletion of CDC5L inhibits bladder cancer tumorigenesis. J. Cancer 11, 353–363 (2020).
Article CAS Google Scholar
Zhou, Z. et al. The biological function and clinical significance of SF3B1 mutations in cancer. Biomarker Res. 8, 38 (2020).
Article Google Scholar
Maji, D. et al. Representative cancer-associated U2AF2 mutations alter RNA interactions and splicing. J. Biol. Chem. 295, 17148–17157 (2020).
Article CAS Google Scholar
Seiler, M. et al. Somatic Mutational landscape of splicing factor genes and their functional consequences across 33 cancer types. Cell Rep. 23, 282–296 (2018).
Article CAS Google Scholar
Puente-Santamaria, L. & Wasserman, W. W. & Del Peso, L. TFEA.ChIP: A tool kit for transcription factor binding site enrichment analysis capitalizing on ChIP-seq datasets. Bioinformatics 35, 5339–5340 (2019).
Article CAS Google Scholar
Feng, C. et al. KnockTF: a comprehensive human gene expression profile database with knockdown/knockout of transcription factors. Nucleic Acids Res. 48, D93–D100 (2020).
Article CAS Google Scholar
Margolin, A. A. et al. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinf. 7(Suppl 1), S7 (2006).
Article MathSciNet Google Scholar
Zhang, Y. et al. Pan-cancer analysis of clinical relevance of alternative splicing events in 31 human cancers. Oncogene 38, 6678–6695 (2019).
Article CAS Google Scholar
Calabrese, C. et al. Genomic basis for RNA alterations in cancer. Nature 578, 129–136 (2020).
Article ADS CAS Google Scholar
Tian, J. et al. CancerSplicingQTL: a database for genome-wide identification of splicing QTLs in human cancer. Nucleic Acids Res. 47, D909–D916 (2019).
Article CAS Google Scholar
Kahles, A. et al. Comprehensive analysis of alternative splicing across tumors from 8,705 patients. Cancer Cell 34, 211–224.e6 (2018).
Article CAS Google Scholar
Bland, C. S. et al. Global regulation of alternative splicing during myogenic differentiation. Nucleic Acids Res. 38, 7651–7664 (2010).
Article ADS CAS Google Scholar
Yamamoto, M. L. et al. Alternative pre-mRNA splicing switches modulate gene expression in late erythropoiesis. Blood 113, 3363–3370 (2009).
Article CAS Google Scholar
Warzecha, C. C. et al. An ESRP-regulated splicing programme is abrogated during the epithelial-mesenchymal transition. EMBO J. 29, 3286–3300 (2010).
Article CAS Google Scholar
Moore, M. J., Wang, Q., Kennedy, C. J. & Silver, P. A. An alternative splicing network links cell-cycle control to apoptosis. Cell 142, 625–636 (2010).
Article CAS Google Scholar
Hartmann, B. et al. Global analysis of alternative splicing regulation by insulin and wingless signaling in Drosophila cells. Genome Biol. 10, R11 (2009).
Article Google Scholar
Liu, Y. et al. Impact of alternative splicing on the human proteome. Cell Rep. 20, 1229–1241 (2017).
Article CAS Google Scholar
Bush, S. J., Chen, L., Tovar-Corona, J. M. & Urrutia, A. O. Alternative splicing and the evolution of phenotypic novelty. Philos. Trans. R. Soc. B Biol. Sci. 372, 20150474 (2017).
Article Google Scholar
Jayasinghe, R. G. et al. Systematic analysis of splice-site-creating mutations in cancer. Cell Rep. 23, 270–281.e3 (2018).
Article CAS Google Scholar
Kelemen, O. et al. Function of alternative splicing. Gene 514, 1–30 (2013).
Article CAS Google Scholar
Bunnell, T. M., Burbach, B. J., Shimizu, Y. & Ervasti, J. M. β-Actin specifically controls cell growth, migration, and the G-actin pool. Mol. Biol. Cell 22, 4047–4058 (2011).
Article CAS Google Scholar
Collins, E. S., Balchand, S. K., Faraci, J. L., Wadsworth, P. & Lee, W. L. Cell cycle-regulated cortical dynein/dynactin promotes symmetric cell division by differential pole motion in anaphase. Mol. Biol. Cell 23, 3380–3390 (2012).
Article CAS Google Scholar
Forth, S. & Kapoor, T. M. The mechanics of microtubule networks in cell division. J. Cell Biol. 216, 1525–1531 (2017).
Article CAS Google Scholar
Hawkins, T., Mirigian, M., Selcuk Yasar, M. & Ross, J. L. Mechanics of microtubules. J. Biomech. 43, 23–30 (2010).
Article Google Scholar
McNeill, M. C. et al. Nuclear actin regulates cell proliferation and migration via inhibition of SRF and TEAD. Biochim. Biophys. Acta Mol. Cell Res. 1867, 118691 (2020).
Article CAS Google Scholar
Callan-Jones, A. C. & Voituriez, R. Actin flows in cell migration: From locomotion and polarity to trajectories. Curr. Opin. Cell Biol. 38, 12–17 (2016).
Article CAS Google Scholar
Seetharaman, S. & Etienne-Manneville, S. Cytoskeletal Crosstalk in Cell Migration. Trends Cell Biol. 30, 720–735 (2020).
Article CAS Google Scholar
Svitkina, T. The actin cytoskeleton and actin-based motility. Cold Spring Harb. Perspect. Biol. 10, a018267 (2018).
Article Google Scholar
Buda, A. & Pignatelli, M. E-cadherin and the cytoskeletal network in colorectal cancer development and metastasis. Cell Commun. Adhes. 18, 133–143 (2011).
Article CAS Google Scholar
Liu, S. & Altman, R. B. Large scale study of protein domain distribution in the context of alternative splicing. Nucleic Acids Res. 31, 4828–4835 (2003).
Article CAS Google Scholar
Ogawa, T. et al. APAF-1-ALT, a novel alternative splicing form of APAF-1, potentially causes impeded ability of undergoing DNA damage-induced apoptosis in the LNCaP human prostate cancer cell line. Biochem. Biophys. Res. Commun. 306, 537–543 (2003).
Article CAS Google Scholar
Vlachos, A., Rosenberg, P. S., Atsidaftos, E., Alter, B. P. & Lipton, J. M. Incidence of neoplasia in Diamond Blackfan anemia: A report from the Diamond Blackfan anemia registry. Blood 119, 3815–3819 (2012).
Article CAS Google Scholar
Lipiński, P., Styś, A. & Starzyński, R. R. Molecular insights into the regulation of iron metabolism during the prenatal and early postnatal periods. Cell. Mol. Life Sci. 70, 23–38 (2013).
Article Google Scholar
Termini, C. M. & Gillette, J. M. Tetraspanins function as regulators of cellular signaling. Front. Cell Dev. Biol. 5, 34 (2017).
Article Google Scholar
Yáñez-Mó, M., Barreiro, O., Gordon-Alonso, M., Sala-Valdés, M. & Sánchez-Madrid, F. Tetraspanin-enriched microdomains: a functional unit in cell plasma membranes. Trends Cell Biol. 19, 434–446 (2009).
Article Google Scholar
Marie, K. L. et al. Melanoblast transcriptome analysis reveals pathways promoting melanoma metastasis. Nat. Commun. 11, 333 (2020).
Article ADS CAS Google Scholar
Couturier, C. P. et al. Single-cell RNA-seq reveals that glioblastoma recapitulates a normal neurodevelopmental hierarchy. Nat. Commun. 11, 3406 (2020).
Article ADS CAS Google Scholar
Curry, R. N. & Glasgow, S. M. The role of neurodevelopmental pathways in brain tumors. Front. Cell Dev. Biol. 9, 659055 (2021).
Article Google Scholar
Lareau, L. F. & Brenner, S. E. Regulation of splicing factors by alternative splicing and NMD is conserved between kingdoms yet evolutionarily flexible. Mol. Biol. Evol. 32, 1072–1079 (2015).
Article CAS Google Scholar
Park, S. H. et al. Differential functions of splicing factors in mammary transformation and breast cancer metastasis. Cell Rep. 29, 2672–2688.e7 (2019).
Article CAS Google Scholar
Urbanski, L. MYC regulates a pan-cancer network of co-expressed oncogenic splicing factors. Cell Rep. 41, 111704 (2022).
Article CAS Google Scholar
Maria, L., Golson Klaus, H. & Kaestner Fox transcription factors: from development to disease. Development 143, 4558–4570 (2016).
Article Google Scholar
Dang, C. V. (2013) MYC Metabolism Cell Growth and Tumorigenesis. Cold Spring Harbor Perspectives in Medicine 3, a014217 (2013).
Article Google Scholar
Lee, T. I. et al. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298, 799–804 (2002).
Article ADS CAS Google Scholar
Thomas, J. D. et al. RNA isoform screens uncover the essentiality and tumor-suppressor activity of ultraconserved poison exons. Nat. Genet. 52, 84–94 (2020).
Article CAS Google Scholar
Gonatopoulos-Pournatzis, T. et al. Genetic interaction mapping and exon-resolution functional genomics with a hybrid Cas9–Cas12a platform. Nat. Biotechnol. 38, 638–648 (2020).
Article CAS Google Scholar
Carithers, L. J. & Moore, H. M. The genotype-tissue expression (GTEx) project. Biopreserv. Biobank. 13, 307–308 (2015).
Article Google Scholar
Wang, S. & Liu, X. The UCSCXenaTools R package: a toolkit for accessing genomics data from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq. J. Open Source Softw. 4, 1627 (2019).
Article ADS Google Scholar
Goldman, M. J. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat. Biotechnol. 38, 675–678 (2020).
Article CAS Google Scholar
Vivian, J. et al. Toil enables reproducible, open source, big biomedical data analyses. Nat. Biotechnol. 35, 314–316 (2017).
Article CAS Google Scholar
Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).
Article CAS Google Scholar
Melsted, P. et al. Modular, efficient and constant-memory single-cell RNA-seq preprocessing. Nat Biotechnol. 39, 813–818 (2021).
Article CAS Google Scholar
Schaarschmidt, S., Fischer, A., Zuther, E. & Hincha, D. K. Evaluation of seven different rna-seq alignment tools based on experimental data from the model plant arabidopsis thaliana. Int. J. Mol. Sci. 21, 1720 (2020).
Article CAS Google Scholar
Everaert, C. et al. Benchmarking of RNA-sequencing analysis workflows using whole-transcriptome RT-qPCR expression data. Sci. Rep. 7, 1559 (2017).
Article ADS Google Scholar
Trincado, J. L. et al. SUPPA2: Fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions. Genome Biol. 19, 40 (2018).
Article Google Scholar
Dobin, A. et al. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Article CAS Google Scholar
Shen, S. et al. rMATS: Robust and flexible detection of differential alternative splicing from replicate RNA-Seq data. Proc. Natl. Acad. Sci. USA 111, E5593–E5601 (2014).
Article CAS Google Scholar
Stanek, D. et al. Prot2HG: A database of protein domains mapped to the human genome. Database 2020, baz161 (2020).
Article CAS Google Scholar
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Article CAS Google Scholar
Wang, Z. & Burge, C. B. Splicing regulation: from a parts list of regulatory elements to an integrated splicing code. RNA 14, 802–813 (2008).
Article CAS Google Scholar
Carbon, S. et al. AmiGO: Online access to ontology and annotation data. Bioinformatics 25, 288–289 (2009).
Article CAS Google Scholar
Mevik, B.-H., Wehrens, R.The pls Package: Principal Component and Partial Least Squares Regression in R. Journal of Statistical Software. (2007).
Mayakonda, A., Lin, D. C., Assenov, Y., Plass, C. & Koeffler, H. P. Maftools: Efficient and comprehensive analysis of somatic variants in cancer. Genome Res. 28, 1747–1756 (2018).
Article CAS Google Scholar
Neftel, C. et al. An integrative model of cellular states, plasticity, and genetics for glioblastoma. Cell 178, 835–849.e21 (2019).
Article CAS Google Scholar
Boldog, E. et al. Transcriptomic and morphophysiological evidence for a specialized human cortical GABAergic cell type. Nat. Neurosci. 21, 1185–1195 (2018).
Article CAS Google Scholar
Ma, L. et al. Tumor cell biodiversity drives microenvironmental reprogramming in liver cancer. Cancer Cell 36, 418–430.e6 (2019).
Article CAS Google Scholar
MacParland, S. A. et al. Single cell RNA sequencing of human liver reveals distinct intrahepatic macrophage populations. Nat. Commun. 9, 153–167 (2018).
Article Google Scholar
Aibar, S. et al. SCENIC: Single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086 (2017).
Article CAS Google Scholar
Ntranos, V., Yi, L., Melsted, P. & Pachter, L. A discriminative learning approach to differential expression analysis for single-cell RNA-seq. Nat. Methods 16, 163–166 (2019).
Article CAS Google Scholar
Grossman, R. L. et al. Toward a shared vision for cancer genomic data. N. Engl. J. Med. 375, 1109–1112 (2016).
Article Google Scholar
Weinstein, J. N. et al. The cancer genome atlas pan-cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
Article Google Scholar
Tsherniak, A. et al. Defining a cancer dependency map. Cell 170, 564–576 (2017).
Article CAS Google Scholar
Köhler, S. et al. The human phenotype ontology in 2021. Nucleic Acids Res. 49, D1207–D1217 (2021).
Article Google Scholar
Frankish, A. et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 47, D766–D773 (2019).
Article CAS Google Scholar
Singh, A. et al. Broad misappropriation of developmental splicing profile by cancer in multiple organs. GitHub. https://doi.org/10.5281/zenodo.7325464 (2022).

Download references

Acknowledgements

This work is supported by the Intramural Research Program of the National Cancer Institute, Center for Cancer Research, NIH, and utilized the computational resources of the NIH HPC Biowulf cluster. We would like to thank Stephan Muljo and Thomas Gonatopoulos-Pournatzis for feedback on the manuscript.

Funding

Open Access funding provided by the National Institutes of Health (NIH).

Author information

These authors contributed equally: Arati Rajeevan, Vishaka Gopalan.

Authors and Affiliations

Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
Arashdeep Singh, Arati Rajeevan, Vishaka Gopalan, Piyush Agrawal & Sridhar Hannenhalli
Laboratory of Cancer Biology and Genetics National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
Chi-Ping Day

Authors

Arashdeep Singh
View author publications
You can also search for this author in PubMed Google Scholar
Arati Rajeevan
View author publications
You can also search for this author in PubMed Google Scholar
Vishaka Gopalan
View author publications
You can also search for this author in PubMed Google Scholar
Piyush Agrawal
View author publications
You can also search for this author in PubMed Google Scholar
Chi-Ping Day
View author publications
You can also search for this author in PubMed Google Scholar
Sridhar Hannenhalli
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

A.S.: Data curation, study design, software, formal analysis, investigation, visualization, methodology, writing–original draft, writing–review and editing. A.R., V.G., P.A.: Data curation, formal analysis, software, writing–review and editing. C.P.D.: Data curation, formal analysis, writing–review and editing. S.H.: Conceptualization, study design, supervision, funding acquisition, investigation, visualization, methodology, writing–original draft, project administration, writing–review and editing.

Corresponding authors

Correspondence to Arashdeep Singh or Sridhar Hannenhalli.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Supplementary Data 5

Supplementary Data 6

Supplementary Data 7

Supplementary Data 8

Reporting Summary

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Singh, A., Rajeevan, A., Gopalan, V. et al. Broad misappropriation of developmental splicing profile by cancer in multiple organs. Nat Commun 13, 7664 (2022). https://doi.org/10.1038/s41467-022-35322-1

Download citation

Received: 18 December 2021
Accepted: 29 November 2022
Published: 12 December 2022
DOI: https://doi.org/10.1038/s41467-022-35322-1

This article is cited by

Comparison of the Ways in Which Nitidine Chloride and Bufalin Induce Programmed Cell Death in Hematological Tumor Cells
- Zejie Su
- Man Luo
- Hai Lan
Applied Biochemistry and Biotechnology (2023)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.