Public archives of next-generation sequencing data are growing exponentially, but the difficulty of marshaling this data has led to its underutilization by scientists. Here, we present ASCOT, a resource that uses annotation-free methods to rapidly analyze and visualize splice variants across tens of thousands of bulk and single-cell data sets in the public archive. To demonstrate the utility of ASCOT, we identify novel cell type-specific alternative exons across the nervous system and leverage ENCODE and GTEx data sets to study the unique splicing of photoreceptors. We find that PTBP1 knockdown and MSI1 and PCBP2 overexpression are sufficient to activate many photoreceptor-specific exons in HepG2 liver cancer cells. This work demonstrates how large-scale analysis of public RNA-Seq data sets can yield key insights into cell type-specific control of RNA splicing and underscores the importance of considering both annotated and unannotated splicing events.
RNA-Seq is a powerful tool for studying gene expression, alternative splicing, and post-transcriptional regulation. Its utility has made it one of the most common experimental data types stored in the Sequence Read Archive1 and other related international archives2. However, public archives store raw, unprocessed data. Drawing new conclusions from many raw RNA-Seq data sets requires a level of computational power and expertise that is out of reach for most labs. Likewise, the need to analyze this data from scratch leads to unnecessary duplications of effort across research groups3,4. To address this, we previously developed a bioinformatics pipeline (Rail-RNA)5,6 and created the recount2 (ref. 7) resource and accompanying Snaptron8 query engine. Together, these allow researchers to query publicly available RNA-Seq data in a standardized and reproducible manner. In this work we focus on the alternative splicing use case for RNA-Seq data.
Alternative splicing of pre-mRNA (RNA splicing) is a highly regulated process that generates extensive transcriptomic and proteomic diversity across all cell types. RNA splicing is governed by both cis-regulatory elements (specific sequences in the pre-mRNA that influence the strength of a splice site) and trans-acting splicing factors (RNA-binding proteins that can act as either splicing enhancers or repressors). RNA-Seq has accelerated our understanding of how alternative splicing networks are coordinated, in part through the meta-analysis of RNA-Seq data gathered from many independent experiments9,10. Numerous algorithms for alternative splicing analysis have been developed11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27, including several recent studies that propose useful models for studying complex splicing patterns in RNA-Seq data11,13,25. However, there is a need for new methods that can summarize alternative splicing across thousands of public data sets in a unified manner, without relying on prior transcript annotation28,29.
Our work aims to make alternative splicing analysis of public RNA-Seq data accessible to the general researcher by reducing computational barriers to entry. We have developed alternative splicing catalog of the transcriptome (ASCOT), a resource that allows users to query alternative splicing and gene expression across a wide range of cell types and tissues from mouse and human. ASCOT uses an annotation-free method to quickly identify splice-variants in large-scale databases of splice junction counts derived from the public archive8. ASCOT performs a rapid and computationally inexpensive “junction-walking” strategy to calculate the percent spliced-in (PSI) ratio for a given exon, whereby inclusion and exclusion junctions are predicted using only counts from a splice junction database (Supplementary Fig. 1). ASCOT focuses on identifying binary splicing decisions, as these represent the majority of alternative splicing events (Supplementary Fig. 2). Although it is possible to capture more complex splicing variation with nested decision trees, here we focus on four easily interpretable and binary splicing patterns: cassette exons, alternative splice site exon groups, linked exons, and mutually exclusive exons. This exon-centric approach can rapidly capture much of the alternative splicing in the transcriptome, while simultaneously calculating each exon’s PSI across thousands of indexed data sets.
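As an illustration of the PSI calculation described above, the following is a minimal sketch for the simplest case, a cassette exon in a single data set. It assumes that the two inclusion junction counts are averaged before being compared with the exclusion junction count, and that a minimum read support is required; both are common conventions and are not necessarily the exact rules ASCOT applies (see Supplementary Fig. 1). The function name `cassette_psi` and the `min_reads` threshold are hypothetical.

```python
from typing import Optional


def cassette_psi(inc_up: int, inc_down: int, exc: int,
                 min_reads: int = 10) -> Optional[float]:
    """Percent spliced-in (0-100) for one cassette exon in one data set.

    inc_up, inc_down: read counts for the two inclusion junctions
                      (upstream exon -> cassette, cassette -> downstream exon).
    exc:              read count for the exclusion (skipping) junction.
    min_reads:        hypothetical support threshold; returns None below it.
    """
    # Average the two inclusion junctions so that inclusion and exclusion
    # are each represented by a single junction-equivalent count.
    inclusion = (inc_up + inc_down) / 2.0
    total = inclusion + exc
    if total < min_reads:
        return None  # too little evidence to report a ratio
    return 100.0 * inclusion / total


if __name__ == "__main__":
    print(cassette_psi(40, 60, 50))  # balanced inclusion and exclusion
    print(cassette_psi(1, 1, 1))     # below the support threshold
```

Returning `None` rather than 0 for poorly supported exons keeps "not measurable" distinct from "fully skipped" when PSI values are later compared across data sets.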
We then used ASCOT to analyze data sets from a manually curated list of purified mouse cell types (732 run accessions) in the Sequence Read Archive (SRA), tissue data sets from the human Genotype-Tissue Expression (GTEx) Consortium30 (9,662 run accessions), shRNA-Seq data sets from the ENCODE Project10,31,32 (1,159 run accessions), 43 single-cell studies (33,303 cells) in human and mouse including the Allen Brain Institute adult mouse primary visual cortex study33, and over 50,000 other human RNA-Seq run accessions from the SRA as generated for the recount2 database. To demonstrate the utility of our work, we used ASCOT to characterize the cell type-specific splicing patterns of rod photoreceptors.
The vertebrate nervous system derives much of its transcriptomic and proteomic diversity from highly specific alternative splicing patterns that are not present elsewhere in the body34. Many neuronal subtypes, such as rod photoreceptors, also exhibit alternative exons that are only detected in that specific cell type35,36,37,38. Photoreceptors are cells within the retina that sense light and transduce this information for the brain. These sensory neurons are unique in terms of morphology, metabolism, and function — characteristics that may require specialized alternative exons35,39,40,41,42,43,44,45,46,47. Photoreceptor degeneration is the main cause of hereditary blindness in the developed world. While some forms of vision loss can be successfully managed with therapies such as angiogenesis inhibitors, prosthetic devices, or tissue transplantation, few treatments exist for blindness that is directly caused by photoreceptor degeneration. Understanding how photoreceptor-specific splicing patterns emerge may facilitate development of cell-based regenerative strategies for treating photoreceptor dystrophies.
Identification of cell type-specific alternative exons
We first tested if we could use ASCOT to identify neuron-specific splicing patterns (Fig. 1). Publicly available RNA-Seq data sets from mouse cell types across the body were manually curated from the SRA and incorporated into ASCOT as a data compilation called MESA: mouse expression and splicing atlas. All data is openly available at http://ascot.cs.jhu.edu/. These cell types were isolated by different research groups using fluorescence-activated cell sorting (FACS) or affinity purification. As expected, we identified many exons that were highly utilized (high PSI) in neurons but skipped by other cell types (Fig. 1a). We also identified exons exhibiting the opposite pattern, having high PSI across most cell types but low PSI in neuronal cell types. Exons enriched in neurons could be further categorized based on their use in muscles and/or pancreatic islet cells. Finally, an analysis of NRL-positive rod photoreceptors48, profiled at several timepoints from postnatal day 2 (P2) to P28, revealed that rods utilize only a subset of pan-neuronal exons, and exclude many other exons that have high PSI across other neuronal subtypes. This is consistent with the observation that rods do not express many common neuronal splicing factors35 (Supplementary Fig. 3). Next, we tested whether we could identify alternative exons utilized only by a single brain cell type, despite near ubiquitous expression of the associated gene. We found many examples of cell type-specific exons, of which ~70% (168/239) were entirely unannotated in GENCODE release M20 (Supplementary Data 1, RT-PCR validation in Supplementary Fig. 4). For instance, an exon in Sptan1 is only used by cochlear hair cells (Fig. 1b), an exon in Cnih1 is selectively used by excitatory pyramidal neurons (Fig. 1c), and an exon in Exoc6b is selectively used in oligodendrocytes (Fig. 1d).
Photoreceptor-specific exons shared between mouse and human
We next sought to cross-validate our mouse cell type results with RNA-Seq data from human tissue. The Genotype-Tissue Expression (GTEx) project is a public archive of 9,662 human RNA-Seq samples across 53 tissues, although it is missing retinal tissue. We therefore analyzed GTEx data sets, supplemented with RNA-Seq data from peripheral retina49, and identified tissue-specific alternative exons (Supplementary Data 2). We identified ~104 exons that are selectively utilized in human retina, compared to all other GTEx human tissues (Fig. 2a, b). In the mouse genome, we identified ~88 exons that were enriched in rod photoreceptors, compared to all other mouse cell types in MESA. Cross referencing human retina-specific exons and mouse photoreceptor-specific exons revealed only 31 splicing events (found in 28 genes) that were common between both species. These 28 genes generally fall under pathways of cilia formation, neuronal connectivity and various metabolic pathways. Likewise, mutations in several genes are linked to retinitis pigmentosa or intellectual disability, underscoring their functional importance (Fig. 2c). Among the 31 rod-specific splicing events, 17/31 have been previously identified while 14/31 have not been reported in the literature (Supplementary Fig. 5). Comparison of results between mouse and human is important since there can be significant variation in splicing specificity between species. For example, an exon in mouse Cep290 is only utilized by photoreceptors, but the alternative exon in human CEP290 is constitutively spliced across all human tissues.
Cross-validation of splicing analysis using single-cell data
To further test the sensitivity of our method, we incorporated single-cell RNA-Seq data sets generated using full-length library strategies (e.g. SmartSeq, Fluidigm) into ASCOT as a compilation called CellTower and analyzed the PSI tables for cell type-specific splicing patterns. Droplet-based strategies that sequence short sequences from the polyA tail (e.g. DropSeq, 10x Genomics) are useful for gene-level quantification, but are unable to capture most alternative splicing events. By contrast, single-cell protocols that capture sequences across the full transcript can analyze splicing, given sufficient read depth. The Allen Brain Institute recently generated extremely high coverage single-cell data sets from adult mouse primary visual cortex33 and clustered cells into 49 types (19 glutamatergic, 23 GABAergic, and 7 non-neuronal). Our analysis identified many alternative exons that showed not only differential usage between glutamatergic, GABAergic, and non-neuronal cell types, but also high variation within each broad grouping (Supplementary Fig. 6a). We were also able to identify mutually exclusive exons that varied among cell types, including those previously analyzed33 (Supplementary Fig. 6b). Having validated our approach using single-cell RNA-Seq data, we then analyzed a data set containing both retinal progenitor cells and immature postmitotic precursor cells50 from embryonic days E14, E18 or P2 that were profiled using Smart-Seq. We found that rod-specific exons in Atp1b2 and Ttc8 were detectable at low levels in early photoreceptor precursors, but not in retinal progenitors or postmitotic precursors of other retinal cell types (Supplementary Fig. 6c). Lastly, we confirmed that cell type-specific alternative exons in Sptan1, Cnih1, and Exoc6b (Fig. 1b–d) exhibited the same specificity in CellTower (Supplementary Fig. 6d).
Using gene expression to identify candidate splicing factors
What are the splicing factors that mediate rod-specific splicing patterns identified in MESA (Fig. 3a)? Although rods do not express many of the RNA-binding proteins (RBPs) thought to be involved in regulating alternative splicing in neurons35 (Supplementary Fig. 3), they do show similar relative expression levels of Polypyrimidine tract-binding protein 1 (Ptbp1) and its paralog Ptbp2 (Fig. 3b). High levels of Ptbp1 repress many exons, and downregulation of Ptbp1 accompanied by an upregulation of Ptbp2 is an important prerequisite for neuronal splicing36,37,51,52,53,54,55,56,57,58,59,60. We hypothesized that certain RBPs, acting as splicing enhancers, could be selectively expressed in rods to mediate rod-specific splicing. We defined a list of putative splicing factors by identifying genes with RNA-binding domains, as determined by RBPDB61, in the InterPro database. Overlapping rod-enriched genes with putative splicing factors revealed two top candidates, Musashi RNA-binding protein 1 (Msi1) and Poly(rC)-binding protein 2 (Pcbp2), that were expressed at much higher levels in rods relative to other cell types across the body. This is consistent with previous work demonstrating that Msi1 promotes photoreceptor-specific splicing35, although no studies have yet shown if Pcbp2 performs a similar function. We also considered the possibility that knockdown of constitutive splicing factors could activate rod-specific exons. However, analysis of 1,159 data sets from the ENCODE shRNA-Seq project31,32 did not reveal any shRNA knockdown that could activate rod-specific exons (Fig. 4a).
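The candidate selection described above, overlapping rod-enriched genes with a list of putative splicing factors, can be sketched as a simple filter. This is an illustrative sketch only: the fold-enrichment definition (rod expression divided by the highest expression in any other cell type), the `min_fold` cutoff, the pseudocount, and the function name `rod_enriched_rbps` are all assumptions, and the gene names and values in the example are invented placeholders rather than data from this study.

```python
from typing import Dict, List, Set, Tuple


def rod_enriched_rbps(expr: Dict[str, Dict[str, float]],
                      rbp_genes: Set[str],
                      rod: str = "rod",
                      min_fold: float = 4.0,
                      pseudocount: float = 1.0) -> List[Tuple[str, float]]:
    """Rank RNA-binding protein genes by enrichment in rods.

    expr:      {gene: {cell_type: expression level}}
    rbp_genes: genes annotated with an RNA-binding domain
    Returns (gene, fold) pairs, highest fold first, for genes whose rod
    expression is at least `min_fold` times the highest non-rod expression.
    """
    hits = []
    for gene, by_type in expr.items():
        if gene not in rbp_genes or rod not in by_type:
            continue
        others = [v for cell_type, v in by_type.items() if cell_type != rod]
        if not others:
            continue
        # The pseudocount avoids division by zero for unexpressed genes.
        fold = (by_type[rod] + pseudocount) / (max(others) + pseudocount)
        if fold >= min_fold:
            hits.append((gene, fold))
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    toy_expr = {  # placeholder values, not measurements
        "GeneA": {"rod": 99.0, "astrocyte": 9.0, "hepatocyte": 4.0},
        "GeneB": {"rod": 20.0, "astrocyte": 19.0, "hepatocyte": 25.0},
        "GeneC": {"rod": 500.0, "astrocyte": 1.0, "hepatocyte": 0.0},
    }
    print(rod_enriched_rbps(toy_expr, rbp_genes={"GeneA", "GeneB"}))
```

Comparing against the maximum of the other cell types, rather than their mean, is a deliberately strict choice that favors factors expressed at high levels in rods only.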
MSI1 and PCBP2 induce rod-specific splicing in non-neurons
To test whether MSI1 and PCBP2 overexpression was sufficient to activate rod-specific exons, we transfected expression constructs for these proteins into HepG2 cells, a liver cancer cell line used by the ENCODE project. Initially, we found that normal transfection of MSI1 or PCBP2 could not activate rod-specific exons (Supplementary Fig. 7). However, given the extremely high expression of these factors in mature rods, we hypothesized that the average expression levels achieved by transfection were not high enough to induce rod-specific splicing. We therefore used FACS to isolate the most strongly GFP-positive MSI1/PCBP2-transfected HepG2 cells (with or without simultaneous knockdown of PTBP1) to more accurately reflect the expression levels of these splicing factors seen in mature rods. These robustly transfected cells had significant activation of rod-specific exons (Fig. 4a, Supplementary Data 3). Specifically, PCBP2 activated a single rod-specific exon in the monocarboxylate transporter Basigin (BSG) independent of PTBP1 knockdown; BSG is necessary for photoreceptor survival42,43,44. By contrast, high expression of MSI1 activated an extremely broad range of exons and appears to strongly synergize with PTBP1 downregulation (Fig. 4a, Supplementary Fig. 8). High levels of MSI1 not only activated rod-specific exons in HepG2 cells but also activated neuronal/muscle-enriched exons thought to be regulated by PTBP1 knockdown. Indeed, exon activation by MSI1 alone was stronger than the effect of knockdown of PTBP1 alone, suggesting a more complex interaction between MSI1 and PTBP1 (Supplementary Fig. 8).
High levels of MSI1 lead to splicing-in of cryptic exons
Interestingly, we also identified human-specific exons that were activated by high levels of MSI1, many of which are not found in any other tissue (Fig. 4a, Supplementary Data 3). These cryptic exons are likely incidentally activated by the extremely high expression of MSI1 in the most robustly transfected cells. This contrasts with previous work in which cryptic exons are activated as a result of knockdown of splicing factor repressors51,57,62,63,64. MSI1 has been reported to selectively bind to RNAs that contain multiple UAG sequences35,65,66,67,68,69,70,71. A motif analysis reveals that UAG clusters are significantly enriched in the proximal intron downstream of the 5’ splice site (Fig. 4b); this pattern was consistent for both alternative and cryptic exons. Of the 31 rod-specific exons common between mouse and human, the majority are flanked by binding motifs for both PTBP1 and MSI1 (Supplementary Fig. 9). Using previously published Msi1 CLIP-Seq72 data, we also identified several photoreceptor-specific exons with Msi1 CLIP peaks, supporting a mechanism of direct interaction (Supplementary Fig. 10). UAG motif frequencies were compared to a baseline of all protein coding exons (<400 bp) in the GENCODE v28 basic gene annotation and exon examples are visualized in Fig. 5a.
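To make the motif comparison concrete, the sketch below counts UAG trinucleotides in the intronic sequence immediately downstream of a 5’ splice site and compares the mean count between a set of exons of interest and a baseline set. It is an assumption-laden illustration: the 100-nt window, the use of a simple ratio of means, and the function names `count_uag` and `uag_enrichment` are hypothetical and do not describe the statistical test used in this study.

```python
from typing import Iterable


def count_uag(intron_seq: str, window: int = 100) -> int:
    """Count UAG motifs in the first `window` nt of a downstream intron.

    intron_seq starts at the first intronic base after the 5' splice site.
    DNA input (T) is converted to RNA (U). UAG cannot overlap itself,
    so str.count gives the exact number of occurrences.
    """
    seq = intron_seq[:window].upper().replace("T", "U")
    return seq.count("UAG")


def uag_enrichment(target_introns: Iterable[str],
                   baseline_introns: Iterable[str],
                   window: int = 100) -> float:
    """Ratio of mean UAG counts (target / baseline); inf if baseline is 0."""
    target = [count_uag(s, window) for s in target_introns]
    baseline = [count_uag(s, window) for s in baseline_introns]
    if not target or not baseline:
        raise ValueError("both sequence sets must be non-empty")
    mean_target = sum(target) / len(target)
    mean_baseline = sum(baseline) / len(baseline)
    if mean_baseline == 0:
        return float("inf")
    return mean_target / mean_baseline


if __name__ == "__main__":
    # Short made-up sequences, for illustration only.
    print(count_uag("GTAAGTTAGTTAGCC"))
    print(uag_enrichment(["GTAAGTTAGTTAGCC"], ["GTAAGTTAGCCCCCC"]))
```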
Msi1 knockdown abolishes photoreceptor-specific splicing
Finally, we wanted to test whether loss of Msi1 or Pcbp2 function would result in a reduction of rod-specific exon splicing. We electroporated mouse retinal explants with shRNA or dominant negative versions of Msi1 and Pcbp2 (Fig. 5b) and found that while reducing Pcbp2 function did not affect splicing of the rod-specific exon in Bsg, reducing Msi1 function with shRNA or a dominant negative protein blocked rod-specific splicing (Supplementary Fig. 11). We confirmed that this result was specific to Msi1 by electroporating shRNA targeting Msi2 (an Msi1 homolog), and found that Msi2 shRNA did not reduce rod-specific splicing. We then analyzed the expression of a set of genes correlated with photoreceptor differentiation73 and found that Msi1 loss of function leads to expression patterns that resemble those of immature P2-P4 photoreceptors (Supplementary Fig. 11). Overall, expression of dominant negative Msi1 mimics Msi1 knockdown, but produces a somewhat weaker effect (Fig. 5b). Interestingly, while most rod-specific exons are reduced after Msi1 knockdown, some rod-specific exons remain robustly incorporated (e.g. Doc2b, Ppp3cc, Plekhb1).
We have developed ASCOT, a resource that enables researchers to more easily perform cross-study splicing and expression analyses of public RNA-Seq data. ASCOT rapidly calculates exon PSIs and alternative splicing patterns using an annotation-free method that queries splice junction count tables. ASCOT’s user interface and associated splicing/expression data sets are openly available at http://ascot.cs.jhu.edu. Although there have been past efforts to summarize public RNA-Seq data28,29,74, ASCOT represents the largest effort to date to make alternative splicing and gene expression summaries of diverse data sets available to general researchers. ASCOT also demonstrates the value of using annotation-free methods to summarize publicly archived data.
Beyond scalability, ASCOT has several other advantages for analyzing cell type-specific alternative splicing. First, data set columns in splicing and expression summaries can be easily grouped and regrouped depending on the researcher’s needs, a feature that is especially useful for analyzing single-cell data (Supplementary Fig. 6). For example, clustering neonatal inner ear cells75 based on primary cell type confirms that the exon in Sptan1 (Fig. 1b) is only present in cochlear and vestibular hair cells and is absent in other inner ear cell types. Alternative splicing and gene expression data for these inner ear data sets, and a variety of other single-cell RNA-Seq studies, are available under the CellTower compilation of ASCOT (http://ascot.cs.jhu.edu). Data set clustering can also help identify alternative exons in bulk data that may be missed due to low gene expression. For example, by clustering GTEx data sets by organ, we can identify many exons that are differentially utilized between brain and heart that were not detected in Leafcutter’s shiny app visualization, LeafViz (https://leafcutter.shinyapps.io/leafviz/)11 (Supplementary Fig. 13). Second, ASCOT does not require transcript references to identify alternative splicing events, and is therefore unbiased toward annotated or unannotated exons. We estimate that ~40-60% of mouse and ~10-30% of human cassette exons identified by ASCOT are unannotated (Supplementary Fig. 12). Third, ASCOT can answer custom queries that go beyond the data sets summarized in this study. For example, we queried rod-specific exons across 50,062 public data sets in the SRA (SRAv2 Snaptron compilation) to estimate the frequency of retinal data sets in the public archive (Supplementary Fig. 14). We found 37 data sets (0.07%) that had high PSI levels of rod-specific exons, and confirmed that these data sets were indeed from human retina. Finally, ASCOT can harmonize single- and paired-end RNA-Seq data of various read lengths. By starting from a splice junction count table, ASCOT can analyze alternative splicing across tens of thousands of archived RNA-Seq data sets without having to restart each analysis from raw fastq reads.
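The custom query described above, flagging data sets that show high PSI for a panel of rod-specific exons, can be sketched as a filter over a PSI table. This is a hypothetical illustration: the table layout, the `min_psi` and `min_fraction` cutoffs, and the function name `samples_with_marker_splicing` are assumptions rather than the criteria used to identify the 37 retinal data sets.

```python
from typing import Dict, Iterable, List, Optional

PsiTable = Dict[str, Dict[str, Optional[float]]]  # exon -> sample -> PSI


def samples_with_marker_splicing(psi: PsiTable,
                                 marker_exons: Iterable[str],
                                 min_psi: float = 50.0,
                                 min_fraction: float = 0.5) -> List[str]:
    """Return samples in which most measurable marker exons are included.

    A sample is reported when at least `min_fraction` of the marker exons
    with a defined PSI in that sample reach `min_psi`. Samples with no
    measurable marker exon are skipped rather than counted as negative.
    """
    markers = [exon for exon in marker_exons if exon in psi]
    samples = set()
    for exon in markers:
        samples.update(psi[exon])
    hits = []
    for sample in sorted(samples):
        values = [psi[exon][sample] for exon in markers
                  if psi[exon].get(sample) is not None]
        if not values:
            continue
        fraction = sum(v >= min_psi for v in values) / len(values)
        if fraction >= min_fraction:
            hits.append(sample)
    return hits


if __name__ == "__main__":
    toy_psi = {  # placeholder identifiers and values
        "exon1": {"runA": 85.0, "runB": 2.0, "runC": None},
        "exon2": {"runA": 70.0, "runB": 0.0, "runC": None},
    }
    print(samples_with_marker_splicing(toy_psi, ["exon1", "exon2"]))
```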
ASCOT is currently limited by its inability to detect complex alternative splicing events that other algorithms11,13 can identify. We intentionally targeted binary splicing decisions as they have a straightforward biological interpretation and represent the majority of alternative splicing events. However, complex splicing can certainly be modeled with nested decision trees that would still be compatible with a junction-walking strategy. We believe that splice junction count tables contain enough information to build these splice models. Also, ASCOT does not attempt to model biases that can distort junction counts, such as GC content or secondary structure. We plan for future versions of ASCOT to model and mitigate these effects.
We used ASCOT to study tens of thousands of data sets from SRA, ENCODE, and GTEx. Analyzing splicing factor gene expression across various mouse cell types allowed us to identify MSI1 and PCBP2 as candidates for inducing rod-specific splicing patterns, while the ENCODE shRNA-Seq data confirmed for us that knockdown of constitutive splicing factors could not activate rod-specific exons. Taken together, these observations led to the hypothesis that manipulating certain splicing factors could lead to rod-like splicing patterns. Only with this hypothesis in mind were we able to generate new data to conclude that robust overexpression of PCBP2 and MSI1 combined with PTBP1 knockdown was able to activate rod-specific exons, even in a non-neuronal cell line such as HepG2. This study is emblematic of a larger shift toward using public data sets, often pre-summarized or indexed, to generate hypotheses and narrow the scientific question prior to designing experiments and generating new data. Resources such as ASCOT can save researchers much time and effort, as well as create new avenues of research for smaller labs with limited funding.
Together, our results suggest a model of photoreceptor splicing regulation (Fig. 5c) whereby MSI1 and PTBP1 downregulation interact synergistically. MSI1 overexpression leads to the incorporation of PTBP1-repressed exons, while PTBP1 downregulation increases MSI1’s ability to activate rod-specific exons (Supplementary Fig. 8). We have also identified that PCBP2 is another regulator of photoreceptor-specific splicing. The rod-specific exon in BSG is essentially undetectable in all non-retinal tissues and PCBP2 overexpression increases the exon PSI to ~8% (PSI in photoreceptors is >80%). However, mouse retina electroporation of shRNA and dominant negative constructs targeting PCBP2 did not reduce levels of the rod-specific BSG exon, suggesting that PCBP2 overexpression can activate the exon in non-neuronal cells but is not required to maintain splicing in mature photoreceptors. By contrast, knockdown of Msi1 in electroporated mouse retina abolishes most of the rod-specific splicing events, leading to a delay in photoreceptor maturation (Supplementary Fig. 11). Although high levels of MSI1 are required for photoreceptor-specific splicing, our results indicate that MSI1 expression levels must still be titrated, since excessive overexpression in HepG2 cells led to the incorporation of deleterious, cryptic exons (Fig. 4a).
These cryptic exons reinforce the importance of obtaining human RNA-Seq data at the resolution of individual cell types, as there can be significant differences in splicing between mouse and human. With ASCOT, we identified 31 photoreceptor-specific splicing events that are common between mouse and human. However, this analysis is incomplete since isolated mouse photoreceptors were compared to human retina as opposed to isolated human photoreceptors. More remains to be understood about splicing in the retina since neighboring cell types, epigenetic states, and/or developmental timing may play a role in mediating optimal photoreceptor splicing. Conditional knockout of Msi1 in the adult retina will help clarify these results, as will single-cell sequencing of human retinal organoids.
ASCOT is part of a larger effort to make gene expression and alternative splicing data more accessible to the general researcher7,8. By reducing the initial barriers to data analysis, we hope to accelerate cross-disciplinary work and foster unexpected discoveries.
ASCOT data tables, software, and interactive browser are available at http://ascot.cs.jhu.edu.
Publications used as data sources and bigWig visualization on the UCSC Genome Browser
All RNA-Seq data used for this study was obtained from various publications, as documented on the ASCOT web resource (http://ascot.cs.jhu.edu/ds/ds_list.html). To visualize individual data sets, bigWigs were generated from aligned bam files and compiled as UCSC TrackHubs. Instructions for visualizing this data are linked on the ASCOT web resource (http://ascot.cs.jhu.edu/ucsctracks.html).
ASCOT splicing analysis methodology and software
A detailed description of ASCOT’s splicing analysis methodology is available in Supplementary Fig. 1. Briefly, ASCOT uses an exon-centric approach to consider only the local regions of a splice graph and analyzes these elements independently from one another. We focus on four binary splicing decisions: cassette exons, alternative splice site exon groups that share the same exclusion junction, linked exons, and mutually exclusive pairs of exons. Our method for splicing analysis relies on evidence from RNA-Seq split-read alignments (i.e. splice junctions), as opposed to coverage. By grouping splice junctions based on shared start or end coordinates, closed loops can be identified where we can start from any coordinate and trace a path through an alternating series of exons and introns that leads back to the original starting coordinate. For binary splicing events, there will be two independent loops that share the same exclusion junction conditions. All scripts used to generate ASCOT are available on a GitHub repository at https://github.com/jpling/ascot.
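To illustrate how grouping junctions by shared coordinates exposes candidate events, the sketch below finds cassette exons only: an exclusion junction plus two inclusion junctions that share its start and end and enclose a short exon. It is a simplified sketch and not the ASCOT implementation. It assumes junctions on a single chromosome and strand, given as the 1-based first and last base of each intron; the `max_exon_len` limit and the function name `find_cassette_exons` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Junction = Tuple[int, int]  # (first intron base, last intron base)


def find_cassette_exons(junctions: Set[Junction],
                        max_exon_len: int = 400) -> List[Dict[str, object]]:
    """Find cassette exon candidates from splice junctions alone.

    For each candidate exclusion junction (s, e), look for an inclusion
    junction starting at s and another ending at e. The exon lies between
    them, closing the loop s -> exon -> e that the exclusion junction skips.
    """
    ends_by_start = defaultdict(list)
    starts_by_end = defaultdict(list)
    for start, end in junctions:
        ends_by_start[start].append(end)
        starts_by_end[end].append(start)

    events = []
    for s, e in sorted(junctions):                 # candidate exclusion junction
        for up_end in sorted(ends_by_start[s]):    # inclusion junction (s, up_end)
            if up_end >= e:
                continue
            for down_start in sorted(starts_by_end[e]):  # (down_start, e)
                if down_start <= s:
                    continue
                exon_start, exon_end = up_end + 1, down_start - 1
                exon_len = exon_end - exon_start + 1
                if 0 < exon_len <= max_exon_len:
                    events.append({
                        "exon": (exon_start, exon_end),
                        "inclusion": [(s, up_end), (down_start, e)],
                        "exclusion": (s, e),
                    })
    return events


if __name__ == "__main__":
    toy = {(100, 500), (100, 200), (300, 500), (600, 900)}
    for event in find_cassette_exons(toy):
        print(event)
```

Because only junction coordinates are used, the same search applies to annotated and unannotated exons alike; counts for the three junctions of each event can then be fed into a PSI calculation.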
HepG2 cell culture, transfection, and FACS isolation
HepG2 cells (ATCC, HB-8065) were cultured in Eagle’s Minimum Essential Medium (Quality Biological, 112-018-101CS) supplemented with 1x GlutaMAX (ThermoFisher Scientific, 35050061), 10% FBS (Corning, 35-010-CV) and 1% Penicillin-Streptomycin (ThermoFisher Scientific, 15070063). siRNA targeting PTBP1 (Sigma, SASI_Hs01_00216644) or eGFP as negative control (ThermoFisher Scientific, AM4626) were transfected using Lipofectamine 3000 (Thermo Fisher Scientific, L3000-008) following the manufacturer’s protocol. For overexpression of MSI1 and PCBP2, Ultimate ORF expression clones from ThermoFisher Scientific (MSI1 - IOH41182, PCBP2 - IOH4487) were cloned into pCAGIG (Addgene, 11159) and again transfected using Lipofectamine 3000. For experiments involving a combination of plasmid overexpression and siRNA knockdown, plasmids were first transfected at 0 h, siRNA were transfected at 24 h, and cells were processed two days later at 72 h. For FACS isolation, cells were dissociated using TrypLE (ThermoFisher Scientific, 12604013) to form a single-cell suspension and sorted by GFP fluorescence on a BD FACSCalibur in the JHMI Ross Flow Cytometry Core Facility.
RNA extraction, library preparation, and RNA sequencing
RNA was extracted from cell culture samples using the Monarch Total RNA Miniprep Kit (New England BioLabs, T2010S). Total RNA for RNA-Seq was then processed using the TruSeq Stranded Total RNA Library Prep Kit (Illumina) to construct RNA-Seq libraries. Sample libraries were then sequenced on an Illumina NextSeq. Data was de-multiplexed and converted into fastq files. Fastq files were then processed by the Rail-RNA spliced alignment program and incorporated into a Snaptron compilation.
RT-PCR primers used for novel exon validation:
Kctd5-forward: CTCCATACGGCACAACCAGT, Kctd5-reverse: GTAGCACCAAGGACCCTGTC, Flna-forward: TCGTAGCCCCTACACTGTCA, Flna-reverse: TTACACGCTCCTCACCCTTG, Flnb-forward: CCCATGTGGTCAAGGTCTCC, Flnb-reverse: GTTACACCAAGCTCTCCGCT, Itgb1-forward: GGCGTCTGTGCAGAGCATAA, Itgb1-reverse: CAGTTGTCACGGCACTCTTG, Ywhae-forward: ACAGCCTCGTGGCTTACAAA, Ywhae-reverse: ACATCCTGCAGCGCTTCTTT, Vcl-forward: TCTCCCCCATGGTGATGGAT, Vcl-reverse: TGAATAAGTGCCCGCTTGGT, Farp2-forward: GTGTCACAGGAGCCAGTCAT, Farp2-reverse: TCCTTTTCTAGCCGAGTGCTG, Cltc-forward: TGATCCCGAGCGAGTCAAGA, Cltc-reverse: ACCAGGTCATGGACAAAGTCA, Ptprf-forward: TTGTCATCGCCATCCTCCTG, Ptprf-reverse: TCCTTCAGCCCGATTGACTG, Ank3-forward: CGAGAACGACACGAAGGGAA, Ank3-reverse: GGCAACGTGTAAGGGAGTGA, Ppp6r3-forward: GCGGCATGAAGGAAACACTC, Ppp6r3-reverse: TGCATCTTTGCAAGCAGCAT. Large differences in RT-PCR product sizes were resolved on 2% agarose gels. To resolve small differences in RT-PCR product sizes (<30 bp), an Agilent Fragment Analyzer was used instead.
Ex vivo mouse retina electroporation
All experimental procedures were preapproved by the Institutional Animal Care and Use Committee of the Johns Hopkins University School of Medicine. For ex vivo electroporation experiments, postnatal day (P)2 to P4 mouse retinas were dissected into DMEM/F12 with 10% FBS and electroporated with 100 µg total plasmid in 100 µl volume using a BTX ECM 830 Generator. Electroporation was performed using six square pulses of 50 volts and 50 milliseconds duration, with a 950-millisecond interval between pulses. Retinas were then cultured on 0.2 µm Whatman Nucleopore Track-Etched Membranes (MilliporeSigma, WHA110406). At P14, electroporated retinas (8–16 per condition) were dissociated into single-cell suspension using the Worthington Papain Dissociation System (Worthington, LK003150) and GFP-positive cells were isolated with FACS. For shRNA knockdown, we used prevalidated constructs from The RNAi Consortium to knock down Msi1 (TRCN0000098550), Msi2 (TRCN0000071974), and Pcbp2 (TRCN0000120931), together with a control shRNA (MilliporeSigma, SHC005). Electroporated shRNA plasmids were mixed with the pCAGIG plasmid (Addgene, 11159) at a ratio of 3:1 by weight (shRNA:pCAGIG) to label electroporated cells with GFP. Dominant negative constructs were generated using an N-terminal truncated PCBP2 sequence (ΔKH1-PCBP2, aa125–365) and a C-terminal truncated MSI1 sequence (aa1-199). Sequences were cloned into pCAGIG. Empty pCAGIG vector was used as a second control.
Generation of Snaptron compilations
Raw RNA-Seq fastq reads from all the input accessions were first analyzed using Rail-RNA, a cloud-enabled spliced alignment program that can analyze many samples at once5,6. Rail-RNA outputs a few summaries for each run accession, including a table of splice-junction evidence. In this table, each row is a splice junction and each column is an individual run accession. The elements of the table give the number of times a spliced alignment from an individual (column) spanned a junction (row). These summaries are then composed and indexed using Tabix and SQLite, and all the associated metadata for the run accessions are indexed using Lucene, to form a Snaptron compilation. A Snaptron compilation can be queried via command line or via RESTful API queries.
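The junction-by-accession table described above can be pictured with a small in-memory SQLite example. This is a schematic sketch only and does not reproduce Snaptron's actual schema, its Tabix indexing, or its query syntax: the table and column names, the long (one row per junction and run) layout, and the helper functions are all hypothetical.

```python
import sqlite3
from typing import Dict, Iterable, Tuple

Row = Tuple[str, int, int, str, int]  # chrom, jx_start, jx_end, run, reads


def build_junction_db(rows: Iterable[Row]) -> sqlite3.Connection:
    """Load junction counts into an indexed in-memory SQLite table."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE junction_counts ("
        "chrom TEXT, jx_start INTEGER, jx_end INTEGER, run TEXT, reads INTEGER)"
    )
    # The index lets region queries avoid scanning every junction.
    con.execute(
        "CREATE INDEX idx_region ON junction_counts (chrom, jx_start, jx_end)"
    )
    con.executemany("INSERT INTO junction_counts VALUES (?, ?, ?, ?, ?)", rows)
    con.commit()
    return con


def junctions_in_region(con: sqlite3.Connection, chrom: str,
                        start: int, end: int
                        ) -> Dict[Tuple[int, int], Dict[str, int]]:
    """Return {junction: {run accession: read count}} for junctions that
    lie entirely within chrom:start-end."""
    cursor = con.execute(
        "SELECT jx_start, jx_end, run, reads FROM junction_counts "
        "WHERE chrom = ? AND jx_start >= ? AND jx_end <= ? "
        "ORDER BY jx_start, jx_end, run",
        (chrom, start, end),
    )
    table: Dict[Tuple[int, int], Dict[str, int]] = {}
    for jx_start, jx_end, run, reads in cursor:
        table.setdefault((jx_start, jx_end), {})[run] = reads
    return table


if __name__ == "__main__":
    toy_rows = [  # placeholder accessions and counts
        ("chr1", 100, 200, "run1", 5),
        ("chr1", 100, 200, "run2", 0),
        ("chr1", 100, 500, "run1", 2),
        ("chr2", 100, 200, "run1", 9),
    ]
    con = build_junction_db(toy_rows)
    print(junctions_in_region(con, "chr1", 50, 300))
```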
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Leinonen, R., Sugawara, H. & Shumway, M. The sequence read archive. Nucleic Acids Res. 39, D19–D21 (2011).
Karsch-Mizrachi, I., Takagi, T. & Cochrane, G. The international nucleotide sequence database collaboration. Nucleic Acids Res. 46, D48–D51 (2018).
Langmead, B. & Nellore, A. Cloud computing for genomic data analysis and collaboration. Nat. Rev. Genet. 19, 208–219 (2018).
Denk, F. Don’t let useful data go to waste. Nature 543, 7 (2017).
Nellore, A. et al. Rail-RNA: scalable analysis of RNA-seq splicing and coverage. Bioinformatics 33, 4033–4040 (2016).
Nellore, A. et al. Human splicing diversity and the extent of unannotated splice junctions across human RNA-seq samples on the Sequence Read Archive. Genome Biol. 17, 266 (2016).
Collado-Torres, L. et al. Reproducible RNA-seq analysis using recount2. Nat. Biotechnol. 35, 319–321 (2017).
Wilks, C., Gaddipati, P., Nellore, A. & Langmead, B. Snaptron: querying splicing patterns across tens of thousands of RNA-seq samples. Bioinformatics 34, 114–116 (2018).
Kahles, A. et al. Comprehensive analysis of alternative splicing across tumors from 8,705 patients. Cancer Cell 34, 211–224.e6 (2018).
Dominguez, D. et al. Sequence, structure, and context preferences of human RNA binding proteins. Mol. Cell 70, 854–867.e9 (2018).
Li, Y. I. et al. Annotation-free quantification of RNA splicing using LeafCutter. Nat. Genet. 50, 151–158 (2018).
Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).
Vaquero-Garcia, J. et al. A new view of transcriptome complexity and regulation through the lens of local splicing variations. eLife 5, e11752 (2016).
Patro, R., Mount, S. M. & Kingsford, C. Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms. Nat. Biotechnol. 32, 462–464 (2014).
Shen, S. et al. rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-Seq data. Proc. Natl Acad. Sci. USA 111, E5593–E5601 (2014).
Wang, X. & Cairns, M. J. SeqGSEA: a Bioconductor package for gene set enrichment analysis of RNA-Seq data integrating differential expression and splicing. Bioinformatics 30, 1777–1779 (2014).
Hu, Y. et al. DiffSplice: the genome-wide detection of differential splicing events with RNA-seq. Nucleic Acids Res. 41, e39 (2013).
Leng, N. et al. EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics 29, 1035–1043 (2013).
Wang, W., Qin, Z., Feng, Z., Wang, X. & Zhang, X. Identifying differentially spliced genes from two groups of RNA-seq samples. Gene 518, 164–170 (2013).
Drewe, P. et al. Accurate detection of differential RNA processing. Nucleic Acids Res. 41, 5189–5198 (2013).
Trapnell, C. et al. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat. Biotechnol. 31, 46–53 (2013).
Aschoff, M. et al. SplicingCompass: differential splicing detection using RNA-seq data. Bioinformatics 29, 1141–1148 (2013).
Anders, S., Reyes, A. & Huber, W. Detecting differential usage of exons from RNA-seq data. Genome Res. 22, 2008–2017 (2012).
Alamancos, G. P., Pagès, A., Trincado, J. L., Bellora, N. & Eyras, E. Leveraging transcript quantification for fast computation of alternative splicing profiles. RNA 21, 1521–1531 (2015).
Trincado, J. L. et al. SUPPA2: fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions. Genome Biol. 19, 40 (2018).
Katz, Y., Wang, E. T., Airoldi, E. M. & Burge, C. B. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat. Methods 7, 1009–1015 (2010).
Wang, Q. & Rio, D. C. JUM is a computational method for comprehensive annotation-free analysis of alternative pre-mRNA splicing patterns. Proc. Natl Acad. Sci. USA 115, E8181–E8190 (2018).
Saraiva-Agostinho, N. & Barbosa-Morais, N. L. Psichomics: graphical application for alternative splicing quantification and analysis. Nucleic Acids Res. 47, e7 (2018).
Tapial, J. et al. An atlas of alternative splicing profiles and functional associations reveals new regulatory programs and genes that simultaneously express multiple major isoforms. Genome Res. 27, 1759–1768 (2017).
GTEx Consortium et al. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
Sloan, C. A. et al. ENCODE data at the ENCODE portal. Nucleic Acids Res. 44, D726–D732 (2016).
Sundararaman, B. et al. Resources for the comprehensive discovery of functional RNA elements. Mol. Cell 61, 903–913 (2016).
Tasic, B. et al. Adult mouse cortical cell taxonomy revealed by single cell transcriptomics. Nat. Neurosci. 19, 335–346 (2016).
Raj, B. & Blencowe, B. J. Alternative splicing in the mammalian nervous system: recent insights into mechanisms and functional roles. Neuron 87, 14–27 (2015).
Murphy, D., Cieply, B., Carstens, R., Ramamurthy, V. & Stoilov, P. The Musashi 1 controls the splicing of photoreceptor-specific exons in the vertebrate retina. PLoS Genet. 12, e1006256 (2016).
Weyn-Vanhentenryck, S. M. et al. Precise temporal regulation of alternative splicing during neural development. Nat. Commun. 9, 2189 (2018).
Vuong, C. K., Black, D. L. & Zheng, S. The neurogenetics of alternative splicing. Nat. Rev. Neurosci. 17, 265–281 (2016).
Li, Q., Lee, J.-A. & Black, D. L. Neuronal regulation of alternative pre-mRNA splicing. Nat. Rev. Neurosci. 8, 819–831 (2007).
Terada, Y. et al. Novel splice variants of amphiphysin I are expressed in retina. FEBS Lett. 519, 185–190 (2002).
Friedrich, U. et al. The Na/K-ATPase is obligatory for membrane anchorage of retinoschisin, the protein involved in the pathogenesis of X-linked juvenile retinoschisis. Hum. Mol. Genet. 20, 1132–1142 (2011).
Wan, J. et al. Dynamic usage of alternative splicing exons during mouse retina development. Nucleic Acids Res. 39, 7920–7930 (2011).
Ochrietor, J. D. et al. Retina-specific expression of 5A11/Basigin-2, a member of the immunoglobulin gene superfamily. Invest. Ophthalmol. Vis. Sci. 44, 4086–4096 (2003).
Ochrietor, J. D. & Linser, P. J. 5A11/Basigin gene products are necessary for proper maturation and function of the retina. Dev. Neurosci. 26, 380–387 (2004).
Clamp, M. F., Ochrietor, J. D., Moroz, T. P. & Linser, P. J. Developmental analyses of 5A11/Basigin, 5A11/Basigin-2 and their putative binding partner MCT1 in the mouse eye. Exp. Eye Res. 78, 777–789 (2004).
Riazuddin, S. A. et al. A splice-site mutation in a retina-specific exon of BBS8 causes nonsyndromic retinitis pigmentosa. Am. J. Hum. Genet. 86, 805–812 (2010).
Murphy, D., Singh, R., Kolandaivelu, S., Ramamurthy, V. & Stoilov, P. Alternative splicing shapes the phenotype of a mutation in BBS8 to cause nonsyndromic retinitis pigmentosa. Mol. Cell. Biol. 35, 1860–1870 (2015).
Bowne, S. J. et al. Why do mutations in the ubiquitously expressed housekeeping gene IMPDH1 cause retina-specific photoreceptor degeneration? Invest. Ophthalmol. Vis. Sci. 47, 3754–3765 (2006).
Kim, J.-W. et al. NRL-regulated transcriptome dynamics of developing rod photoreceptors. Cell Rep. 17, 2460–2473 (2016).
Kim, E. J. et al. Complete transcriptome profiling of normal and age-related macular degeneration eye tissues reveals dysregulation of anti-sense transcription. Sci. Rep. 8, 3040 (2018).
Clark, B. et al. Single-cell RNA-seq analysis of retinal development identifies NFI factors as regulating mitotic exit and late-born cell specification. Neuron 102, 1111–1126.e5 (2019).
Ling, J. P. et al. PTBP1 and PTBP2 repress nonconserved cryptic exons. Cell Rep. 17, 104–113 (2016).
Vuong, J. K. et al. PTBP1 and PTBP2 serve both specific and redundant functions in neuronal pre-mRNA splicing. Cell Rep. 17, 2766–2775 (2016).
Keppetipola, N. M. et al. Multiple determinants of splicing repression activity in the polypyrimidine tract binding proteins, PTBP1 and PTBP2. RNA 22, 1172–1180 (2016).
Gueroussov, S. et al. An alternative splicing event amplifies evolutionary differences between vertebrates. Science 349, 868–873 (2015).
Keppetipola, N., Sharma, S., Li, Q. & Black, D. L. Neuronal regulation of pre-mRNA splicing by polypyrimidine tract binding proteins, PTBP1 and PTBP2. Crit. Rev. Biochem. Mol. Biol. 47, 360–378 (2012).
Licatalosi, D. D. et al. Ptbp2 represses adult-specific splicing to regulate the generation of neuronal precursors in the embryonic brain. Genes Dev. 26, 1626–1642 (2012).
McClory, S. P., Lynch, K. W. & Ling, J. P. HnRNP L represses cryptic exons. RNA 24, 761–768 (2018).
Boutz, P. L. et al. A post-transcriptional regulatory switch in polypyrimidine tract-binding proteins reprograms alternative splicing in developing neurons. Genes Dev. 21, 1636–1652 (2007).
Llorian, M. et al. The alternative splicing program of differentiated smooth muscle cells involves concerted non-productive splicing of post-transcriptional regulators. Nucleic Acids Res. 44, 8933–8950 (2016).
Llorian, M. et al. Position-dependent alternative splicing activity revealed by global profiling of alternative splicing events regulated by PTB. Nat. Struct. Mol. Biol. 17, 1114–1123 (2010).
Cook, K. B., Kazan, H., Zuberi, K., Morris, Q. & Hughes, T. R. RBPDB: a database of RNA-binding specificities. Nucleic Acids Res. 39, D301–D308 (2011).
Ling, J. P., Pletnikova, O., Troncoso, J. C. & Wong, P. C. TDP-43 repression of nonconserved cryptic exons is compromised in ALS-FTD. Science 349, 650–655 (2015).
Ehrmann, I. et al. An ancient germ cell-specific RNA-binding protein protects the germline from cryptic splice site poisoning. eLife 8, e39304 (2019).
Zarnack, K. et al. Direct competition between hnRNP C and U2AF65 protects the transcriptome from the exonization of Alu elements. Cell 152, 453–466 (2013).
Iwaoka, R. et al. Structural insight into the recognition of r(UAG) by Musashi-1 RBD2, and construction of a model of musashi-1 RBD1-2 bound to the minimum target RNA. Molecules 22, 1207 (2017).
Cragle, C. & MacNicol, A. M. Musashi protein-directed translational activation of target mRNAs is mediated by the poly(A) polymerase, germ line development defective-2. J. Biol. Chem. 289, 14239–14251 (2014).
Zearfoss, N. R. et al. A conserved three-nucleotide core motif defines Musashi RNA binding specificity. J. Biol. Chem. 289, 35530–35541 (2014).
Ohyama, T. et al. Structure of Musashi1 in a complex with target RNA: the role of aromatic stacking interactions. Nucleic Acids Res. 40, 3218–3231 (2012).
Kawahara, H. et al. Musashi1 cooperates in abnormal cell lineage protein 28 (Lin28)-mediated let-7 family microRNA biogenesis in early neural differentiation. J. Biol. Chem. 286, 16121–16130 (2011).
Kawahara, H. et al. Neural RNA-binding protein Musashi1 inhibits translation initiation by competing with eIF4G for PABP. J. Cell Biol. 181, 639–653 (2008).
Imai, T. et al. The neural RNA-binding protein Musashi1 translationally regulates mammalian numb gene expression by interacting with its mRNA. Mol. Cell. Biol. 21, 3888–3900 (2001).
Li, N. et al. The Msi family of RNA-binding proteins function redundantly as intestinal oncoproteins. Cell Rep. 13, 2440–2455 (2015).
Blackshaw, S. et al. Genomic analysis of mouse retinal development. PLoS Biol. 2, E247 (2004).
Strobelt, H. et al. Vials: visualizing alternative splicing of genes. IEEE Trans. Vis. Comput. Graph. 22, 399–408 (2016).
Burns, J. C., Kelly, M. C., Hoa, M., Morell, R. J. & Kelley, M. W. Single-cell RNA-Seq resolves cellular complexity in sensory organs from the neonatal inner ear. Nat. Commun. 6, 8557 (2015).
We thank X. Zhang and the Johns Hopkins Ross Flow Cytometry Core Facility. We also thank the Johns Hopkins Deep Sequencing & Microarray Core Facility for sequencing services. This work was supported by grants from the NIH (R01EY020560 to S.B., K99EY027844 to B.S.C., R01GM118568 to B.L., and R01GM121459), a postdoctoral fellowship from the Johns Hopkins Kavli Neuroscience Discovery Institute to J.P.L., and seed funding from The Institute for Data Intensive Engineering and Science (IDIES) at Johns Hopkins University to B.L.
The authors declare no competing interests.
Peer review information Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Ling, J.P., Wilks, C., Charles, R. et al. ASCOT identifies key regulators of neuronal subtype-specific splicing. Nat Commun 11, 137 (2020). https://doi.org/10.1038/s41467-019-14020-5