Precise temporal regulation of alternative splicing during neural development

Alternative splicing (AS) is one crucial step of gene expression that must be tightly regulated during neurodevelopment. However, the precise timing of developmental splicing switches and the underlying regulatory mechanisms are poorly understood. Here we systematically analyze the temporal regulation of AS in a large number of transcriptome profiles of developing mouse cortices, in vivo purified neuronal subtypes, and neurons differentiated in vitro. Our analysis reveals early-switch and late-switch exons in genes with distinct functions, and these switches accurately define neuronal maturation stages. Integrative modeling suggests that these switches are under direct and combinatorial regulation by distinct sets of neuronal RNA-binding proteins including Nova, Rbfox, Mbnl, and Ptbp. Surprisingly, various neuronal subtypes in the sensory systems lack Nova and/or Rbfox expression. These neurons retain the “immature” splicing program in early-switch exons, affecting numerous synaptic genes. These results provide new insights into the organization and regulation of the neurodevelopmental transcriptome.

D uring development of the mammalian nervous system, neurons mature through a prolonged and sophisticated process that involves dramatic morphological and functional changes in individual neurons, as well as formation of synaptic connections between neurons to build intricate neural circuits [1][2][3] . These changes must occur with high accuracy 4 , which, at the molecular level, is achieved through tight temporal control of gene expression at multiple steps. To date, extensive efforts have been made to dissect the role of transcriptional regulation controlling specification of neuronal subtype identities during early neuronal development 5,6 . However, regulatory mechanisms that govern the precise timing of various molecular and cellular events required for neuronal development and maturation remain poorly understood.
Alternative splicing (AS) is an essential mechanism that allows the generation of multiple transcripts and protein variants, or isoforms, from a single gene 7 . This mechanism is increasingly recognized as a major source of molecular diversity, especially in the central nervous system (CNS) 8 . Genome-wide transcriptomic studies based on deep mRNA sequencing (RNA-seq) demonstrated that AS is ubiquitous in mammals, including many alternative exons with brain-specific or neuron-specific splicing patterns [9][10][11] . The neuron-specific splicing program must be established at specific stages of the neuronal differentiation and maturation process. Indeed, many alternative exons show dramatic changes during neuronal development, as revealed by several recent studies of developing cortices in primates 12 and rodents 13 , different laminar cortical layers 14 , and specific neuronal subtypes purified in situ 15 or differentiated in vitro from embryonic stem cells (ESCs) 16 . While the functional significance for the majority of these developmentally regulated alternative exons has yet to be demonstrated, decades of study have found multiple examples in which individual alternative exons play critical roles in various aspects of neuronal development, such as neuronal migration, axon guidance, and synapse formation 17 . Therefore, elucidating the precise timing of developmental splicing switches and their underlying regulatory mechanisms is a key step toward understanding the molecular program governing neurodevelopment.
Cell-type-specific or developmental-stage-specific AS events are largely controlled by recruiting RNA-binding proteins (RBPs) that recognize specific regulatory sequences embedded in the pre-mRNA transcripts. For instance, RBPs specifically expressed or enriched in neurons, such as Nova, Rbfox, Ptbp2, nElavl, nSR100, and Mbnl2 have been demonstrated to regulate AS of numerous neuronal transcripts (reviewed in ref. 8 ). Technological advances have also made it possible to define the comprehensive target networks of individual RBPs with high accuracy by integrating global splicing profiles upon depletion of each RBP and genomewide maps of in vivo, direct protein-RNA interactions, as we demonstrated in our previous studies 18,19 . Importantly, such global and unbiased analyses allowed us to demonstrate that Rbfox proteins in general promote the adult splicing pattern in the developing cortex 18 . Other groups also found that Mbnl2 and Ptbp2 promote and antagonize the adult splicing pattern, respectively [20][21][22] . However, how these and other RBPs contribute to the precise timing of developmental splicing switches has not been systematically investigated.
The limited sampling resolution, incomplete coverage of developmental stages, and the scope of analysis have impeded previous studies to uncover the precise timing of developmental splicing switches, the key regulatory signals, and the link to developmental cellular processes. To address these issues, here we systematically investigate the organization of the developmental splicing profiles in a large panel of developing mouse cortical tissues and different subtypes of neurons isolated in situ, as well as neurons differentiated in vitro from ESCs. In combination with integrative modeling of RNA-regulatory networks 23,24 , this approach allows us to dissect the underlying regulatory mechanisms that control the splicing program at specific neuronal maturation stages and diverse neuronal subtypes in the central and peripheral nervous system.

Results
A modular neurodevelopmental splicing program. To determine the precise timing of developmental splicing changes, we profiled the transcriptome of mouse cortices by RNA-seq at nine time points: embryonic day 14.5 (E14.5), E16.5, postnatal day 4 (P4), P7, P17, P30, 4 months, and 21 months ( Supplementary  Fig. 1a, b and Supplementary Dataset 1). These time points were chosen carefully to best capture the dynamics of developmental splicing changes based on several individual alternative exons characterized in detail in previous studies ( Supplementary  Fig. 1c-e, Supplementary Datasets 2 and 3, and Supplementary Discussion). Using stringent criteria (changes in percent spliced in (|ΔΨ|) ≥ 0.2 and false discovery rate (FDR) ≤ 0.05) applied to both known and novel AS events 10,23 , we identified over 20,000 events representing 32% of brain-expressing genes with significant changes between at least two stages, suggesting prevalence of developmental splicing regulation at an unprecedented scale (Supplementary Dataset 4).
For detailed analysis, we focused on 2883 non-redundant cassette exons under developmental regulation (1909 known and 974 novel) that could be accurately quantified in ≥7 of the nine time points (Fig. 1a). Compared to cassette exons overall, developmentally regulated cassette exons are much more likely to preserve the reading frame (68.7% vs. 43.2%) and to have a conserved AS pattern detected in the human brain transcriptome (63.4% vs. 29.2%). We recently developed a method to identify exons under strong purifying selection pressure based on crossspecies sequence conservation in the alternative exons and flanking intronic regions 10 . A much higher fraction of developmentally regulated exons is under strong purifying selection pressure (31% vs. 3.9%; Fig. 1b). For example, our analysis identified a highly conserved 12-nucleotide (nt) microexon in the Kdm1a gene encoding LSD1 (histone lysine specific demethylase 1) (Fig. 1c); this exon's inclusion level peaks between postnatal day 0 (P0) and P7, a temporal pattern consistent with its previously reported role in modulating neurite outgrowth by altering the availability of a phosphorylation site 24,25 .
To understand the timing of splicing changes on a global scale, we performed weighted gene co-expression network analysis (WGCNA) 26 on the developmentally regulated cassette exons. This analysis revealed four modules with distinct temporal patterns ( Fig. 1d and Supplementary Dataset 5). Among them, exons in module M2 show early splicing switches around birth and exons in M1 show late splicing switches between P4 and P15. These two modules, both characterized by monotonic splicing changes, account for 74% of developmentally regulated alternative exons, indicating that these are the predominant modes of regulation. The other two modules, M3 and M4, show more complex, non-monotonic changes, including abrupt splicing changes in M3 that occur around birth. We confirmed that this modular organization is highly reproducible using independent datasets (Supplementary Fig. 2 and Fig. 2 below). We also identified a subset of 1266 (44%) exons that are most correlated with the module eigenvectors, and thus the most representative of each module, as "core members" (Fig. 1d and Supplementary Dataset 5).
Given the importance of precise timing for neural development, we tested whether the timing of splicing switches reflects specific gene function. To this end, we performed a "sliding window" gene ontology (GO) analysis (see Methods). This analysis revealed three major clusters of GO terms associated with different developmental times of splicing switches ( Fig. 1e and Supplementary Dataset 6). Early-switch exons are enriched in genes related to ion channels, transmembrane transport, and development of synaptic transmission, which are fundamental for establishing neuronal identity. In contrast, late-switch exons are enriched in genes involved in cytoskeletal remodeling, neuron projection, and synaptic formation, which are critical for wiring of neural circuits. Exons that switch in between (E16.5-P6) are present in genes related to membrane depolarization and formation of the axon initial segment, hallmarks of the early stages of neuronal maturation. Interestingly, genes encoding proteins that localize to different subcellular compartments, such as proteins that are part of the presynaptic machinery or postsynaptic density (PSD), also show splicing switches in distinct time windows (Supplementary Fig. 3). Furthermore, the functional distinction of genes with early and late splicing switches is also evident from significant GO terms enriched in each module, confirming the robustness of the observations (Supplementary Dataset 7). These data suggest a clear link between the timing of regulated splicing switches and cellular events that occur during neural development.
The splicing-regulatory program is pan-neuronal in the CNS. Although cortex tissues represent a mixture of different cell types, the developmental splicing changes we observed are not simply due to changes in cellular composition (Supplementary Figure 4a, b and Supplementary Discussion; see ref. 27 for a similar conclusion from gene expression analysis). Nevertheless, an important question is how well the modular organization of developmental splicing switches discovered in cortical tissues captures dynamics in specific neuronal subtypes or cell populations. To address this question, we analyzed multiple datasets that profiled the developmental transcriptomes of specific subtypes of neurons isolated in situ from mouse CNS tissues or differentiated in vitro from ESCs. We found consistent early splicing switches (exons in M2-M4) in purified cortical pyramidal neurons (cPNs) between E15.5 and P1 (ref. 15 ). Similarly, spinal motor neurons show early splicing switches in M2 exons between E12.5 and P1, as well as late switches in M1 exons between P1 and adults [28][29][30] . Importantly, early splicing switches and, to a certain degree, late splicing switches were also recapitulated in differentiation of mouse ESCs to glutamatergic neurons 16 (Fig. 2a). To be more quantitative, we counted the number of exons with significant developmental splicing changes at different time points in each subtype of neurons by pairwise comparison (|ΔΨ| ≥ 0.2 and FDR ≤ 0.05). This analysis confirmed a large number of earlyswitch exons (M2) showing differential splicing in the cPNs and ESC-derived glutamatergic neurons, and changes of both M1 and M2 exons in motor neurons, depending on the compared time points (Fig. 2b). Importantly, among M1 and M2 core module exons also showing monotonic splicing changes in each subtype of neurons, 92-94% have concordant changes in the same direction between the cortex tissue and the specific subtype of neuron. Taken together, our analysis revealed that in vivo and in vitro maturation of neuronal subtypes in the CNS share a dynamic developmental splicing program.

Maturation stage prediction of different subtypes of neurons.
Using the cortex tissue samples as a reference, we developed a computational method and software tool named Splicescope to model the splicing profile of any neuronal sample and assign it to one of six maturation stages (stages 1-6) corresponding to the cortex reference at E14.5, E16.5, P0, P4, P7, and P15 or older, respectively. In combination with two-dimensional projection, such analysis provides an effective summary and visualization of the developmental trajectory of splicing profiles.
We initially applied Splicescope to the splicing profiles of 92 distinct neuronal samples (merged from 277 biological or technical replicates from 28 studies; Fig. 3 and Supplementary Dataset 1). These include 41 tissue samples derived from the developing CNS (e.g., different brain regions, different cortical laminae, and the spinal cord), 20 samples of specific subtypes of neurons purified from mice at different ages, and 31 samples representing different neuronal subtypes or their progenitors differentiated from stem cells. Among the 56 tissue or neuronal subtype samples with known ages, Splicescope assigned the exact stage for 45 samples (80%). An additional nine samples (16%) were assigned to the neighboring stage and were also considered correct predictions because of the ambiguity in determining the true stage when the age is between those of two reference samples. Therefore, Splicescope achieved an overall accuracy of up to 96% in predicting neuronal maturation stages (Supplementary Dataset 8). Among them, cPNs isolated from older mice are predicted to have later maturation stages as one would expect (Fig. 3). Several misclassifications can be explained by heterogeneity of different cell subpopulations due to neuronal migration and maturation (e.g., different germinal zones of embryonic brains; Supplementary Fig. 5a, b) or unusual characteristics of certain neuronal subtypes (e.g., spinal motor neurons, which mature earlier than cortical neurons 31 ; Fig. 3).
When applying Splicescope to in vitro differentiated neurons, we found samples at later time points of differentiation were predicted to be more mature than samples at earlier time points, as expected (Fig. 3). Nevertheless, we note that the ESC-derived neurons, even after extended culture, only reach a predicted maturation stage of 4 or 5, which correspond to P4 and P7 in the cortex, respectively. This is consistent with the observed partial splicing changes of late-switch exons ( Fig. 2 right panel), and reflects the technical challenge of obtaining fully mature neurons through in vitro differentiation. This analysis confirmed that the modular organization of developmental splicing profiles represents a "pan-neuronal" program that reflects the developmental stage of a variety of neuronal subtypes and origins, and possibly underlies their maturation process.
Distinct RBPs regulate early and late splicing switches. We next sought to elucidate the regulatory mechanisms that control the early and late splicing switches we observed in the developing CNS. Our initial analysis was focused on four families of tissuespecific or neuron-specific RBPs: Rbfox, Nova, Mbnl, and Ptbp. These RBPs have an established role in regulating tissue-specific or development-specific splicing [18][19][20][21][22] , but the temporal specificity of such regulation and their contribution to neuronal maturation stages have not been determined. Our choice to study these RBPs is motivated by their dynamic expression changes during neural . c An example of developmental splicing regulation in exon 8a of the Kdm1a gene. Inclusion of this microexon peaks between postnatal days P0-P7. d Four modules of developmentally regulated exons identified by WGCNA analysis with distinct temporal patterns during cortex development. A non-redundant set of 2883 known and novel cassette exons was included for this analysis, and their mean-substracted inclusion levels across developmental stages are shown in the heatmap. Exons in each module were ranked based on their correlation with the eigenvector of the module, and those with the strongest correlation are defined as core members (black bars on the right). Exons in each module are further divided into two groups (e.g., M1+ and M1−) depending on positive or negative correlation with the eigenvector. e Enrichment of gene ontology (GO) terms in exons showing splicing switches with specific timing. The timing of developmental splicing switches is parameterized by sigmoidal curve fitting, and exons are ranked based on the timing. Exons in each sliding window (with a window size of 300 exons) were compared to all cassette exons with sufficient read coverage in the brain to identify significant GO terms. Only GO terms significant in at least one sliding window are shown (Benjamini FDR ≤0.05). Broad categories and top GO terms in each category are highlighted on the right development ( Fig. 4a). Furthermore, our analysis suggests that binding motifs of these RBPs are enriched in flanking intronic sequences of the core module exons in M1 and M2, which is evident from de novo motif analysis ( Supplementary Fig. 6), as well as analysis of established consensus sequences recognized by each RBP (Supplementary Fig. 7).
To investigate the contribution of the four RBP families in establishing the embryonic or mature splicing program, we defined the downstream splicing-regulatory networks by predicting the target exons they directly regulate using our previously established integrative modeling approach 18,19 (Fig. 4b). This approach achieved high specificity and sensitivity (conservatively estimated to be 95-98% and 57-78%, respectively) by integrating data that measure altered splicing upon RBP depletion, direct protein-RNA interaction sites and additional genomic information ( Supplementary Fig. 8, Supplementary Datasets 9 and 10). We found that target exons directly regulated by the four RBP families are disproportionally enriched in specific modules of developmentally regulated exons, especially among core members (Fig. 4c); overall, 36% of module exons (or 50% of core members) are regulated by at least one of these RBPs, compared to 9% of all known cassette exons (p < 2.2 × 10 −16 in both cases; hypergeometric test). Among them,~12% of module exons are regulated by more than one of the four RBP families, including 10 of the 11 exons regulated by all four families.
To demonstrate how these splicing-regulatory networks can be used to elucidate mechanisms underlying developmental splicing switches, we first focused on GABA receptor gamma 2 subunit (Gabrg2) exon 9, whose inclusion was previously reported to be altered in schizophrenic brains 32 . In developing cortex, inclusion of the exon increases gradually between P4 and P30 (Fig. 4d). We previously showed that this exon is activated synergistically by Nova and Rbfox 19 . Our new data suggest that the same exon is also repressed by Ptbp2 and activated by Mbnl1/2, and depletion of the RBPs individually resulted in dramatic splicing changes (Fig. 4e). The two Ptbp-binding sites consisting of clustered UCUY elements we predicted were previously validated to be important for Ptbp-dependent splicing using in vitro binding and splicing assays 33 . Intriguingly, simultaneous overexpression of Nova, Rbfox, and Mbnl in 293T cells, which have undetectable or low endogenous expression of these RBPs, resulted in dramatically increased inclusion of the exon in a splicing reporter, and the effect is much stronger than the activation by each individual or pair of proteins. This synergistic activation is due to direct regulation, as it was abrogated by mutation of the newly identified Mbnl-binding site, which is located four nucleotides downstream of the Rbfox-binding site (Fig. 4f).
On a global scale, we found that Mbnl, Rbfox, and Nova promote the mature splicing pattern of developmentally regulated exons in the vast majority (80-96%) of cases, while Ptbp mostly suppresses the mature splicing pattern (80% of cases), confirming and extending previous studies 18,20,21 (Fig. 4g). Importantly, targets of these RBPs show splicing switches at different times. Nova and Ptbp2 targets tend to switch early (~P0), Mbnl targets predominantly switch late (~P7 or older), and Rbfox targets switch in between (Fig. 4h). In addition, activation or repression of exon inclusion by these RBPs is predictive of an exon's module and direction of developmental splicing changes ( Fig. 4i and Supplementary Fig. 9). These findings agree well with the expression pattern of these RBPs.   Based on these observations, we built a variant of Splicescope which allows prediction of the neuronal maturation stage using only the exons directly regulated by each of the four RBP families. We applied either the full Splicescope model or the RBP-specific Splicescope model to samples with RBP depletion and their controls to assess the contribution of each RBP to specific stages of neuronal maturation. Analysis of RNA-seq datasets upon RBP depletion using the full Splicescope model shows clear shifts along the maturation trajectory based on the overall splicing profile (Fig. 4j), suggesting that these RBPs have global impacts on neuronal maturation. Importantly, the shift is even more pronounced when we used the RBP-specific Splicescope models (Table 1). For example, depletion of Mbnl1/2 in adult mouse brain results in a shift from stage 6 to stage 5 (corresponding to P7) based on the overall splicing profile, but to stage 4 (corresponding to P4) when only target exons directly regulated by Mbnl were used for analysis. Similarly, although Nova depletion results in a more embryonic splicing profile overall, the magnitude is not large enough to change the predicted maturation stage when using the full model; in contrast, the maturation stage shifted from 3 (corresponding to P0) to 1 (corresponding to E14.5) when only direct Nova targets were used for analysis. In combination with the specific enrichment of RBP targets in early-switch or late-switch exons, these data suggest an instrumental role for Ptbp, Nova, and Rbfox in regulating early splicing switches and for Mbnl in regulating late splicing switches during neuronal maturation.
A combinatorial developmental splicing code. To obtain optimized prediction of the timing of developmental splicing changes based on regulatory mechanisms, we developed random forestbased classification models 34 to predict exons with early and late splicing switches in modules M1 and M2 from all cassette exons. This analysis allowed us to compare and integrate the contribution of a large number of features related to splicing regulation including general splicing signals, regulation by the four RBP families that we focused on, and many additional RBPs (Supplementary Datasets 11 and 12). In addition, such models consider both the additive contribution of individual factors and their combinatorial effects.
The full models achieved high accuracy in prediction (area under receiver operating characteristic (ROC) curve or AUC between 0.87 and 0.92; Fig. 5a, the "All features" models and Supplementary Fig. 10a, b). The models are less predictive when only sequence-based features were used (i.e., RNA-seq and CLIP data were excluded; the "Seq_all" models). Notably, the models constructed only using features related to the four RBP families (RBP4) we focused on achieved similar performance to the full models, suggesting limited extra information provided by motif sites of many additional RBPs. These results strongly suggest that the four RBP families are key players of the developmental splicing program to specify the precise timing, although lack of significant contribution from additional RBPs to the model performance could reflect the redundancy of these features for prediction. When we ranked sequence features based on their importance for prediction, the RBP4 motif sites are indeed among the top. Nevertheless, we also identified additional motifs, including U-rich and UG-rich elements, which resemble binding sites of Elavl, Celf, and nSR100 RBP families ( Fig. 5b and Supplementary Dataset 12). Indeed, depletion of Elavl3/4 (ref. 35 ) or nSR100 (ref. 36 ) led to specific impairment in the splicing switches of M2 exons ( Supplementary Fig. 11), suggesting that they are also part of the early splicing switch regulatory program. Detailed analysis of their contribution was not pursued in this study due to the limited number of datasets currently available for integrative modeling. We also note that although our models were trained on exons with monotonic splicing changes and each model predicts only a single splicing switch, combinations of the models are predictive of more complex splicing patterns such as those in module M4. For example, M4+ exons are predicted to exhibit both earlyswitch on (by the M2+ model) and late-switch off (by the M1− model), while the opposite is true for M4− exons ( Fig. 5c; p < 0.002, Fisher's exact test). This observation suggests that these non-monotonic splicing changes result from independent regulatory events with the opposite direction.
A unique splicing-regulatory program in sensory systems. When we examined the splicing profiles of various other neuronal subtypes, we very unexpectedly found that many alternative  exons in olfactory sensory neurons (OSNs) and dorsal root ganglion (DRG) neurons isolated from adult mice showed an immature-like splicing pattern. OSNs and DRG sensory neurons are both part of the peripheral nervous system. Intrigued by this observation, we investigated whether sensory neurons in general have a distinct developmental splicing program and the functional implication. For this purpose, we examined the splicing profiles of nine in vivo purified samples representing seven different sensory neuronal subtypes. These include OSNs, rod and cone photoreceptors, somatic sensory ganglion neurons (from DRG and trigeminal ganglia), and visceral sensory ganglion neurons (from jugular and nodose ganglia). We also included three types of sensory receptor cells, isolated from the gut and taste buds. These are non-neuronal cells, but nevertheless possess certain properties resembling sensory neurons, such as expression of voltage-gated ion channels and electrical excitability (Supplementary Datasets 1 and 8). For comparison, we used four types of CNS neurons (motor neuron, dopaminergic neuron, Purkinje neuron, and cerebellar granule neurons) isolated from adult mice. When Splicescope analysis was applied to determine the maturation stage of each of these samples, we observed four distinct clusters of maturation stages reflecting four categories of cell types-CNS neurons, sensory neurons, sensory ganglion neurons, and non-neuronal sensory receptors-despite the fact that all these samples represent mature cells or neurons derived from adult mice (Fig. 6a). All CNS neurons were correctly assigned to the maturation stage 6. In contrast, the sensory receptor and sensory neuron samples were predicted to be very immature. For example, although the OSNs were isolated using a genetic reporter for the olfactory marker protein (a marker for mature OSNs), Splicescope analysis predicted that OSNs were in stage 1 (very immature). The somatic or visceral sensory ganglion neurons also deviate substantially from the adult cortex samples and the mature CNS neurons, although in this case they were still predicted as stage 6 of maturity. Given the pronounced difference of the maturation trajectory for the sensory and CNS cell types, we performed more detailed comparison of the sensory neuron and receptor splicing profiles. Strikingly, the sensory receptors, sensory neurons, and sensory ganglion neurons (collectively denoted sensory cells) differ most prominently from mature CNS neurons in that their early-switch exons in module M2 retain the very immature splicing pattern ( Fig. 6b and Supplementary Fig. 12a). In contrast, all cell types show more similar adult splicing patterns among the late-switch exons in module M1 (Fig. 6b). Quantitatively, when comparing the differentially spliced M2 exons between sensory cells and mature CNS neurons, 88-98% show a more immature splicing pattern in sensory cells ( Supplementary Fig. 12a, b). In contrast, when comparing differentially spliced M1 exons, sensory cells showed a much smaller bias (65-81%) towards the immature splicing pattern, and the magnitude of the differences is also much less pronounced. To obtain functional insights into this distinct molecular program, we performed GO analysis using exons differentially spliced between sensory cells and CNS neurons. This analysis revealed significant enrichment of genes involved in synapse, cell projection and neurite outgrowth, and cell-cell junction (Fig. 6c). Thus, sensory neurons and receptors cells retain an immature splicing program specifically in earlyswitch exons, but show a relatively mature splicing pattern for late-switch exons. This "chimeric pattern" is distinct from the CNS neuronal splicing profiles at either embryonic or mature stages.
We asked whether the distinct splicing profiles in sensory neuronal subtypes can be explained by the regulatory programs of the neuronal Rbfox, Nova, Mbnl, and Ptpb RBP families. Intriguingly, although these RBPs are expressed abundantly in most CNS neuronal subtypes, we found no expression of Rbfox1/ 3 or Nova1/2 (and only a low level of Rbfox2) in OSNs or in rod and cone photoreceptors, and no or very low expression of Nova1/2 in somatic or visceral sensory ganglion neurons ( Fig. 6d and Supplementary Fig. 13a, b). In contrast, all these cell types abundantly express Mbnl and Ptbp family members at a level comparable to or higher than the CNS neuronal subtypes. To confirm these observations, we examined OSNs by performing immunofluorescence analysis of olfactory epithelium dissected from adult mice using antibodies recognizing each Rbfox family member. Indeed, mature OSNs (labeled by Calmegin) lack immunoreactivity with the Rbfox1/3 antibodies while moderate expression of Rbfox2 is detectable (Fig. 6e), supporting the RNAseq data in single cells at the protein level. Furthermore, we confirmed the low expression of Nova1 on the basis of the Allen Brain Atlas in situ hybridization data in P4 DRG (Supplementary Fig. 13c; data on Nova2 expression is not available). On the other hand, no global deviation was observed in expression of the 372 assayed RBPs when comparing the various sensory cell types to mature CNS neurons or adult cortical tissues ( Supplementary  Fig. 13a, b and Supplementary Dataset 13).
Given the distinct RBP expression pattern in different neuronal subtypes, we predicted their maturation stages when restricting the analysis to only the splicing profile of target exons regulated by each RBP family. For OSNs and photoreceptors, using only the target exons of Ptbp, Nova, or Rbfox individually predicted very immature stages (stage 1 or 2), while using Mbnl targets predicted maturation stage 5. For somatic and visceral sensory ganglion neurons, using Nova targets only predicted a maturation stage of 1, while using the targets of other RBPs including Rbfox proteins (which are abundantly expressed in DRG) all predicted a maturation stage 5 or 6 ( Fig. 6f and Supplementary Dataset 8). These data suggest a model whereby OSNs and photoreceptors have insufficient expression of "pan-neuronal" RBPs (Nova and/ Fig. 4 A set of tissue-specific or neuron-specific RBPs regulate the timing of developmental splicing switches. a Dynamic expression of four families of tissue-specific RBPs, including Ptbp1/2, Nova1/2, Rbfox1-3, and Mbnl1/2. RPKM values are normalized based on the maximum expression value in each family separately and shown in color scale. b Integrative modeling to define the target alternative exons regulated by each RBP family. The Venn diagram summarizes target exons regulated by each RBP family. Note 11 exons regulated by all four RBP families and an additional 58 exons regulated by three RBP families. c Regulation of WGCNA module exons by each of the four RBP families. Activation and repression of an exon by each RBP resulting from integrative modeling analysis are indicated in red and blue, respectively. The total number of regulators for each exon is shown in the bar on the right in gray scale (the darker, the more regulators). d-f Gabrg2 exon 9 as an example in module M1 under combinatorial regulation by all four RBP families. The exon inclusion level in developing cotex is shown in d and changes upon depletion of Ptbp2 (P0) and Mbnl1/2 (adult) are shown in e. Inclusion of the exon in wild type (WT) and mutant (MUT) splicing reporters, in combination with overexpression of different RBPs, is shown in (f). Rbfox-binding and Mbnlbinding site sequences are shaded. RBP expression and exon inclusion were measured by immunoblot and RT-PCR, respectively. g RBPs either antagonize (Ptbp2) or facilitate (Nova, Rbfox, and Mbnl) the mature splicing pattern through activation or repression of exon inclusion. h Time of the maximal splicing switch for target exons regulated by specific RBPs (*p < 0.05, **p < 0.001, t test). Only exons showing a more mature (for Ptbp) or embryonic (for Nova, Rbfox, and Mbnl) pattern upon RBP depletion were included for this anlaysis. i Prediction performance of exon module membership based on regulation by each RBP family. The performance is measured by partial area under curve (pAUC) of the receiver operating characteristic (ROC) plot with a cutoff at false-positive rate (FPR) ≤0.1. j Changes of predicted maturation stages of mouse brain tissues upon depletion of RBPs or Rbfox) to promote the mature splicing pattern of early-switch exons but have an overwhelming abundance of suppressors (Ptbp), which together results in retention of an immature splicing program. Somatic and visceral sensory ganglion neurons show an intermediate state for early-switch exons because of expression of Rbfox but not Nova. On the other hand, the abundant expression of Mbnl is largely sufficient to promote the mature splicing pattern of late-switch exons in all neuronal subtypes we examined (Fig. 6g).
To validate this model, we used rat embryonic primary DRG neuronal culture to test whether exogenous Nova expression in this system is sufficient to promote the mature splicing pattern of early-switch exons as observed in CNS neurons. We generated a lentivirus which expresses a Nova1-GFP fusion protein (Fig. 7a). Five days post infection of dissociated DRG neurons with the Nova-GFP-expressing lentivirus, but not with the GFP (Mock)-expressing lentivirus, we detected robust expression of Nova at the mRNA and protein levels (Fig. 7b, c). Similar to the subcellular localization of the endogenous Nova protein in CNS neurons, Nova-GFP shows a predominant nuclear localization in DRG neurons, which is required for splicing regulation. We then examined splicing of five of the Nova target exons we identified previously 19 in the presence or absence of Nova expression. For all five Nova targets, the exon inclusion level is high in wild-type (WT) mouse cortex, but is dramatically reduced upon Nova2 depletion, suggesting Nova is essential to activate exon inclusion. Each of these exons shows low inclusion in DRG sensory neurons, similar to the Nova2 knockout (KO) cortex, presumably due to the lack of Nova expression. Importantly, for all five Nova targets, we found that ectopic expression of Nova is sufficient to switch the immature splicing pattern in DRG neurons to the mature pattern observed in WT cortex ( Fig. 7d and Supplementary Dataset 2). These data support the notion that a lack of Nova is necessary to maintain the "embryonic" splicing pattern for a subset of early switch exons in mature DRG sensory neurons.

Discussion
Coordinated regulation of gene expression at multiple levels dictates the numerous morphological and functional changes that occur at specific stages of neuronal development. Previous studies revealed a large number of AS changes in the developing brain 15,16,18 , but the temporal resolution of these studies is low and the underlying regulatory mechanisms are poorly understood. This work represents the first comprehensive analysis of the precise timing of dynamic splicing regulation during neuronal development on a global scale.
Our analysis uncovered two major waves of splicing switches that occur in the mouse brain around birth and in the first two postnatal weeks. We focused on a time window after completion of neurogenesis, and thus the bulk of the observed splicing switches occur during neuronal maturation and likely contribute to this important process. The developmental splicing changes can be driven by intrinsic factors or extrinsic factors, and these two mechanisms do not have to be mutually exclusive. A majority of the developmental splicing switches are monotonic and preserve the reading frame, suggesting that exon inclusion or skipping would generate specific protein isoforms in the adult brain relevant for the establishment and maintenance of homeostasis in mature neurons. This hypothesis is supported by the observation that genes with early splicing switches have distinct functions from genes that exhibit late splicing switches, consistent with the cellular events that occur at the corresponding developmental stages. Interestingly, we also noticed that developmentally regulated exons more frequently have increased inclusion, as opposed to increased skipping (Fig. 1a), so that in a majority of cases an additional peptide would be inserted into the protein product in mature neurons. The functional implication of this asymmetry is currently not clear, but it might be relevant for certain features of the neuronal proteome. For example, phosphorylation sites, which play important roles in signal  19 .
Analysis of a large panel of neuronal samples from diverse origins, including different subtypes of neurons purified in situ or differentiated in vitro suggests that the modular organization of the developmental splicing profiles we initially identified in cortical tissues reflects a pan-neuronal program in the CNS. At the tissue level, developmental splicing changes are much more dramatic than regional variations in different parts of the brain (see ref. 37 for similar observations at the gene expression level). With regard to cellular heterogeneity (in a particular brain region), while some developmental splicing changes observed in brain tissues might occur in non-neuronal cells such as glia, neurons have a particularly large number of cell-type-specific alternative exons 10,11 , and the inclusion of these exons must be established during development. Indeed, purified neurons of specific subtypes such as cPNs and spinal motor neurons exhibit similar developmental splicing changes as compared to developing cortical tissues. Interestingly, when we examined specific types of glial cells at different developmental stages, we did not observe systematic developmental splicing changes on a global scale ( Supplementary Fig. 4).
This study suggests a new dimension to the role of tissuespecific splicing factors in regulating the timing of developmental splicing switches and in defining neuronal maturation stages. Multiple neuron-specific RBP families including Nova, Rbfox, and Ptbp, potentially combined with additional RBPs such as Elavl, Celf, and nSR100, regulate early splicing switches, sometimes in a synergistic manner (Fig. 4d-f). On the other hand, we have so far only identified Mbnl and, to some extent, Rbfox as regulators of late splicing switches on a global scale. Together, we found that regulation of splicing by these RBPs can predict the timing of splicing switches with high accuracy. Although one would expect that similar predictions might also be made from the steady-state mRNA level, due to coordinated regulation of gene expression at multiple levels, splicing-based prediction can provide critical insights into the neuronal maturation process that cannot be obtained by analysis of the steady-state mRNA level. In line with this argument, mutations in RBPs, including the ones we examined here, have been implicated in several neurological disorders 38 . For example, sequestration of MBNL by repeat RNA expansion containing its binding sites is the cause of myotonic dystrophy, which manifests as global defects in splicing and polyadenylation in the muscle and in the CNS but only moderate gene expression changes 20,39 . Mutations in the Rbfox1, 2, and 3 genes have been found in human patients with several neurodevelopmental or neurological disorders, including autism, schizophrenia, and epilepsy 40 . In a recent study, we investigated the function of Rbfox during neuronal maturation by generating Rbfox1-3 triple KO mouse ESCs, which were differentiated into spinal motor neurons. These neurons retain an embryonic neuron-like splicing program, and the abnormality causes defects in neuronal electrophysiology that is normally established during neuronal maturation 41 . Interestingly, we found dramatic disruption in developmental splicing switches with few changes at the steady-state mRNA level.
A surprising finding of this study is that sensory neurons exhibit a splicing program distinct from CNS neurons. Mature sensory neurons (including OSNs and photoreceptors, as well as somatic and visceral sensory ganglion neurons) lack early splicing switches, resulting in a splicing profile reminiscent of immature CNS neurons. This distinct splicing program is presumably due to the fact that sensory neurons do not express all of the "panneuronal" RBPs observed in the CNS (e.g., Rbfox and Nova); consequently, exons directly regulated by these RBPs do not undergo developmental splicing switches in these cell types. In support of this idea, we found that introduction of Nova into primary rat DRG neurons is sufficient to promote the mature splicing pattern of at least a subset of early switch exons that is observed in CNS neurons.
Why do sensory neurons have a distinct splicing program compared to CNS neurons? One possibility is that this molecular distinction might be specified early due to a difference in embryonic origins of these cell types. While CNS neurons originate from the neural plate, with the exception of the photoreceptors, all sensory neurons and sensory ganglion neurons we examined are of placodal or neural crest origin 43 . Photoreceptors derive from the optic vesicles that are also developmentally separated from the "canonical" CNS (see also Supplementary Discussion). Thus, the differential expression pattern of RBPs may be imposed by the tissue and environment of origin and CNS neurons are distinct in this respect. Alternatively, the distinct splicing program of neuronal subtypes in the sensory systems might have evolved to accommodate certain unique cellular properties, such as the relatively high regenerative capacity and plasticity of these cell types. By virtue of their direct interaction with the environment, sensory systems are the "first responders" when it comes to adapting to new environmental pressures, which may require an 'immature' molecular program to maintain the neurons in a state that permits a more flexible molecular adaptation to new environmental challenges. Moreover, with few exceptions, mature neurons in the mammalian CNS have only a limited capacity for self-repair and regeneration, yet many sensory neurons regenerate following injury or as part of their developmental program 44 . For example, the continuous renewal of OSNs and their neural projection to the olfactory bulb is a salient feature of the olfactory system 45,46 . In addition, most somatic and visceral sensory neurons can regenerate their peripheral axons after nerve injury 47 . Although photoreceptors, like the other CNS neurons, have limited regenerative capability in mammals, they can readily regenerate in lower vertebrates 48 , and certain subtypes of retinal ganglion neurons in mice, which share the developmental origin with photoreceptors, also have regenerative potential 49 .
Genes and signaling pathways that potentially underlie the different intrinsic growth capacities of sensory and CNS neurons have been intensely investigated for decades [50][51][52] , with the hope that such knowledge can be leveraged to enhance the regenerative capability of CNS neurons. The distinct splicing program we identified in sensory neurons might inform post-transcriptionally regulated molecular pathways related to neuronal regeneration. Consistent with this notion, it was previously noted that Rbfox3 is transiently downregulated in motor neurons upon axotomy, while recovery of the Rbfox3 expression level follows peripheral axonal regeneration and muscle reconnection 53 . Similarly, our meta-analysis of published transcriptome profiling data 50,54 suggest that Ptbp1 is up-regulated in DRG sensory neurons after sciatic nerve lesion, and its expression is also highly correlated with the extent of axonal growth of injured DRG sensory axons. Thus reduction of Rbfox expression or increase in Ptbp1 may transiently enable a more immature splicing program in these injured neurons to promote axon regrowth. Importantly, differentially spliced exons in sensory neurons are highly enriched in genes involved in neuronal projection and synaptic function, and which were recurrently identified in recent studies to be required for axon regeneration upon injury 50,51 . Taken together, a further exploration of the distinct splicing patterns in CNS and sensory neural subtypes may well shed light onto the molecular mechanisms underlying their intrinsic capacity of regeneration, plasticity, and function.

Methods
Generation and compilation of RNA-seq data. A summary of RNA-seq data generated for this study or compiled from the public domain is provided in Supplementary Dataset 1.
To determine the dynamics of the mammalian brain transcriptome in depth, we performed RNA-seq using mouse cortex at 9 developmental stages (E14.5, E16.5, P0, P4, P7, P15, P30, 4 months, 21 months), each stage in duplicates using the standard Illumina TruSeq poly-dT protocol. All animal-related procedures were conducted according to the Institutional Animal Care and Use Committee (IACUC) guidelines at the Columbia University Medical Center and Rockefeller University. All RNA samples used for this analysis have RNA integrity number (RIN) >8.5. In total, we obtained 987 million paired-end (PE) 101-nt reads (54.8 million per sample on average) 10 .
To further evaluate neural maturation based on developmental regulated alternative exons, we collected 346 public RNA-seq samples that can be classified into three categories: cortex or spinal cord tissues, purified neuronal subtypes, and ESC-derived neurons. Technical or biological replicates were merged to obtain a final list of 111 samples, of which 41 are from tissues, 39 are purified neurons, and 31 are ESC-derived neurons. In total, these data are composed of about 13 billion reads or read-pairs, providing an unprecedented depth and scope to study dynamic splicing changes during neural development.
RNA-seq data derived from Ptbp2 WT/KO and Nova2 WT/KO brains were obtained from published studies 21,55 . For Mbnl, the mammalian brain expresses To compare the splicing profiles of neurons and glial cells, we obtained the RNA-seq dataset that profiled all major CNS cell types in the mouse cortex 11 . To evaluate potential differences of pyramidal neurons and interneurons, we used a single-cell RNA-seq dataset derived from primary visual cortex of adult mice 56 .
Analysis of RNA-seq data and quantification of AS. All RNA-seq data were mapped by OLego v1.1.2 (ref. 23 ) to the reference genome (mm10) and a comprehensive database of exon junctions was provided for read mapping. Only reads unambiguously mapped to the genome or exon junctions (single hits) were used for downstream analysis.
To quantify AS, we used a comprehensive list of both known and novel AS events, as we described previously 10 . Inclusion of known and novel alternative exons (percent spliced in or Ψ) was then quantified based on the number of exonjunction reads using the Quantas pipeline (http://zhanglab.c2b2.columbia.edu/ index.php/Quantas), as we described previously. To reduce uncertainty in estimating Ψ, we kept the estimation only for exons with read coverage ≥20 and standard deviation <0.1 (based on binomial distribution). Gene expression was quantified using the same pipeline. For all quantifications, biological replicates were combined. For the single-cell RNA-seq analysis 56 , we used cell types defined in the original paper, and pooled cells that were assigned to each cell type as core members for AS quantification.
To identify exons with differential splicing in two compared conditions, we evaluated the statistical significance of splicing changes using both exonic and junction reads that support each of the two splice isoforms. For the pairwise comparisons of different stages of the developing cortex, we used the standard Fisher's exact test by pooling read counts of the biological replicates. The remaining RNA-seq datasets used to measure differential splicing upon depletion of specific RBPs or comparing different subtypes of neurons (with an exception of the Ptbp2 KO because the Emx:Cre sample does not have replicates) were analyzed with an updated version of the Quantas pipeline using a generalized linear model (GLM), as described previously 12 . Conceptually, the advantage of the GLM method is that it explicitly models the variation across biological replicates. In practice, we found the results from the GLM and Fisher's exact test to be highly similar, with the GLM method being slightly more stringent. The FDR was estimated by the Benjamini-Hochberg procedure 57 . An AS event was called differentially spliced in the two compared conditions with the following criteria: coverage ≥20, Benjamini FDR ≤0.05 and |ΔΨ | ≥ 0.2 (to identify developmentally regulated exons or neuron subtype-specific exons) or 0.1 (to identify RBP-dependent exons).
To identify exons with developmental splicing changes, we performed pairwise differential splicing analysis among different stages during cortex development. An exon is called to have developmentally regulated splicing if it is differentially spliced in at least one comparison (Fig. 1a). Developmentally regulated exons in cPNs 15 , motor neurons [28][29][30] , and ESC-derived glutamatergic neurons 16 (Fig. 2b) were identified similarly. For each of these datasets, we also identified developmentally regulated exons with monotonic splicing changes if all significant changes occur in the same direction.
For detailed analysis, we focused on a subset of 77,950 non-redundant cassette exons, including 13,500 cassette exons identified from previous expressed sequence tag/complementary DNA (EST/cDNA) sequences (denoted known cassette exons) and 64,450 cassette exons identified from brain RNA-seq data de novo (denoted novel cassette exons). Methods to identify non-redundant cassette exons were described previously 10 .
AS conservation and purifying selection pressure. For each cassette exon observed in the mouse transcriptome, we determined whether it has conserved splicing pattern in human and whether it is under strong purifying selection pressure in mammals, both as described previously 10 . In brief, AS events in human were similarly identified using cDNA/ESTs and RNA-seq data derived from developing human brains. Selection pressure of each exon was quantified based on cross-species conservation in the synonymous position of the alternative exons as well as in flanking intronic sequences in 40 sequenced mammalian species 58 . A subset of exons with the highest conservation was determined to undergo strong purifying selection pressure.
WGCNA to identify exon modules. WGCNA (version 1.34) 26 was performed on the developing cortex data using the splicing profiles of the 2883 developmentally regulated exons. Pearson's correlations between exons were calculated and raised to power 3 to determine the adjacency matrix. Exon modules were identified with default parameters, followed by automatic merging of modules with similar eigenvectors (using dissimilarity threshold = 0.25, which is the default). This analysis initially resulted in five modules (Supplementary Fig. 2). Inspection of the resulting modules suggested that modules 4 and 5 show similar temporal patterns (higher/lower exon inclusion between P0 and P7). Therefore, these two modules were manually merged to form the final module M4 reported in the paper and its module eigenvector was re-calculated. The final module assignments are provided in Supplementary Dataset 5.
The correlation between each individual exon and the eigenvector of the corresponding module was calculated and used to measure the membership of each exon to the module. For each module, a subset of exons with the highest correlation with the module eigenvector was defined as core module members (correlation p < 0.001, corresponding to Pearson's r = 0.9 approximately).
Determine the precise timing of splicing switches. For each exon, we parameterized the temporal pattern of exon inclusion levels by fitting a sigmoidal curve: in which a and k are the low and high exon inclusion levels during development and t is time point represented by post-conception days in log 10 scale. Following this definition, the parameter m is the time point when the maximal splicing switch occurs.
Parameters of the sigmoidal fit were estimated by nonlinear least squares curve fitting in R (using the port algorithm). Quality of fit was measured by the normalized residual: where i is the index of the developmental time point. We applied this method to core members in modules M1 and M2. The sigmoidal fit was considered to be reliable for 964 exons satisfying ε < 0.15, k−a > 0.2, and m > 0.
Sliding window analysis of functional annotations. Developmentally regulated exons were ranked according to the timing of the maximal splicing switch obtained from sigmoidal fit (parameter m), as described above. For the results presented in the paper, we used a sliding window size of 300 exons to obtain lists of foreground genes. The background gene list for comparison is composed of all genes with cassette exons with a sufficient coverage in the cortex (coverage ≥20). GO terms were downloaded from http://git.dhimmel.com/gene-ontology/. The enrichment of each GO term among genes in each sliding window was assessed by a hypergeometric test. Benjamini-Hochberg correction was applied to obtain the final FDR (correcting for 14,514 terms and 665 sliding windows), and only GO terms with corrected FDR ≤ 0.005 were kept for further analysis. Each significant GO term was represented by their log-transformed FDRs across all sliding windows and hierarchical clustering was performed to group GO terms showing similar temporal patterns of enrichment (Fig. 1e in the main text and Supplementary Dataset 6).
We also used the same sliding window analysis to find enrichment of genes with additional functional annotations ( Supplementary Fig. 3). These include genes encoding presynaptic proteins 59 and genes encoding PSD, components of the mGluR5 and the NRC/MASC complexes (http://www.genes2cognition.org) 60 . We also included genes implicated in autism obtained from two sources: genes compiled in the SFARI gene database 61 and genes with likely gene-disrupting mutations in autism patients as determined by exome sequencing [62][63][64][65][66][67] .
For comparison, standard GO analysis was also performed using core exons of each module as foreground, and the same list of genes with brain-expressed cassette exons as background. Statistical tests and multiple test correction were performed as described above and the significant terms (FDR ≤ 0.05) are shown in Supplementary Dataset 7. Similar GO enrichment analysis was performed for genes with differentially spliced exons between the sensory cells and mature CNS neurons (Fig. 6c).
Neuronal maturation stages predicted by splicing profiles. We developed a computational tool named Splicescope to evaluate neuronal maturation using developmental splicing profiles and made it available at http://zhanglab.c2b2. columbia.edu/index.php/Splicescope.
For this purpose, we first defined six distinct maturation stages from the mouse cortex data E14.5, E16.5, P0, P4, P7, and P15 or older, which are represented by stages 1-6. P15 or older were grouped as one stage because of high correlation between samples after P15 (Pearson's correlation r > 0.95). We did not name the stages using the actual ages because developmental timing of different subtypes of neurons can be different in vivo (e.g., maturation of spinal motor neurons is in general earlier than cortical neurons). For each sample, we obtained the splicing profile of the 1909 known module cassette exons defined in the cortex reference, which was used to assign the sample to a specific maturation stage by comparison to the cortex reference, using a two-step strategy.
Considering that different exons may have different contributions toward defining specific maturation stages and that the range of exon inclusion is (0, 1), we first used a β regression method (betareg in R) 68 where E(y i ) = μ i and var y i ð Þ ¼ μ i 1 À μ i À Á = 1 þ ϕ ð Þ. We then modeled the expectation of the inclusion level of each module exon with a β regression: where β=(β 1 , β 2 , …, β t ) T is the vector of regression parameters for the nine time points, x ij is the inclusion level of exon i in the reference sample j, and g(u) is a link function that maps (0,1) to any real number. Here we used the logit function g μ ð Þ ¼ log μ= 1 À μ ð Þ ð Þin the model. We fitted the β regression model to estimate parameters β and ϕ for each sample, and to obtain the fitted inclusion levelμ i for each exon i. Sinceμ i is determined by a linear combination of the inclusion levels of the exon in the cortex reference, the regression essentially projects the sample into the subspace spanned by the cortex reference samples, so that we can evaluate the distance of a new sample to each of the reference samples in this subspace.
In the second step, we defined sample distance D j as the Euclidean distance of projected sample (the fitted exon inclusion levels) and each reference sample j.
iβ . The maturation stage of the sample was then assigned by the k-nearest neighbor method (k = 1 for this study) using D=(D 1 , …, D t ) as the distance measure. The prediction confidence score S for each sample was calculated as: This method was applied to evaluate the overall maturation of each sample. A prediction is considered to be correct if the predicted stage is the same as the true stage, or immediately adjacent to account for ambiguity in assignment of the true stage (the actual age of donor mice sometimes falls between any two consecutive ages of the reference samples).
For data visualization, we also applied principal component analysis (PCA, using the princomp package implemented in R) to the developing cortex samples and projected all the other samples to the space defined by the first two principal components.
To correlate RBPs to neuronal maturation, we quantified the expression of 372 RBPs obtained from RBPDB 69 and ranked these RBPs based on the correlation of their expression with predicted maturation stages (Supplementary Fig. 13a and Supplementary Dataset 13). Differential expression analysis between each sensory cell type and mature CNS neurons was performed using the edgeR method included in the Quantas pipeline ( Supplementary Fig. 13b).
To evaluate the contribution of the four specific RBP families, we focused on neuronal maturation, and we also predicted neuronal maturation stages based on the splicing profile of their direct target exons defined by the Bayesian network analysis (see below) using the same approach ( Fig. 6e and Supplementary Dataset 8).
Motif enrichment analysis. We performed hexamer enrichment analysis using upstream or downstream intronic sequences (200 nt in each region) of the core module exons in M1 and M2, as compared to all cassette exons used as a control. For this analysis, repeat masked sequences were extracted, and the enrichment of each hexamer was evaluated using a hypergeometric test (Supplementary Fig. 6). Similarly, we also evaluated the enrichment of motif sites for Nova, Rbfox, Mbnl, and Ptbp in core module exons as compared to all cassette exons using previously established consensus motifs ( Supplementary Fig. 7).
Splicing-regulatory networks of Nova, Rbfox, Mbnl, and Ptbp. We used an integrative modeling approach we previously developed to define direct target exons of each RBP family, as we previously demonstrated for Nova and Rbfox 18,19 . In brief, this approach employs a Bayesian network to weigh and combine multiple types of data, including evidence of protein-RNA interactions as determined by CLIP data and bioinformatics predictions of motif sites, evidence of RBPdependent splicing as determined by RNA-seq or microarrays, and several evolutionary signatures related to regulated AS.
For this study, we used the Nova target network we previously defined 19 . Among the 363 Nova target cassette exons we originally identified in mm9, 359 exons were annotated in our current database based on mm10 and used in this analysis. An updated version of the Rbfox target network was built to incorporate recently published RNA-seq data. We also extended this approach to define the Mbnl and Ptbp target networks. To simplify analysis and presentation, we limited our analysis to~16,000 known cassette exons annotated from mRNA/ESTs. This is because novel exons discovered de novo in RNA-seq analysis are frequently not covered by exon or exon-junction microarrays and they were not included in our previous analysis of the Nova network. More details for building each network are described below and the results of prediction are summarized in Supplementary Dataset 9.
For each network, we evaluated the performance using sensitivity and specificity (Supplementary Dataset 10). We estimated sensitivity as the percentage of recovered validated target exons, which were compiled in previous studies. Specificity was estimated as the percentage of not-recovered non-validated targets. Since a subset of exons used for this evaluation was used for training of the full Bayesian network model, we obtained the prediction FDRs of these training exons by 10-fold cross-validation, as described below, to avoid potential biases. We note both sensitivity and specificity are conservative estimates, because some validated exons might not represent direct targets of the RBP, and only a small subset of bona fide target exons were previously validated.
Rbfox: In the updated network, the following datasets were used to model evidence of splicing changes in the Bayesian network analysis: control vs. Rbfox2 knockdown in C2C12 cells before and after differentiation 70 , and control vs. RBFOX2 knockdown in HeLa cells 18 . Significance of splicing changes was estimated using the Quantas pipeline as described above. Since splicing changes were modeled using a normal distribution in the Bayesian network, we transformed the p values using the inverse normal cumulative distribution function "norminv(p/ 2)" in R, where p is the p value determined by RNA-seq analysis, and assigned the sign of the direction calculated for ΔΨ. The other datasets, including Rbfox CLIP tag cluster score and motif score, as well as evolutionary signatures, were kept the same as in our previous analysis 18 .
For training of the Bayesian network, we used the list of 121 validated Rbfox target exons we compiled previously. We also included 765 exons showing splicing changes in at least one RNA-seq dataset (509 exons activated and 256 exons repressed), and 500 randomly sampled exons without evidence of Rbfox-dependent splicing but with sufficient read coverage in at least one RNA-seq dataset. Only the 121 validated target exons were assigned a class label during training. For prediction, each cassette exon was assigned three probabilities: activation or repression by Rbfox or no direct regulation, from which an FDR of target prediction was derived. In total, 732 exons were predicted as direct Rbfox targets with FDR <0.01. Among these, we were able to determine the direction (i.e., activation or repression) of splicing regulation for 654 exons with probability of activation or repression ≥0.7.
To ensure that the Bayesian network model is not overfit, we performed 10-fold cross-validation and compared the results predicted by the cross-validation models and the full model, as we previously described 18,19 , which gave very similar results (Supplementary Fig. 8; the same for the other RBPs described below).
Mbnl: To minimize the redundancy of splicing regulation by Mbnl1 and Mbnl2, we generated RNA-seq data to compare frontal cortices of adult Mbnl1/2 dKO and control mice (Supplementary Dataset 1). We also used three additional RNA-seq datasets from published studies: WT vs. Mbnl2 KO mouse in hippocampus 20 (dataset 2) and WT vs. Mbnl1 KO mouse in muscle and heart 71 ( Supplementary Datasets 3 and 4). Similar to the Rbfox network, we used the norminv(p/2) as the representation of splicing change scores in the Bayesian network analysis. We assigned splicing change scores in datasets 2-4 derived from non-cortex tissues only for exons with read coverage ≥20 (Supplementary Dataset 2) or 15 (Supplementary Datasets 3 and 4), to avoid punishment on Mbnl target exons more specifically expressed in the cortex.
We obtained predicted Mbnl-binding clustered YGCY motif site scores from a previous study 72 . In brief, the mCarts algorithm was used to integrate the number and distance of YGCY elements, their cross-species conservation and secondary structure using a hidden Markov model (HMM). To score Mbnl CLIP tag clusters as evidence of Mbnl binding, we combined the unique CLIP tags for Mbnl2 in hippocampus 20 and Mbnl1 in C2C12 cells 71 . We derived motif site and CLIP tag cluster scores with respect to splice sites in the alternatively spliced region as described previously by weighing the strengths of individual sites and their distance to the splice sites 18,19 .
For training of the Bayesian network, we used 964 exons showing splicing changes in at least one RNA-seq dataset (546 activation and 418 repression). In addition, 500 exons with sufficient read coverage in at least one RNA-seq dataset but no evidence of Mbnl-dependent splicing were also included. We compiled 94 unique cassette exons that have been previously validated as Mbnl1 or Mbnl2 targets in human, mouse, or rat; among them 59 exons show splicing changes in at least one RNA-seq dataset and were assigned a class label. The remaining exons were unlabeled during training. For this study, we were able to predict 295 exons as direct Mbnl targets with FDR <0.02.
Ptbp:We used the following datasets to determine Ptbp2-dependent splicing in the mouse brain: RNA-seq of WT and CNS-specific Ptbp2 KO brains derived from either Nes-Cre or Emx-Cre drivers 21 , WT vs. Ptbp2 germline KO brains as measured by Affymetrix exon arrays or exon-junction arrays 22 . We used the norminv(p/2) as the representation of splicing change scores in the Bayesian network analysis, as described above.
We also predicted Ptbp2-binding motif sites using mCarts 72 . In brief, we used Ptbp2 neocortex HITS-CLIP data 22 to identify 5341 CLIP tag clusters with peak height ≥6 tags. Regions defined by these peaks extended by 50 nt in both directions were used for the positive training set of Ptbp2-binding sites by the HMM. In addition, 112,202 regions (exons extended by 1 kb in both directions) containing no CLIP tags were used as the negative training set. We ran mCarts using the UCUY as the consensus motif recognized by Ptbp 22,73 and scored the predicted Ptbp-binding clustered UCUY motif sites for each cassette exon as described previously 18,19 .
For training of the Bayesian network, we compiled 63 unique cassette exons that have been previously validated as Ptbp1 or Ptbp2 targets in human, mouse, or rat. We included 992 additional exons that show significant change in at least one of the RNA-seq or microarray datasets as a positive training set (|ΔΨ| ≥ 0.1 and FDR ≤ 0.01 for RNA-seq, p ≤ 0.01 for exon arrays, and |dIrank| ≥ 8 for junction arrays). For the negative training set, we randomly selected 300 exons with sufficient coverage for at least one RNA-seq dataset and that were not significantly differentially spliced in any of the four datasets. Only the validated target exons were assigned a class label during training. For this paper, we defined 469 exons as direct Ptbp targets with FDR < 0.01.
Overlap between WGCNA module exons and RBP targets. To evaluate how well regulation by each RBP family predicts the timing of developmental splicing switches, we ranked all known cassette exons based on the predicted probability of activation or repression by each RBP (resulting from the Bayesian network analysis), and used the ranking to predict core members of module exons in M1 and M2 (e.g., M1+ core vs. remaining exons) with different thresholds. The resulting ROC curve was evaluated by the partial area under curve (pAUC) focusing on the left side of the curve with high specificity (false positive rate or FPR <0.1); pAUC was scaled so that perfect prediction will give a pAUC of 1 (Fig. 4i in the main text). In the second approach, we obtained the list of exons activated (or repressed) by each RBP using stringent thresholds in the Bayesian network analysis as described above and performed a gene set enrichment analysis 74 using exons ranked by their module membership ( Supplementary Fig. 9). The contribution of Elavl 35 and nSR100 (ref. 36 ) for developmental splicing switches was also evaluated using pAUC ( Supplementary Fig. 11), but in this case, exons were ranked by the extent of RBP-dependent splicing, as determined by exon-junction microarrays (ΔIrank; ref. 35 ) and RNA-seq (FDRs), respectively.
Construction of a neurodevelopmental splicing code. To predict developmental splicing switches with specific timing, we constructed a database of 1240 features relevant for splicing regulation for each known cassette exon. These include sequence features (exon lengths, exon and intron conservation, RBP motif scores (based on mCarts 72 or motif enrichment or conservation score, or MECS 18 for the four RBPs we focused on and RNAcompete 75 for many additional RBPs), monomer, dimer, and trimer counts, splice site strength, exon and intron regulatory element counts, and exon NMD potential), CLIP features (CLIP tag cluster scores in each alternative exon and upstream and downstream flanking introns), perturbation features (differential splicing upon genetic-based or cell-based depletion of RBPs, as measured by RNA-seq or microarrays), and results of Bayesian network predictions (probabilities of target and non-target for each RBP). More details about these features are provided in Supplementary Datasets 11 and 12.
We decided to use random forest to predict module membership and directions for core member exons in modules M1 and M2 using all or subsets of features we compiled. The advantages of random forest include its flexibility in accepting both quantitative and qualitative features, its superior classification performance demonstrated in many application domains, and the ability to provide a measure of feature importance 34 . Specifically, we used the randomForest package (version 4.6.10) 76 in R for model construction and prediction. Four separate models for M1 +, M1−, M2+, and M2− were constructed for binary classification of whether an exon belongs to a module of the specified direction or not (e.g., M1+ vs. not M1+). For each model, the positive training set consisted of core member exons in the currently tested module and direction, while the negative training set consisted of all remaining cassette exons annotated in mouse, with the exception of the noncore exons in the current module, because their classification is ambiguous.
Random forest is in general very robust with regard to the choice of model parameters. When we trained the models, we performed stratified sampling to obtain the same number of positive and negative training exons. We also varied different parameters including the number of trees (ntree) in the forest and the number of features per tree (mtry), and found that the results are in general very stable ( Supplementary Fig. 10). For results presented in this paper, we used ntree = 1000 and mtry = 300 when we used all features to build the models (Fig. 5a, c). For each exon, the prediction is represented by the out-of-bag (OOB) probability based on a bootstrap procedure used by randomForest. This is essentially a crossvalidation procedure, in which each decision tree is only used to predict independent samples not used for training of the tree, so it provides an unbiased measure of prediction confidence. The performance of each model was evaluated by the area under the ROC curve (AUC) using the ROCR package (version 1.0.7) 77 . The importance of each RBP for prediction was evaluated using the reduction of Gini impurity (Fig. 5b and Supplementary Dataset 12). We note that each exon might be predicted as positive by multiple models depending on the threshold (e.g., both M1+ and M2+ with different confidence). To resolve ambiguity, OOB probabilities were transformed into FPR and the prediction with the minimal FPR was chosen to obtain the final class label. This approach was used to generate results presented in Supplementary Fig. 10.
We also built random forest models using subsets of features such as sequence features (Seq_all, mtry = 100) and features related to regulation by the four RBP families we focused on (RBP4, mtry = 2) (Fig. 5a).
Gabrg2 exon 9 splicing reporter assay. The WT GABA A receptor minigene was generated in a previous study 78 . To generate the mutant minigene, the WT vector was cleaved using XbaI restriction enzyme and a DNA fragment (gBlock, IDT) with all three TGCT Mbnl-binding sites mutated to TGGT was integrated back into the vector backbone together with a PCR product containing a part of the WT minigene using InFusion cloning mix (Clontech). Sequences used for cloning are listed in Supplementary Dataset 2. All vectors and mutations were confirmed by Sanger sequencing.
Primary DRG neuron culture and lentivirus transduction. All animal work was conducted in accordance with NIH guidelines for laboratory animal care and approved by the IACUC of Columbia University. DRG neurons were isolated from rat embryos (E15) and plated on the poly-L lysine/laminin substrate. Cells were maintained in the media containing: Neurobasal media, B27, 2 mM glutamate, 20 μM 5′-fluorodeoxyuridine, and 50 ng ml −1 nerve growth factor. Sensory neurons were transduced with lentivirus at multiplicity of infection = 20. Media were exchanged 24 h post infection and every other day thereafter. Cells were collected 5 days post infection for imaging analysis and RNA isolation. Three replicate experiments were performed to quantify Nova expression and to test Novadependent splicing of five selected exons.
For immonostaining, DRG sensory neurons were fixed (4% PFA in PBS) and blocked for 20 min at room temperature (blocking buffer 1× PBS, 10% horse serum, 0.2% Triton X-100, 0.05% sodium azide). Primary antibodies were diluted in Antibody buffer (1× PBS, 5% horse serum, 0.2% Triton X-100, 0.05% sodium azide) and incubated overnight at +4°C. The following primary antibodies were used at the specified dilution: Nova1 (Abcam, ab183723, 1:250) and Neurofilament 2H3 (DSHB, 1:1000). After three PBS washes secondary antibodies were applied, samples were incubated for 2 h at 4°C. After three PBS washes, coverslips were mounted on slides using DAPI Fluoromount-G (SouthernBiotech). Images were taken using a Leica SP8 system. RNA isolation and cDNA preparation. Total RNA was isolated with Trizol reagent (Invitrogen) using the manufacturer's instruction. cDNA was prepared using SuperScript III reverse transcriptase (Invitrogen) with oligodT or random hexamer primers. To measure exon inclusion, alternative exons of interest were amplified with primers listed in Supplementary Dataset 2. PCR products were resolved on 1.5-2% agarose gel. quantitative PCR was performed using FastStart SYBR Green Master (Roche) on CFX96 Real Time System (BioRad).
Code availability. The Splicescope software package is freely available for noncommercial uses at http://zhanglab.c2b2.columbia.edu/index.php/Splicescope and in GitHub https://github.com/chaolinzhanglab/splicescope. Data availability. The RNA-Seq data generated in this study have been deposited to NCBI Short Read Archive under accessions SRP055008 and SRP142522.