Introduction

Fused in sarcoma (FUS), also referred to as Translocated in liposarcoma (TLS), is a member of the FET family of RNA-binding proteins (RBPs) that contain multiple domains with a potential for RNA binding, including an RRM domain, a zinc-finger domain and three RGG boxes1. Cytoplasmic inclusions containing FUS are the pathological hallmark of a subset of patients with amyotrophic lateral sclerosis (ALS) and frontotemporal lobar degeneration (FTLD)2,3,4,5,6. ALS-associated mutations in the FUS gene are most often located in the nuclear localization sequence; they inhibit the import of FUS protein into the nucleus and promote the formation of cytoplasmic aggregates within affected neurons2,7. In cell culture, FUS was found to co-localize with TAR DNA-binding protein 43 (TDP-43), another RNA processing protein with mutations and common pathologic inclusions in ALS and FTLD8,9. Biochemical studies confirmed that a fraction of TDP-43 is in complex with FUS10,11 and both proteins are part of the large Drosha complex that is involved in miRNA biogenesis12. However, the two proteins do not co-localize within the pathologic inclusions4 and studies in yeast show that TDP-43 and FUS do not influence the aggregation of each other13. It is therefore unclear if the two proteins cooperate in recognizing their RNA targets and employ related mechanisms to regulate gene expression in the brain.

Here we performed individual-nucleotide resolution crosslinking and immunoprecipitation (iCLIP) with FUS, TDP-43 and U2 small nuclear RNA auxiliary factor 2 (U2AF65). We found that in contrast to the highly clustered binding of TDP-43 and U2AF65, FUS binding is distributed across the whole length of pre-mRNAs. All three proteins had increased RNA binding towards the 5′ end of introns, indicating that they are recruited to the nascent RNA soon after its transcription. Whereas binding of TDP-43 and U2AF65 is strongly determined by the RNA sequence, FUS has a very limited sequence preference for G-rich sequences. In agreement with the different sequence specificity of FUS and TDP-43, we did not find a significant overlap between their binding sites. Nevertheless, we found that both proteins regulate genes involved in neuronal development.

Results

FUS binds along the whole length of nascent RNA soon after its transcription

To compare the RNA binding of FUS and TDP-43, we determined their transcriptome-wide binding maps in E18 mouse brain using iCLIP14 (Supplementary Fig. S1a). As a control, we performed iCLIP with the general splicing factor U2 small nuclear RNA auxiliary factor 2 (U2AF65), as well as without any antibody to test for non-specific binding. We identified 3.5, 4.4 and 4.0 million unique cDNAs with FUS, TDP-43 and U2AF65 iCLIP, respectively, but only 6,000 with the no-antibody control (Tables S1, S2). Since we obtained a similar number of iCLIP cDNAs with the three proteins, which all primarily bind to introns (Fig. 1a), we comparatively evaluated the binding of the three proteins in this study. A past study found that FUS binds to non-coding RNAs produced from the region 5′ to the CCND1 gene to repress transcription of this gene15. We detected binding of FUS to antisense RNAs in this region, but similar binding was also seen for TDP-43 and U2AF65 (data not shown). Interestingly, all three proteins had increased binding to antisense RNAs upstream of transcription start sites of protein-coding genes and the enrichment of FUS was no greater than that of TDP-43 or U2AF65 (Fig. 1b).

Figure 1

FUS binds along the whole length of nascent RNAs.

(a) The proportion of cDNAs (out of all cDNAs that mapped to the mouse genome) from the FUS, TDP-43 and U2AF65 iCLIP experiments that mapped to different RNA regions. (UTR: untranslated region, ORF: open reading frame, ncRNA: non-coding RNA). (b) The map of FUS, TDP-43 and U2AF65 iCLIP crosslink sites at the 5′ regions of all protein-coding genes annotated by ENSEMBL 59, from 1 kb upstream of transcription start to 200 nts downstream. The data of each protein was first normalised to the average crosslinking on the sense strand within this region and then to the crosslinking of U2AF65 within the interval 150-200 nts downstream of the transcription start site. Crosslinking distribution was then evaluated in 50 nt intervals. The solid lines show binding to antisense and the dashed lines of the same colour show binding to the sense RNAs, relative to the orientation of the downstream gene. (c) The map of clustered crosslink sites (flank = 15) of FUS, TDP-43 and U2AF65 iCLIP identified at intron-exon junctions of all exons annotated by ENSEMBL 59. Crosslinking enrichment was determined by comparing to the average crosslinking of the same protein in the intronic region of 300–100 nts upstream of exons. (d) The average gene-normalized iCLIP cDNA density in 5 kb intervals of introns longer than 100 kb, at different distances relative to 3′ splice sites. (e) Gene-normalized cDNA density in 0.5 kb intervals within the Gabrb3 gene. The cDNA density in each interval was normalized to the average cDNA density within the whole gene. The maximum normalized cDNA density is shown at the right of each track.

We applied a peak-finding algorithm to identify clusters of crosslink nucleotides with significant enrichment of crosslink events relative to the local environment14. With a flank size of 15 nucleotides on either side of crosslink sites, only 1.7% of intronic FUS crosslink events fell within clusters, in agreement with the previous finding that FUS does not bind narrowly defined RNA sites16. Moreover, we saw no enrichment of FUS or TDP-43 crosslink clusters at 3′ splice sites, in contrast to the 150-fold enrichment of intronic U2AF65 crosslink clusters (Fig. 1c). We also evaluated binding further from 3′ splice sites, which showed that the average crosslinking density of all three proteins was highest at the 5′ end of long introns and decreased towards the 3′ end of introns, in a pattern similar to the results obtained by total RNA sequencing from the human brain17 (Fig. 1d). Because FUS binds along the whole length of pre-mRNAs, this pattern is also visible when analysing FUS crosslinking in individual genes (Fig. 1e). Taken together, iCLIP data indicate that the intronic binding of all three proteins corresponds to the abundance of the nascent RNA, which is higher at the 5′ end of long introns17. Thus, binding to promoter-associated antisense RNAs and the increased binding at the 5′ end of long introns demonstrate that FUS, TDP-43 and U2AF65 interact with the nascent RNA soon after its transcription.

FUS binds RNA with limited sequence specificity

A recent study used PAR-CLIP to find that FUS preferentially crosslinks to uridines in the loop region of stem-loop RNA structures16. However, crosslinking methods can introduce RNA sequence and structure biases, so it is important to compare data from multiple proteins18. We compared the probability of single-strandedness around the crosslink sites of FUS, TDP-43 and U2AF65, which showed that all three proteins had an increased probability of single-strandedness directly at the crosslink sites compared to the surrounding region (Fig. 2a). Since this increase is common to the three proteins, it indicates a general preference of protein crosslinking to single-stranded RNA. The probability of single-strandedness was greatest for U2AF65 (where it continued around the crosslink sites) and lowest for TDP-43, which most likely reflects the preference of U2AF65 for single-stranded pyrimidine tracts and the capacity of TDP-43 to bind double-stranded nucleic acids19. In contrast to the past study, we did not observe a significant decrease in single-strandedness around the FUS crosslink sites. Taken together, since the preferential crosslinking of FUS to single-stranded RNA could be of technical nature and we find little evidence for decreased single-strandedness in the surrounding RNA, iCLIP does not conclusively support the preferential binding of FUS to stem-loop RNA structures.

Figure 2

Analysis of RNA sequence and structure specificity of FUS.

(a) Analysis of the probability of single-strandedness at positions −100 to +100 around the FUS, TDP-43 and U2AF65 crosslink sites. We calculated the probability of tetramers being located in single-stranded regions in 70-nucleotide sliding windows using RNAplfold, as described previously50,51. (b–e) Comparison of pentamer enrichment at crosslink sites between replicate FUS experiments, or between FUS, TDP-43 and U2AF65 iCLIP. Pearson correlation coefficient (r) is shown. Pentamers centered at positions −4 to 0 relative to the crosslink site were evaluated and enrichment is determined by comparison to randomised positions in the same genomic regions. The sequences of the most enriched pentamers at TDP-43 (d) and U2AF65 (e) crosslink sites are shown and two groups of pentamers circled in (e) are evaluated further in panels 2f and 2g. (f) Position of the average enrichment of the four listed pentamers (circled in red in Fig. 2e). The middle position of the pentamers is evaluated relative to the position of cross-link sites and enrichment is determined by comparison to randomised positions in the same genomic regions. (g) Position of the average enrichment of the two listed pentamers (circled in purple in Fig. 2e). The middle position of the pentamers is evaluated relative to the position of cross-link sites and enrichment is determined by comparison to randomised positions in the same genomic regions.

To study the sequence specificity of FUS, we evaluated the occurrence of pentamers overlapping with crosslink sites. We observed a high correlation in pentamer occurrence between the replicate FUS experiments (r > 0.91, Pearson correlation coefficient, Fig. 2b, c), a low correlation with TDP-43 (r = 0.48, Fig. 2d) and a surprisingly high correlation between FUS and U2AF65 (r = 0.85, Fig. 2e). Uridine-rich pentamers had the highest occurrence at FUS and U2AF65 crosslink sites (Fig. 2e). Whereas the pentamers consisting purely of pyrimidines were more enriched at U2AF65 crosslink sites, the pentamers containing two (circled in red in Fig. 2e) or three (circled in purple in Fig. 2e) purines were more enriched at FUS crosslink sites. Enrichment of the first pentamer group was seen for all three proteins (Fig. 2f), therefore we cannot exclude the possibility that it represents a sequence bias of crosslinking. However, pentamers containing GG or GGG upstream of the uridines were enriched at FUS, but not TDP-43 and U2AF65 crosslink sites (Fig. 2g). Most of these pentamers contained a GGU motif, indicating that this motif increases the affinity of FUS for RNA. Nevertheless, the 2-fold enrichment of these pentamers in the area overlapping the FUS crosslink sites (Fig. 2e) contrasts with the 15-fold enrichment of GU-rich pentamers at the TDP-43 crosslink sites (Fig. 2d), indicating that FUS binds RNA with limited sequence specificity.

Since we did not find any overlap between sequence specificity of FUS and TDP-43, we instead evaluated if the two proteins overlapped in binding to longer RNA regions. For this purpose, we identified 0.5 kb regions within genes longer than 50 kb that had at least 3-fold enriched normalized crosslink density compared to the rest of the gene. We then compared the overlap between these regions of the three proteins. This showed that regions with enriched FUS binding overlapped with a higher proportion of U2AF65 than TDP-43 binding regions (Supplementary Fig. S1b). Taken together, our analyses of iCLIP data indicate that FUS and TDP-43 generally bind to distinct sites on pre-mRNAs.

FUS binds to long RNA regions around the repressed exons

FUS is essential for the viability of inbred mice, as FUS null mice (FUS−/−) show marked chromosomal instability and immune defects and die within 16 hours of birth20. Outbred FUS−/− mice show male sterility and enhanced sensitivity to radiation21. We isolated RNA from the brain tissue of day 18 embryo FUS−/− (KO, n = 3) and wildtype (WT, n = 3) littermates from an inbred strain20. We analyzed alternative splicing using Affymetrix high-resolution splice-junction microarrays. Since past studies have suggested that FUS plays a role in transcriptional regulation22, we first analysed transcript-level changes using ASPIRE 3 software14. Microarray probes detected signal of 25,539 genes, but only ten of these (including the FUS gene) showed at least a two-fold change in transcript levels in the FUS−/− brains that was significant (p-value<0.05, Student's t test, one-tailed, unequal variance, Table S3). Due to the limited extent of transcript-level changes, we proceeded to evaluate changes in alternative splicing.

We identified splicing changes in 68 alternative cassette exons and 48 other types of splicing changes when using ASPIRE software with a threshold of |ΔIrank|≥1. Notably, 69% (47/68) of the changed cassette exons increased their inclusion in the FUS−/− brain. Moreover, when using a higher threshold (|ΔIrank|≥1.5), 77% (20/26) of the cassette exons increased their inclusion. This indicates that FUS primarily represses exon inclusion in WT brain. To validate the splicing changes in FUS−/− brain, we assessed 21 of the cassette exons using RT-PCR and capillary electrophoresis. The expected alternative isoforms were detected for 17 of these exons and 14 of these showed a significant change that agreed with the change detected by microarray, resulting in an 82% (14/17) validation rate (Supplementary Fig. S2 and Table S4). In addition, we validated splicing changes in six variable-length exons, a retained intron and a terminal exon within the Ewing sarcoma breakpoint region 1 (Ewsr1) gene, which encodes a FET-family protein homologous to FUS (Supplementary Fig. S2 and Table S4).

To explore the possibility that some splicing changes were an indirect effect of proximity to post-natal death, we compared our data with changes in the brain of Nova1/Nova2−/− mice, which also die within 16 hours of birth23. Nova1 and Nova2 are neuron-specific proteins containing three KH-family RNA-binding domains, which regulate brain-specific splicing of genes with synapse-related functions24. RNA from E18 Nova1/Nova2−/− mouse brains was analysed using the same type of splice junction microarray as our study25 and we detected 304 splicing changes using ASPIRE 3 analysis (|ΔIrank|≥1). Of these, only five changes were also present in the FUS−/− mice and two of these changes were in the opposite direction (data not shown). This indicates that the splicing changes in these mouse models most likely reflect a specific role of these proteins in splicing regulation, rather than non-specific ante-mortem changes.

To study if the exons regulated by FUS have an increased incidence of FUS crosslinking, we first identified FUS crosslink clusters using a flank size of 200 nucleotides and FDR < 5% on the combined replicate experiments, which identified 49,879 FUS crosslink clusters. We also identified TDP-43 and U2AF65 crosslink clusters in the same way, to compare their distribution around the exons regulated by FUS. Five out of the 20 exons with lower (ΔIrank < −1.5) inclusion in WT than FUS−/− brain overlapped with a FUS crosslink cluster, which was a significant increase compared to control exons (p-value = 0.0003, Fisher's exact test, Table 1, Figs. 3, 4). In contrast, TDP-43 crosslink clusters were most often positioned far from the exons regulated by FUS and were not enriched at the repressed exons (Fig. 4, Table 1). Interestingly, the number of FUS and U2AF65 crosslink sites around the repressed exons was highly correlated (r = 0.94, Pearson correlation coefficient): all of the FUS-regulated cassette exons that contained a proximal FUS crosslink cluster also contained a proximal U2AF65 crosslink cluster (Table 2), whereas 67% (26/39) of the remaining repressed exons lacked a U2AF65 crosslink cluster. However, U2AF65 crosslink clusters were most often positioned at the 3′ splice site, whereas the FUS crosslink clusters covered regions several hundred nucleotides long (Figs. 3, 4). The correlation between the crosslink density of U2AF65 and FUS, in spite of their binding to different RNA regions, might result from the variable abundance of the intronic RNA that flanks the repressed exons: introns that are slowly spliced are available for protein binding for longer, resulting in increased crosslink density of proteins.
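The enrichment test reported above can be illustrated with a minimal sketch that uses only the counts given in Table 1 (5 of 20 repressed exons and 177 of 5779 control exons with a FUS crosslink cluster). Two points are assumptions made for illustration and are not stated in the text: the test is taken to be one-sided, and the two exon groups are treated as disjoint. The function name is hypothetical.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with all margins fixed,
    where X is the count in the top-left cell.
    """
    row1 = a + b              # e.g. repressed exons
    col1 = a + c              # e.g. exons overlapping a crosslink cluster
    total = a + b + c + d
    upper = min(row1, col1)   # largest possible value of the top-left cell
    tail = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(total, row1)


# Counts from Table 1; one-sidedness and disjoint groups are assumptions.
p_value = fisher_exact_greater(5, 20 - 5, 177, 5779 - 177)
print(f"one-sided p-value: {p_value:.1e}")
```

Exact integer arithmetic with `math.comb` avoids floating-point underflow in the tail sum, at the cost of being slower than a library routine for very large tables.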

Table 1 The number of FUS-regulated exons with proximal crosslink clusters of FUS, TDP-43 or U2AF65. The exons with lower (ΔIrank < −1.5) or higher (ΔIrank > 1.5) inclusion in WT than FUS−/− brain are evaluated and compared to control exons (all alternative exons detected by microarray). The last column shows the total number of these exons and the first three columns show the number of these exons that contain a FUS, TDP-43 or U2AF65 crosslink cluster within 250 nt from the middle of the exon. Crosslink clusters were determined using flank = 200 and FDR<0.05. The star indicates that the proportion of exons repressed by FUS with a FUS crosslink cluster (5/20) is significantly increased compared to control exons with a FUS crosslink cluster (177/5779) (p-value = 0.0003, Fisher's exact test).
Table 2 The list of FUS-regulated exons with proximal crosslink clusters of FUS. The genomic positions of the exons are shown, as well as the number of FUS, TDP-43 or U2AF65 crosslink sites in crosslink clusters located within 250 nt from the middle of the exon. Crosslink clusters were determined using flank = 200 and FDR<0.05.
Figure 3

FUS, TDP-43 and U2AF65 crosslinking around two exons regulated by FUS.

(a) Crosslinking density within the D4Wsu53e gene. The exon repressed by FUS is marked by a blue arrow. The bar graph shows the cDNA count at individual crosslink sites in replicate FUS iCLIP experiments and the grouped TDP-43 and U2AF65 iCLIP experiments. (b) Crosslinking density within the Ewsr1 gene. The exon enhanced by FUS is marked by a red arrow. The bar graph shows the cDNA count at individual crosslink sites in replicate FUS iCLIP experiments and the grouped TDP-43 and U2AF65 iCLIP experiments.

Figure 4

Analysis of iCLIP crosslink clusters around the exons regulated by FUS.

The map of crosslink clusters at positions within 500 nt of alternative exons and flanking exons. Crosslink clusters were determined using flank = 200 and FDR<0.05. The exons are grouped by sequential analysis of crosslink cluster positions in three regions: group 1 is identified by clusters within the exon; group 2 by clusters downstream of the exon; and group 3 by clusters upstream of the exon. The cassette exons with ΔIrank > 1 (enhanced exons, red clusters) or ΔIrank < −1 (repressed exons, blue clusters) in analysis of FUS−/− brain RNA that contain at least one crosslink cluster in these regions are shown. The positions of FUS (a), TDP-43 (b) and U2AF65 (c) crosslink clusters are shown and the gene symbol and exon number (according to genomic annotation) is shown on the left. Only the FUS-regulated exons that contain crosslink clusters for the corresponding RBP in the evaluated regions are shown.

FUS regulates genes involved in neuronal development

To understand the biological relevance of splicing regulation by FUS, we performed GO term analysis of genes containing FUS-regulated exons and compared these with all alternative exons that were detected by the microarray. This identified a significant enrichment of genes associated with positive regulation of cell adhesion, negative regulation of apoptosis, neuronal development and axonogenesis (Table S5). Since the GO annotation is incomplete, we referred to the relevant publications to confirm well-defined neuronal functions of 19 proteins encoded by the FUS-regulated transcripts (Table 3). Ten of these regulate axonogenesis, neurite outgrowth or axon guidance and ten are associated with neurologic disorders. We did not find a significant overlap between exons regulated by FUS identified in this study and those regulated by TDP-43 in human neuroblastoma cells26,27. Nevertheless, we find that both proteins primarily regulate genes involved in neuronal development.

Table 3 Neurologic disease and neuronal functions of proteins encoded by the alternative mRNA isoforms regulated by FUS

Discussion

In this study we aimed to understand how FUS interacts with nascent RNA to regulate gene expression in the brain. We studied the RNA binding of FUS using iCLIP and determined its role in regulating alternative splicing in the mouse brain, thereby providing insights both into mechanisms of FUS-dependent splicing regulation, as well as its function in neuronal biology. FUS crosslinking is enriched at GGU motifs, but this enrichment is only two-fold, which might explain why FUS does not bind clearly defined RNA sites; instead, its binding is enriched over long intronic regions. Analysis of FUS−/− brain identified a role of FUS in regulating splicing of specific alternative exons and showed that the repressed exons were often flanked by long FUS crosslink clusters. Interestingly, the intensity of FUS crosslinking around the regulated exons was closely correlated with the intensity of U2AF65 crosslinking at the 3′ splice site.

Our data do not indicate a strong preference of FUS for the stem-loop RNA structures that were observed in a past study16. We found that uridine-rich pentamers were enriched at crosslink sites of FUS, TDP-43 and U2AF65, but we recently demonstrated that UV-C crosslinking has a slight preference for uridines18, therefore the uridine enrichment at FUS crosslink sites could partly reflect a sequence bias of UV-C crosslinking. However, pentamers containing a GGU motif were specifically enriched at the FUS crosslink sites, indicating that the GGU motif confers specificity to RNA recognition by FUS. This agrees with in vitro selection and NMR studies, which showed that the zinc finger-like domain in FUS recognises the GGUG motif28,29. Moreover, the GGT motif is also the critical part of sequences that were recently demonstrated to be important for interaction between FUS and single-stranded DNA30, therefore it appears likely that FUS recognises similar sequence motifs in RNA and single-stranded DNA.

The sequence specificity of FUS is very limited compared to U2AF65 and TDP-43, therefore FUS has a more widespread crosslinking in pre-mRNAs. This results in a saw-tooth pattern of crosslinking density in long genes that reflects the increased abundance of nascent RNA towards the 5′ end of long introns17. Interestingly, when summarizing iCLIP data over multiple introns, the saw-tooth binding pattern in long genes becomes apparent also in U2AF65 and TDP-43 iCLIP. U2AF65 is a splicing factor that recruits U2 snRNP to the nascent transcripts, therefore its increased binding to long introns could lead to recognition of cryptic splicing elements. Active repression of cryptic splicing elements might therefore be particularly important for long genes. Therefore, long genes could be more sensitive to perturbations in RBPs that affect binding of U2AF65 to pre-mRNA; this could potentially explain the greater sensitivity of long genes to TDP-43 knockdown31.

We did not identify an increased association between FUS and promoter-associated RNAs compared to TDP-43 and U2AF65. However, this does not exclude the possibility that FUS plays a specific role by linking promoter-associated RNAs to chromatin-associated proteins, such as the CREB-binding protein (CBP)15. Recently, it was shown that incompletely spliced transcripts are retained on the chromatin until fully spliced32. Moreover, it was shown that knockdown of FUS leads to accumulation of incompletely spliced transcripts33. We find that introns flanking the exons repressed by FUS have an increased crosslink density of FUS (and partly also U2AF65), indicating that these introns have slow splicing kinetics and are therefore bound to FUS for a longer time than the rest of the pre-mRNA. Taken together, our data are consistent with a model where FUS regulates alternative splicing by binding introns that are slowly spliced, possibly by maintaining chromatin retention of these introns.

We find that FUS and TDP-43 generally bind distinct RNA sites and regulate distinct alternative exons. This agrees with their differential mRNA binding that was recently documented in motoneuron-like NSC-34 cells34. Nevertheless, both proteins regulate splicing of genes enriched in functions related to neuronal development and neurodegenerative diseases26. For instance, FUS regulates two exons in enabled homolog (ENAH), an actin regulatory protein involved in cell motility, axon guidance, neural tube closure and cell-cell adhesion35,36,37,38,39. Both exons are also associated with metastatic progression40,41. FUS also promotes inclusion of exon 3 of activity-dependent neuroprotective protein (ADNP), which contains the start codon. The protein-coding transcript is the main isoform in the wild-type brain, but represents only 23% of the transcripts made in the FUS−/− brain. ADNP−/− mice die at embryonic day 9 (ref. 42) and a neuroprotective peptide NAP (Davunetide), derived from ADNP, is being used in pilot and clinical trials for treatment of neurodegenerative disorders43. FUS promotes inclusion of exon 21 of Sortilin 1 (Sort1), which is, interestingly, silenced by TDP-4331,44. Sort1 is a type 1 receptor, which has been functionally associated with progranulin and FTLD and exon 21 was reported to regulate trafficking of Sort145. In agreement with the FUS knockdown study in hippocampal neurons46, we also identified a modest splicing change in exon 10 of microtubule associated protein Tau (Mapt), which codes for the second microtubule binding domain47. Finally, we found that FUS promotes the use of an alternative terminal exon in EWSR1, indicating cross-regulation among the FET protein family members. In conclusion, our study places FUS alongside TDP-43 in regulating splicing of transcripts associated with neuronal development and neurodegenerative diseases.

Methods

iCLIP

We used the iCLIP method as described previously48, with the following modifications. Following dissociation and UV-crosslinking of embryonic day 18 (E18) mouse brain, iCLIP immunoprecipitation was performed with protein A Dynabeads (Invitrogen) conjugated to rabbit anti-FUS (Novus Biologicals, NB100-565) or rabbit anti-TDP-43 (Proteintech, 10782-2-AP) or without conjugation of any antibody, or protein G Dynabeads (Invitrogen) conjugated to mouse anti-U2AF65 (U4758, Sigma). The size of the protein-RNA complexes present in the high RNase condition corresponded to a single protein molecule bound to the RNA and a shift upwards on the gel was seen under low RNase conditions, corresponding to the proteins that were crosslinked to longer RNA molecules (Supplementary Fig. S1a). The region corresponding to 80–130 kDa, 50–100 kDa and 70–120 kDa complexes was excised from the membrane to isolate the RNA bound to FUS, TDP-43 and U2AF65, respectively.

High-throughput sequencing was done either using 50 cycles on Illumina GAII or 70 cycles on Illumina MiSeq and the barcode sequences corresponding to the individual experiment were as described (Table S1). The randomers were registered and the barcodes were removed before mapping the sequences to the genome sequence (mm9/NCBI37) allowing two mismatches using Bowtie version 0.12.7 (command line: -v 2 -m 1 -a --best --strata). The nucleotide preceding the iCLIP cDNAs mapped by Bowtie was used to define the crosslink sites identified by truncated cDNAs. The method for the randomer evaluation, annotation of genomic segments and identification of significantly clustered crosslinking events was performed with FDR<0.05 and a maximum spacing of 15 or 200 nt (as indicated in the text), as described earlier14,49, except that the Ensembl 59 gene annotation was used and that the positions of crosslink sites were randomised within the whole gene, rather than within individual introns. Unless information on replicates is specifically shown, the replicate iCLIP experiments for the same protein were grouped before performing the analyses.
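The crosslink-site definition above (the nucleotide preceding each mapped cDNA) and the counting of unique cDNAs can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: read coordinates are assumed to be 0-based leftmost alignment positions, and duplicates are assumed to be collapsed when reads share the same crosslink site and the same randomer. The `MappedRead` record and both function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class MappedRead(NamedTuple):
    chrom: str
    start: int      # 0-based leftmost genomic position of the alignment (assumed)
    length: int     # aligned length in nucleotides
    strand: str     # "+" or "-"
    randomer: str   # random barcode registered before mapping


def crosslink_site(read: MappedRead) -> int:
    """Position of the nucleotide immediately upstream of the cDNA start."""
    if read.strand == "+":
        return read.start - 1
    if read.strand == "-":
        # On the minus strand the cDNA starts at the rightmost aligned base,
        # so the preceding nucleotide lies one position further right.
        return read.start + read.length
    raise ValueError(f"unexpected strand: {read.strand!r}")


def count_unique_cdnas(reads: Iterable[MappedRead]) -> Counter:
    """cDNA count per crosslink site after collapsing identical randomers."""
    unique = {(r.chrom, r.strand, crosslink_site(r), r.randomer) for r in reads}
    return Counter((chrom, strand, site) for chrom, strand, site, _ in unique)
```

Collapsing on the randomer means that two reads at the same site are counted separately only if they carry different random barcodes, which is what distinguishes independent cDNAs from PCR duplicates.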

Analysis, normalisation and identification of regions with enriched crosslinking density

To identify regions with increased crosslink density within pre-mRNAs, we studied genes longer than 50 kb that were highly transcribed in the brain. These genes were selected based on five-fold higher density of iCLIP cDNAs compared to the average iCLIP cDNA density in all ENSEMBL genes. The genes were fragmented into 0.5 kb regions. To normalise the saw-tooth effect of intronic read density (see Fig. 1e), we first analysed the read density in total RNAseq data from fetal human brain15 in all introns longer than 100 kb. The read density in these introns increased by a factor of 0.0077 per 1 kb distance from the 3′ splice site. Thus, the read density increases by a factor of (1 + 0.00385 * intron_1len,kb) as the distance from the 3′ end of intron increases. The read density in genes after normalising the saw-tooth effect is therefore:

geneden, norm: cDNA density in each gene, normalized for the relative increase in cDNA density within each intron, i.e. by a factor of (1 + 0.00385 * intron_len,kb) for each intron

genelen: gene length

intron_1: the first intron in the gene, including also the preceding exon

intron_n: the last intron in the gene, including also the preceding exon

intron_1len, kb: length of the intron_1 in kilobases

intron_1cDNA: sum of cDNA counts within the intron_1
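As an illustration of how the quantities defined above could fit together, the following sketch computes a saw-tooth-corrected gene density. The formula itself is an assumption inferred from the definitions, not a reproduction of the original equation: each intron's cDNA count is divided by (1 + 0.00385 × intron length in kb), the corrected counts are summed and the sum is divided by the gene length. The factor 0.00385 is half of 0.0077 because the mean of a linear ramp 1 + 0.0077·d over distances d from 0 to L is 1 + 0.00385·L. Function and variable names are hypothetical.

```python
from typing import Iterable, Tuple

SLOPE_PER_KB = 0.0077  # increase in read density per kb from the 3' splice site


def mean_saw_tooth_factor(intron_len_kb: float) -> float:
    """Average of the ramp 1 + SLOPE_PER_KB * d for d in [0, intron_len_kb]."""
    return 1.0 + (SLOPE_PER_KB / 2.0) * intron_len_kb


def normalised_gene_density(
    introns: Iterable[Tuple[float, float]], gene_len_kb: float
) -> float:
    """Assumed form of the saw-tooth-corrected gene density.

    `introns` holds (length in kb, cDNA count) pairs, where each intron
    includes its preceding exon as in the definitions above.
    """
    corrected = sum(
        cdna / mean_saw_tooth_factor(len_kb) for len_kb, cdna in introns
    )
    return corrected / gene_len_kb


# A 100 kb single-intron gene: the mean correction factor is 1.385.
print(normalised_gene_density([(100.0, 1385.0)], gene_len_kb=100.0))
```

Expressing lengths in kilobases throughout is a choice made here for consistency with the slope; the unit used for gene length in the original formula is not stated.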

For Fig. 1d and 1e, we used the gene-normalized read density to determine the relative read density at each intronic region:

regionden, gene norm: cDNA density in each region, normalized by the cDNA density of the whole gene

regioncDNA: sum of cDNA counts within a region

regionlen: region length

To study crosslinking density independently of the saw-tooth effect, we performed regional normalization of iCLIP cDNA density using the following formula:

regionden, region norm: cDNA density in each region, normalized by cDNA density of the whole gene, as well by the increase in intronic cDNA density at a distance from the 3′ splice site

regiondist, kb: distance of the 3′ end of region from the 3′ splice site in kilobases

We used the regionally normalized data in genes longer than 50 kb to identify 0.5 kb regions of enriched crosslink density. These were identified by finding the 0.5 kb intervals where the regionally normalized crosslinking density was higher by at least a factor of three compared to the average crosslinking density in the whole gene.
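The regional normalization and the three-fold threshold can be sketched in the same spirit. The exact formula is assumed from the definitions above rather than taken from the original equation: the raw density of a region is divided by the gene density and by (1 + 0.0077 × distance in kb from the 3′ splice site). All names are hypothetical, and the gene density is passed in as a precomputed value.

```python
from typing import List, Sequence, Tuple

SLOPE_PER_KB = 0.0077  # increase in read density per kb from the 3' splice site


def regional_density(
    region_cdna: float,
    region_len_kb: float,
    region_dist_kb: float,
    gene_density: float,
) -> float:
    """Assumed form of the regionally normalized cDNA density."""
    raw_density = region_cdna / region_len_kb
    saw_tooth = 1.0 + SLOPE_PER_KB * region_dist_kb
    return raw_density / (gene_density * saw_tooth)


def enriched_regions(
    regions: Sequence[Tuple[float, float]],
    gene_density: float,
    region_len_kb: float = 0.5,
    fold: float = 3.0,
) -> List[int]:
    """Indices of regions at least `fold` above the gene average.

    `regions` holds (cDNA count, distance from the 3' splice site in kb).
    """
    return [
        i
        for i, (cdna, dist_kb) in enumerate(regions)
        if regional_density(cdna, region_len_kb, dist_kb, gene_density) >= fold
    ]


# The same count is enriched near the 3' splice site but not 100 kb away.
print(enriched_regions([(20, 0.0), (20, 100.0), (5, 0.0)], gene_density=10.0))
```

The second region illustrates why the correction matters: without it, a region far from the 3′ splice site would appear enriched simply because nascent RNA is more abundant there.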

RNA sequence and structure analysis

We first identified a set of pentamers that were most enriched in the region of −30 to +30 nucleotides around the crosslink sites in the iCLIP data of a specific RBP compared to the other two RBPs. In this way we avoided identifying motifs that would result from sequence biases associated with UV-C crosslinking or other aspects of the iCLIP method. We then evaluated the positional enrichment of these sets of pentamers around the crosslink sites. This was determined by comparing the occurrence of each pentamer around the true crosslink sites with the occurrence at randomized crosslink positions. The positions of crosslink sites were randomised within the same regions of the gene (i.e., within the same intron, CDS or UTR). To determine the single-strandedness probability around crosslink sites, we calculated the unpaired probability of all tetramers in the regions using the RNAplfold program with default parameters, as described previously50,51.
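The positional enrichment described above can be sketched as a comparison between pentamer counts at true crosslink sites and at randomised positions. This is a simplified illustration with assumptions that go beyond the text: a single sequence string stands in for one gene region (intron, CDS or UTR), randomised positions are drawn uniformly within it, and the number of randomisations and the seed are arbitrary. Function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Sequence, Set


def pentamer_offsets(
    seq: str, sites: Sequence[int], pentamers: Set[str], window: int = 30
) -> Counter:
    """Count pentamers whose middle nucleotide lies at each offset from a site."""
    counts: Counter = Counter()
    for site in sites:
        for offset in range(-window, window + 1):
            start = site + offset - 2  # pentamer is centred on site + offset
            if start < 0 or start + 5 > len(seq):
                continue
            if seq[start:start + 5] in pentamers:
                counts[offset] += 1
    return counts


def positional_enrichment(
    seq: str,
    sites: Sequence[int],
    pentamers: Set[str],
    window: int = 30,
    n_random: int = 100,
    seed: int = 0,
) -> Dict[int, float]:
    """Observed/expected pentamer occurrence per offset (assumed scheme)."""
    rng = random.Random(seed)
    observed = pentamer_offsets(seq, sites, pentamers, window)
    background: Counter = Counter()
    for _ in range(n_random):
        random_sites = [rng.randrange(len(seq)) for _ in sites]
        background.update(pentamer_offsets(seq, random_sites, pentamers, window))
    enrichment = {}
    for offset in range(-window, window + 1):
        expected = background[offset] / n_random
        if expected > 0:  # offsets never seen in the background are skipped
            enrichment[offset] = observed[offset] / expected
    return enrichment
```

Offsets with no background occurrences are left out rather than reported as infinite, which keeps the output interpretable for short regions.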

Splice-junction microarray

RNA was isolated from E18 whole brain of three FUS−/− and three wild-type littermates and the cDNA samples were prepared using the GeneChip WT cDNA Synthesis and Amplification Kit (Affymetrix 900673), followed by GeneChip Hybridization, Wash and Stain Kit (Affymetrix 900720) using Affymetrix GeneChip Fluidics Station 450. Mouse high-resolution AltSplice splice-junction microarrays (Affymetrix) were then scanned on Affymetrix GeneChip Scanner 3000 7G. Data was analyzed with version 3 of ASPIRE (Analysis of SPlicing Isoform Reciprocity), which determined ΔIrank value, which was used to rank the splicing changes based on their significance14. Changes with |ΔIrank|≥1 were further validated and were used for iCLIP binding and GO term analysis. GO term analysis was performed as described previously26.

RT-PCR

Total RNA was extracted using the miRNeasy Kit (Qiagen) and 200 ng of total RNA was used for reverse transcription using Superscript III (Invitrogen) according to the manufacturer's instructions. For analysis of splicing, PCR was performed using Immomix (Bioline) using primers listed in Table S3. The PCR products were visualized using QIAxcel capillary electrophoresis system (Qiagen). To calculate exon inclusion/exclusion, the value of each peak was first divided by the size of the amplicon (nucleotide number) it represented to even out the changes brought about by stronger staining of longer amplicons. The percentage of the peak representing exon inclusion was obtained from the ratio of size-modified values for the exon inclusion peak divided by the sum of size-modified values for peaks representing exon inclusion and skipping. Splicing change was calculated by subtracting the exon inclusion in the FUS−/− brain from the inclusion in wild-type brain (thus, a positive ΔI represents exons enhanced by FUS and negative those silenced by FUS).
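The inclusion calculation above amounts to a short piece of arithmetic, sketched here with hypothetical function names and made-up peak values. Each peak value is divided by its amplicon size, the percentage of inclusion is taken from the size-corrected values, and ΔI is the wild-type inclusion minus the inclusion in the sample lacking the protein.

```python
def percent_inclusion(
    inclusion_peak: float,
    inclusion_size_nt: int,
    skipping_peak: float,
    skipping_size_nt: int,
) -> float:
    """Percentage of exon inclusion from size-corrected peak values."""
    inclusion = inclusion_peak / inclusion_size_nt   # corrects for stronger
    skipping = skipping_peak / skipping_size_nt      # staining of long amplicons
    return 100.0 * inclusion / (inclusion + skipping)


def delta_inclusion(wild_type_percent: float, depleted_percent: float) -> float:
    """Wild-type inclusion minus inclusion in the sample lacking the protein."""
    return wild_type_percent - depleted_percent


# Illustrative values only: a 300 nt inclusion amplicon and a 200 nt skipping amplicon.
wt = percent_inclusion(300.0, 300, 200.0, 200)
ko = percent_inclusion(600.0, 300, 200.0, 200)
print(wt, ko, delta_inclusion(wt, ko))
```

In this made-up example the inclusion is higher in the depleted sample, so ΔI is negative, which corresponds to an exon that is silenced by the protein.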

Gene ontology (GO) analysis

GO term enrichment analysis was performed on exons with |ΔIrank|≥1 in annotated genes (108 exons) and all exons in annotated genes that had sufficient signal on the microarray were used as controls (15975). To make GO annotation exon-centric, we multiplied each gene annotation record to include all exons on the splicing microarray for the gene and the p-value was calculated as described previously52.