Characterization of alternative mRNA splicing in cultured cell populations representing progressive stages of human fetal kidney development

Nephrons are the functional units of the kidney. During kidney development, cells from the cap mesenchyme—a transient kidney-specific progenitor state—undergo a mesenchymal to epithelial transition (MET) and subsequently differentiate into the various epithelial cell types that create the tubular structures of the nephron. Faults in this transition can lead to a pediatric malignancy of the kidney called Wilms’ tumor that mimics normal kidney development. While human kidney development has been characterized at the gene expression level, a comprehensive characterization of alternative splicing is lacking. Therefore, in this study, we performed RNA sequencing on cell populations representing early, intermediate, and late developmental stages of the human fetal kidney, as well as three blastemal-predominant Wilms’ tumor patient-derived xenografts. Using this newly generated RNAseq data, we identified a set of transcripts that are alternatively spliced between the different developmental stages. Moreover, we found that cells from the earliest developmental stage have a mesenchymal splice-isoform profile that is similar to that of blastemal-predominant Wilms’ tumor xenografts. RNA binding motif enrichment analysis suggests that the mRNA binding proteins ESRP1, ESRP2, RBFOX2, and QKI regulate alternative mRNA splicing during human kidney development. These findings illuminate new molecular mechanisms involved in human kidney development and pediatric kidney cancer.


Manual calculation inclusion levels:
The gene FAT1 (Fig. S18) contains an alternatively skipped exon (SE) that we found in the literature but was not included in our gene model. For this exon we calculated the inclusion levels manually as follows. The inclusion level of an alternatively spliced exon is defined as the fraction of transcripts that include the exon out of the total number of transcripts that either include the exon or skip over it [1,2]. This can be estimated from the density of reads that align to the upstream splice junction, the alternative exon itself, and the downstream splice junction, vs. the density of reads that align to the skipping splice junction that directly connects the upstream exon to the downstream exon. It is also possible to use only reads that span the splice junctions.
We define to be the number of reads that map to the exon inclusion isoform and to be the number of reads that map to the exon skipping isoform. Since we are interested in the density of reads, we need to normalize the read counts and by the "effective lengths" -the lengths of the isoform-specific segments that they align to. We therefore define as the effective length of the exon-inclusion isoform, that is, the number of the unique positions to which reads can be aligned in the upstream splice junction, the alternative exon itself, and the downstream splice junction. Likewise we define as the effective length of the exon-skipping isoform, that is, the number of unique positions to which reads can be aligned in the skipping splice junction that directly connects the upstream exon to the downstream exon (see Fig. S1 of [1] and supplementary note of [2]).
Given the effective lengths and for each isoform, the inclusion level can now be estimated from read counts and as:

̂= ( / ) ( / ) + ( / )
For our manual calculation we counted only reads that span the splice junctions, and approximated the effective lengths as the number of junctions from which each isoform is composed (see Fig. S1 of [1]), that is, for a skipped exon: = 2 and = 1 -two splice junctions for the inclusion isoform and one splice junction for the skipping isoform.

Quantifying alternative splicing using DEXSeq:
For some of the genes, we also used DEXSeq [3] to validate differential expression in specific exons by counting the number of reads that align to each exon (or segment of an exon) within each cell population. In order to take into account the different depths of each transcriptome due to the varying number of cells in each population, we normalized the exon-specific counts by dividing by the total number of reads that align to the specific gene.
DEXSeq was used either to complement rMATS (e.g. for the mutually exclusive exons in FGFR2, Fig. S36) or to quantify alternative splicing of types that are not covered by rMATS (e.g. for the alternative 3' ends in the gene EPB41L5, Fig. S42). Figure S1: We selected a set of 395 genes that were found by intersecting all the gene sets that were upregulated at least 2-fold (log2foldChange > 1) in hFK3 (the mature fetal developmental fraction) with respect to hFK1, WT11, WT14, and WT37, and found that they are related to epithelial differentiation ( Fig. 2D and Table S2). Shown is a Venn diagram of differentially expressed genes with log2foldChange > 1. Each oval represents a set of genes that were found to be upregulated in hFK3 with respect to the labeled cell fraction (hFK1, WT11, WT14, or WT37). We note that the enrichment that we observed was quite robust and did not depend much on the exact parameters for choosing these genes (e.g. the exact threshold for log2foldchange etc.) Figure S2: Cells at the early stage of human kidney development (hFK1) have a mesenchymal splice-isoform profile that is similar to that observed in blastemalpredominant Wilms' tumor patient-derived xenografts (WT37, WT14, and WT11). Shown are Sashimi plots for the genes ENAH [4][5][6], CD44 [7,8], CTNND1 [6,9], FGFR2 [9,10], EPB41L5 [6,11], FAT1 [4,12], PLOD2 [4,13,14], and ARHGEF10L [15][16][17]. These genes were previously shown to be alternatively spliced in mesenchymal and epithelial tissues. The genes ENAH and CD44 contain cassette exons that are low in the Wilms' tumor xenografts and increase during kidney development, whereas CTNND1 contains a cassette exon that is high in Wilms' tumors and decreases during kidney development. FGFR2 contains two mutually exclusive exons, one of which is high in Wilms' tumors and decreases during kidney development, and the other which is low in Wilms' tumors and increases during kidney development. Likewise, EPB41L5 has two isoforms with alternative 3' ends, one of which is predominant in the Wilms' tumor xenografts and decreases during kidney development, and the other which is low in Wilms' tumors and increases during kidney development. FAT1 and PLOD2, two genes for which alternative splicing was previously found to be regulated by the RNA binding protein RBFOX2 [4], contain cassette exons that are highly expressed in Wilms' tumors and decrease during kidney development, while ARHGEF10L, a gene for which alternative splicing was previously found to be regulated by the RNA binding proteins ESRP1 and ESRP2 [15], contains a cassette exon that is low in Wilms' tumors and increases during kidney development. Figure S3: No consistent relationship between gene expression levels and exon inclusion levels. Shown is a histogram of correlation values between expression levels and cassette exon inclusion levels within the same gene, as well as specific examples. It can be seen that for some genes expression and splicing are strongly correlated (USO1) or anti-correlated (ACOT9) while for others they are not (SLK). Figure S4: A comparison of cassette exons that were found in the present study to be alternatively spliced between cell fractions representing early (hFK1) and late (hFK3) stages of human kidney development vs. alternatively spliced cassette exons that were previously observed by Yang et al. [14] in cells from a human H358 epithelial non-small cell lung cancer (NSCLC) cell line undergoing Epithelial to Mesenchymal Transition (EMT). Shown are Venn diagrams comparing the findings of our study ("hFK") to four different experiments performed in Yang et al. [14]: (1) Zeb1-induced EMT, (2) ESRP1/2 depletion, (3) RBM47 depletion, and (4) QKI depletion. In order to perform a consistent comparison, we followed the selection criteria of Yang et al. [14] and selected cassette exons for which the FDR < 0.05 and the absolute difference in inclusion levels > 0.05. For example, we found that the cassette exons located within the genes CD44 and ENAH are common to both our experiments (hFK) and also to the Zeb1-induced EMT, ESRP1/2 depletion, and RBM47 depletion experiments.  [14] is also differentially over-expressed in hFK3 vs. hFK1, we did not find significant motif enrichment for this gene. Figure S6: Additional RNA binding proteins that are differentially expressed between the cell fractions representing the early (hFK1) and late (hFK3) stages of human kidney development or between the blastemal-predominant Wilms' tumor patient-derived xenografts (WT37, WT14, and WT11) and hFK3. These RBP's might also be involved in splicing regulation in the developing fetal kidney or in the development of Wilms' tumors, but we did not find any significant motif enrichment for these genes. Figure S7: The gene ACOT9 contains a cassette exon (skipped exon, SE) that is highly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually decreases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S8: The gene ADD3 contains a cassette exon (skipped exon, SE) that is lowly expressed in two out of three of the Wilms' tumor xenografts (WT37 and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually increases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S9: The gene AP1B1 contains a cassette exon (skipped exon, SE) that is highly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually decreases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S10: The gene ARHGEF10L [15][16][17], a gene for which alternative splicing was previously found to be regulated by the RNA binding proteins ESRP1 and ESRP2 [15], contains a cassette exon (skipped exon, SE) that is lowly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually increases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S11: The gene ARHGEF11 contains a cassette exon (skipped exon, SE) that is highly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually decreases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S12: The gene CCDC50 contains a cassette exon (skipped exon, SE) that is lowly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually increases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S13: The gene CD44 [7,8], that was previously shown to be alternatively spliced between mesenchymal and epithelial tissues, contains cassette exons (skipped exons, SE) that are lowly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually increases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and barplots of inclusion levels obtained from rMATS. Figure S14: The gene CLSTN1 contains cassette exons (skipped exons, SE) that are highly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually decreases during later stages of kidney development (hFK2 and hFK3). Shown are Sashimi plots and barplots of inclusion levels obtained from rMATS. Figure S15: the gene CTNND1 [6,9], that was previously shown to be alternatively spliced between mesenchymal and epithelial tissues, contains a cassette exon (skipped exon, SE) that is highly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually decreases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and barplots of inclusion levels obtained from rMATS. Note that this exon is composed of two parts. Figure S16: The gene DNM1 contains a cassette exon (skipped exon, SE) that is highly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early and intermediate stages of human kidney development (hFK1 and hFK2), and whose expression decreases during the late stage of kidney development (hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S17: The gene ENAH [4][5][6], that was previously shown to be alternatively spliced between mesenchymal and epithelial tissues, contains a cassette exon (skipped exon, SE) that is lowly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually increases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S18: The gene FAT1 [4,12], a gene for which alternative splicing was previously found to be regulated by the RNA binding protein RBFOX2 [4], contains a cassette exon (skipped exon, SE) that is highly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually decreases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and a barplot of inclusion levels that were manually calculated. Figure S19: The gene INF2 contains a cassette exon (skipped exon, SE) that is lowly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually increases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S20: The gene MYL6 contains a cassette exon (skipped exon, SE) that is highly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression decreases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S21: The gene NFYA contains a cassette exon (skipped exon, SE) that is highly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually decreases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S22: The gene NUMB contains a cassette exon (skipped exon, SE) that is lowly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually increases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S23: The gene PLEKHA1 contains a cassette exon (skipped exon, SE) that is highly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually decreases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S24: The gene PLOD2 [4,13,14], a gene for which alternative splicing was previously found to be regulated by the RNA binding protein RBFOX2 [4], contains a cassette exon (skipped exon, SE) that is highly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually decreases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S25: The gene RAC1 contains a cassette exon (skipped exon, SE) that is lowly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually increases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S26: The gene RAI14 contains a cassette exon (skipped exon, SE) that is lowly or moderately expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early and intermediate stages of human kidney development (hFK1 and hFK2), and whose expression increases during the later stage of kidney development (hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S27: The gene RPS24 contains a cassette exon (skipped exon, SE) that is highly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually decreases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S28: The gene SCRIB contains a cassette exon (skipped exon, SE) that is highly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually decreases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S29: The gene SEC31A [18], that was previously shown to be alternatively spliced between mesenchymal and epithelial tissues, contains a cassette exon (skipped exon, SE) that is highly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually decreases during later stages of kidney development (hFK2 Figure S30: The gene SLK contains a cassette exon (skipped exon, SE) that is lowly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually increases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S31: The gene SPAG9 contains a cassette exon (skipped exon, SE) that is highly expressed in two of the Wilms' tumor xenografts (WT37 and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually decreases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S32: The gene SPTAN1 contains a cassette exon (skipped exon, SE) that is lowly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually increases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S33: The gene TBC1D23 contains a cassette exon (skipped exon, SE) that is highly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually decreases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S34: The gene USO1 contains a cassette exon (skipped exon, SE) that is lowly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually increases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S35: The gene VPS29 contains a cassette exon (skipped exon, SE) that is highly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually decreases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S36: The gene FGFR2 [9,10], that was previously shown to be alternatively spliced between mesenchymal and epithelial tissues, contains two mutually exclusive exons (MXE), one of which is lowly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually increases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot, a barplot of inclusion levels obtained from rMATS, and barplots of normalized DEXseq counts. Figure S37: The gene FYN contains two mutually exclusive exons (MXE), one of which is lowly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually increases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot, a barplot of inclusion levels obtained from rMATS, and barplots of normalized DEXseq counts. Figure S38: The gene CYB561A3 contains an alternative 3' splice-site (A3SS), with an exon segment that is highly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually decreases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S39: The gene RPS24 contains an alternative 3' splice-site (A3SS), with an exon segment that is lowly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually increases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S40: The gene CTNND1 contains an alternative 5' splice-site (A5SS), with an exon segment that is lowly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually increases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot and a barplot of inclusion levels obtained from rMATS. Figure S41: The gene LUC7L3 contains a retained intron (RI), with an exon segment that is highly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually decreases during later stages of kidney development (hFK2 and hFK3). Shown are a Sashimi plot, a barplot of inclusion levels obtained from rMATS, and a barplot of normalized DEXseq counts. Figure S42: The gene EPB41L5 [6,11], that was previously shown to be alternatively spliced between mesenchymal and epithelial tissues, has two alternative 3' ends, with a long isoform that is highly expressed in the Wilms' tumor xenografts (WT37, WT14, and WT11) and the cells representing the early stage of human kidney development (hFK1), and whose expression gradually decreases during later stages of kidney development (hFK2 and hFK3). Shown are barplots of normalized DEXseq counts for selected exons from the long and short isoforms.