Recent genome-wide studies have demonstrated that fungi possess the machinery to alternatively splice pre-mRNA. However, there has not been a systematic categorization of the functional impact of alternative splicing in a fungus. We investigate alternative splicing and its functional consequences in the model mushroom forming fungus Schizophyllum commune. Alternative splicing was demonstrated for 2,285 out of 12,988 expressed genes, resulting in 20% additional transcripts. Intron retentions were the most common alternative splicing events, accounting for 33% of all splicing events, and 43% of the events in coding regions. On the other hand, exon skipping events were rare in coding regions (1%) but enriched in UTRs where they accounted for 57% of the events. Specific functional groups, including transcription factors, contained alternatively spliced genes. Alternatively spliced transcripts were regulated differently throughout development in 19% of the 2,285 alternatively spliced genes. Notably, 69% of alternatively spliced genes have predicted alternative functionality by loss or gain of functional domains, or by acquiring alternative subcellular locations. S. commune exhibits more alternative splicing than any other studied fungus. Taken together, alternative splicing increases the complexity of the S. commune proteome considerably and provides it with a rich repertoire of alternative functionality that is exploited dynamically.
Alternative splicing is an important regulatory mechanism, which in many eukaryotes provides additional complexity to the proteome required to sustain cell functions1. In humans, the overwhelming majority of genes are alternatively spliced, and mutations in alternative transcripts can give rise to diseases such as cancer2. Alternative splicing events have also been shown to contribute to the evolution of new species3.
Splicing results in removal of introns in pre-mRNAs, and alternative splicing gives rise to different mRNA variants of a gene. All resulting mRNAs consist of a coding region flanked by UnTranslated Regions (UTRs). We distinguish primitive events, i.e. basic splice operations which only affect a single transcript, from composite events that relate to observed primitive events over multiple transcripts from the same gene (Fig. 1a,b). These events represent regions of especially high sequence heterogeneity between transcripts. Alternative splicing events occur not only in coding regions but also in UTRs, resulting in transcripts with alternative coding regions. Consequently, alternative splicing increases the complexity of the proteome which may also result in functional changes of the proteome when, for example, alternative splicing results in gain or loss of protein domains and functional elements. Alternative splicing events may be triggered by changes in environmental conditions4 or are part of developmental pathways5. On the other hand, alternative splicing may also be an error of the splicing machinery and not necessarily lead to alternative functionality. For instance, RPL30 of Saccharomyces cerevisiae produces a transcript that fails to undergo splicing and later decays6.
Alternative splicing has been shown to occur in fungi. The first reported example of alternative splicing was in the glaA gene of Aspergillus niger, resulting in two isoforms of the glucoamylase enzyme7. Since then, other alternative splicing events have been described. For instance, the primary transcript of ZAS1+ of Schizosaccharomyces pombe contains two zinc finger domains, while its secondary transcript contains three of these domains8. Genome-wide comparative studies have shown that alternative splicing is common in most fungi, but that many yeasts exhibit few alternative splicing events9,10. Developmentally more complex fungi appear to have more alternative splicing (see Fig. 1e, where we compared the phylogeny of fungal species with reported richness of their alternative splicing). Genome-wide studies also showed that development impacts alternative splicing in Fusarium graminearum11, while growth medium composition impacts splicing in Trichoderma longibrachiatum12. In all studies, intron retentions were the most common primitive splicing events.
Alternative splicing at a genome-wide scale can be detected from expressed sequence tags9,10. With the advent of Next Generation Sequencing, detecting alternative splicing in fungi is mostly done by splice junction analysis. By comparing splice junctions that appear as gaps in the alignment of short reads to a reference genome, primary splicing events can be detected12. To reconstruct the structure of alternative transcripts, probabilistic modeling methods are used, which are based on primary event co-occurrence in single reads13,14. Along with splice junction analysis, this approach has also been separately used to study alternative splicing in fungi11. However, the high gene density and polyscystronic transcripts in many fungal genomes15, make this approach inapplicable. Gene UTRs overlapping with transcripts of neighboring genes cause the transcript reconstruction methods to erroneously fuse transcripts from different genes together (for more details see Supplementary Note S2). This shortcoming has prevented a systematic characterization of the alternative splicing repertoire for fungi with gene-dense genomes.
Despite the existing work to identify alternative splicing in fungi, a systematic analysis of the functional impact of alternative splicing is missing. In this work, we show the potential functional impact of alternative splicing in the mushroom forming fungus Schizophyllum commune. We designed a region-restricted probabilistic modeling method to detect alternative splicing for the gene-dense genome of S. commune. We detected alternative splice variants from RNA-seq data at different developmental stages of S. commune. We further analyzed the functional impact of alternative splicing and show its relationships with transcriptional regulation, subcellular localization, and alternative use of functional domains.
Alternative splicing is a common mechanism in S. commune
Sexual development of S. commune proceeds via different stages. Dikaryotic colonies resulting from mating of two compatible partners grow symmetrical in the dark and at high CO2 conditions. Asymmetrical growth and the formation of aggregates is induced when the colony is exposed to light and ambient CO2. The aggregates further develop into primordia and mature mushrooms. mRNA of five different stages of development of S. commune (Fig. 1c) was sequenced to study the incidence and functional impact of alternative splicing. In addition, RNA was used from deletion strains that are affected at different stages of mushroom formation16,17,18 (see Materials and Methods).
Based on RNA sequencing of all samples, transcripts were predicted using our region-restricted probabilistic modeling method (RRPM, see Materials and Methods). Figure 2 gives an overview of features of the found transcripts. 3,331 genes that were not expressed or whose exon boundaries were not supported by the RNA-seq data were excluded from the 16,319 genes in the S. commune v3.0 reference genome. 15,522 transcripts were predicted for the remaining 12,988 genes. 10,703 genes (82%) had only one transcript, 85% of them being identical to the original annotation (Fig. 2a). A total number of 4,819 transcripts were found for the other 2,285 genes (see Fig. 2b). Most of these genes (90%) had two splice variants (Fig. 2b and Supplementary Note S3) and 72% of the alternative transcripts had only one alternative splicing event (Supplementary Note S4). The alternatively spliced genes are located over the entire genome (Supplementary Note S5). Together, 6,130 primitive splicing events resulted in the 4,819 transcripts of the 2,285 genes of which 65% (3,075) occur in the coding region of the gene, and 35% (2,155 genes) in the UTR (Fig. 2c).
Intron retention (33%) and alternative 5′ splicing site (25%) were the most common primitive events (Fig. 2f(i)). Alternative 3′ splicing sites and exon skipping events occurred at similar levels (21%). Mutually exclusive exons accounted for 93% of the 1,491 composite events, while multiple alternative 5′ and 3′ splicing sites were relatively rare at 4% and 3%, respectively (Fig. 2g(i)). These mutually exclusive exon events occurred in 171 genes representing 8% of the alternative spliced genes. Most alternative splicing events are not reading frame neutral (Supplementary Note S6).
UTRs of genes might stretch so far that they overlap with a neighboring gene. As a consequence, paired-end reads may lie within the boundary of another gene. To investigate whether such reads have a strong effect on event counts, we selected only those genes that are flanked on both sides by intergenic regions with zero coverage (i.e. no overlap). A total of 306 alternatively spliced genes were found in the selected set of 2,516 genes. We observed the same distribution of event counts in this set as compared to all genes (Supplementary Note S7). From this, we conclude that our analysis is not confounded by reads from neighboring genes.
Alternative splicing is not limited to coding regions
Alternative splicing occurs not only in the translated regions of transcripts, but also in the UTRs of transcripts. RRPM is able to detect alternative splicing events that lie within the original gene boundaries of known genes. RRPM can detect alternative splicing events after premature stop codons introduced by alternative splicing events that still lie within the original gene boundaries (Supplementary Note S8). RRPM cannot investigate alternative splicing events in the UTRs extending outside of these original gene boundaries, and which are generally unknown. In the unknown UTR regions, we can use splice junction analysis (Supplementary Note S9).
A total of 3,975 (65%) alternative splicing events were found within coding transcript regions (Fig. 2c). Of these, 44% are intron retentions, and only 1% were exon skipping events (Fig. 2f(ii)). Due to splicing, some regions at the 5′ and 3′ ends of the original coding regions may become non-coding UTRs, and this is where the remaining 2,155 (35%) events are located. Of these, 1,225 (57%) were classified as exon skipping events and 293 (14%) as intron retentions (Fig. 2f(iii) and Supplementary Note S10). Mutually exclusive exons occur primarily in the untranslated regions of transcripts (Fig. 2g(ii,iii)). They account for 99% of all composite events in the UTR regions, but for only 15% of those in the coding regions. Multiple alternative 5′ and 3′ splicing sites, which are rare, occur mostly in coding regions.
Splice junction analysis enables a semi-quantitative analysis of alternative splicing in coding and non-coding regions (see Methods). We observed 186,221 splice junctions across all our RNA-seq samples. 29% of these occur only once, and are likely errors of the splicing machinery or sequencing. In the remaining 71%, we find 15,054 primary alternative splicing events. 62% lie within the coding regions of defined genes, and 38% are in the UTRs of these genes. We found 7,851 alternative (5′ or 3′) splicing sites and 1,527 exon skipping events inside coding regions, while we discovered 4,574 alternative splicing sites and 1,102 exon skipping events in UTRs. Alternative splicing is therefore clearly not limited to translated regions.
Expression over developmental stages implicates alternative splicing in development
Time Course Switching (TCS) genes differentially express splice variants during development5. The primary transcript (defined as the transcript that has the highest average expression over development), is replaced in expression by one of its alternative transcripts at some point in time. S. commune was found to have 425 of such genes, representing 865 transcripts. The aggregate stage contained more expressed alternative transcripts than any other stage (Fig. 3a). At this stage, the alternative transcripts of 142 genes were more abundant than their primary transcript. For example, the alternative transcript (ID CUFF.11357.2) of the carbohydrate active enzyme (cazyme) gene G2634198 (Fig. 4a) entirely replaces the primary transcript (ID 2634198) during the aggregate stage, after which its abundance decreased to zero at the primordia stage, and in turn, replaced by the primary transcript. Gene G2629174 (Fig. 4b) which encodes a secreted protein also exhibits this behavior. The polypeptide encoded by the alternative transcript lacks a large part of its N-terminus due to intron retention.
Genes with alternative transcripts exhibit alternative functionality through protein domains
Alternative splicing may result in the gain or loss of functional domains. Out of the 2,285 genes with alternative transcripts, 1,257 (55%) have predicted functional domains. A functional domain is lost/gained in an alternative transcript in 897 (70%) genes (Fig. 3b,c and Supplementary Note S11). For 274 (12%) genes a domain is exchanged for another (Fig. 4c). For example, gene G2502024, encoding a predicted secreted cazyme, has two splice variants (Fig. 4c) both annotated with a glycosyl hydrolase domain (IPR domain IPR023296) involved in the hydrolysis of polysaccharide carbohydrates. The primary transcript (ID 2502024) is annotated with a glycoside hydrolase family 43 domain (IPR domain IPR006710), while the alternative transcript (ID CUFF.10781.2) has a different glycosyl hydrolase domain (IPR013148). The presence of alternative domain families in the two transcripts indicates alternative carbohydrate degradation abilities. 102 (24%) of the time course switching genes are predicted to have alternative functionality through alternative domain usage, implicating a change of function of those genes throughput development.
Alternative splicing can affect the subcellular localization of encoded proteins
Alternative splicing may result in the gain or loss of a localization signal leading to a function in another compartment of the cell. For 936 alternatively spliced genes (41%) we were unable to predict a subcellular location for any transcripts (see Supplementary Note S12), and for 1278 genes (56%) we were unable to predict a subcellular location for all transcripts, but in 71 genes (3%), transcripts have different predicted localization signals (Fig. 3d).
Gene G2607155 is an example of a predicted metabolic gene with two transcripts encoding proteins with alternative subcellular localizations (see Fig. 4d). The primary and alternative transcript, 2607155 and CUFF.4351.1, respectively, are both expressed. The protein encoded by the primary transcript is predicted to be transported to the mitochondrion, while the polypeptide encoded by the alternative transcript has an intron retention near the 5′ end of the transcript. As a result, it loses its first exon, destroying a valine catabolism-related3-hydroxyisobutyrate dehydrogenase site (IPR002204) specific to mitochondria, and the signaling peptides. The alternative transcript still retains the four deoxyhydrogenase (IPR029154, IPR008927, IPR013328, IPR006115) and the NADP binding domain (IPR016040) of the protein encoded by the primary transcript. Based on its remaining protein sequence, it is predicted that this secondary transcript is located in the cytoplasm, which indicates that alternative splicing may regulate the location of proteins in S. commune.
Alternative splicing is present in key functional groups
Of particular interest are five gene functional groups: i) transcription factors, ii) cazymes, iii) secreted proteins, iv) cytochrome P450s, and v) metabolic genes. Alternative splicing occurs in all these groups (see Fig. 2e and Supplementary Note S13), and the set of cytochrome P450s are enriched in the set of alternatively spliced genes. The peak of alternative transcript expression in the aggregate stage persists across all these functional groups regardless of expression threshold, demonstrating that alternative splicing is actively regulated in these functional groups (see Supplementary Note S14).
Alternative splicing in a transcription factor gene may impact a DNA binding domain or a protein interaction domain. As an example, the alternative transcript of gene G2679567 has an alternative 5′ splicing site in its second exon (see Fig. 4e) resulting in a shortened DNA binding domain (IPR021715) due to the loss of most of the third exon. This alternative transcript is differentially expressed across developmental stages, suggesting that the regulation program also changes with alternative splicing. Genes encoding secreted proteins that are alternatively spliced may impact the signal sequence for secretion located in the N-terminal of the protein. The alternative transcript of the TCS gene G2629174 is the result of an intron retention (see Fig. 4b). As a result, the N-terminal signal sequence is affected, which is predicted to prevent the protein from being be secreted. If this is the case then, based on the expression of these transcripts, this protein is secreted at different rates throughout development. Many secreted proteins may take on alternative locations in the cell (Fig. 3d). In 22 cases, secreted proteins (in the extracellular compartment) are predicted to end up in other subcellular locations (8 in the mitochondrion, 8 in the nucleus, 4 in the plasma membrane and 2 in the cytoplasm), and in 62 cases, we were unable to predict the subcellular location (Supplementary Note S12).
Implications on previously studied genes of S. commune
Only 84 S. commune genes have been assigned names, 13 of them have alternative splicing variants (see Supplementary Note S15). Among these genes are the mating type genes aay4, abu6, and bbp2–1, transcription factor genes crea and HOM1 that encode the carbon catabolite repression regulator, and the fruiting transcription factor Hom1. The alternative transcripts in the mating type genes have no effect on function. In contrast, alternative splicing impacts hom1. Its primary transcript retains the second intron (see Fig. 4f). This produces a frame shift that prematurely ends the coding region. Due to this, it loses its homeobox domain and thus will not function as a transcription factor. Notably, this hom1 transcript is the most abundant throughout development.
Previous studies characterizing alternative splicing in fungi unequivocally found intron retentions to be the most common events10,11,12. This study confirmed this finding with intron retentions accounting for 33% of the events. Notably, they are enriched in coding regions, where they account for 43% of the events, while exon skipping occurred primarily in UTRs. With 1% of the events, exon skipping was rare in coding regions, but they represented 57% of the events in UTRs. The prevalence of intron retention contrasts the dominance of exon skipping in mammals19,20. Our results strongly suggest that fungi use alternative splicing events differently when compared to mammalian genomes. Based on event prevalence they are more similar to plants, where intron retentions are also the most common event21.
Most alternative splicing events (64%) were not reading frame neutral, indicating that there is no strong selective pressure for reading frame neutral alternative splicing events. This is also in contrast with alternative splicing in mammals, where there appears to be a selection for reading-frame neutral events19,20.
The incidence of alternative splicing was much higher compared to previous studies. Previous studies examined primarily ascomycetes and few basidiomycetes, only two of which form mushrooms9,10. This may imply that more developmentally complex fungal species have more alternative splicing (see Fig. 1e and Supplementary Note S1) with basidiomycetes having the most alternative splicing, followed by pathogenic ascomycetes, then plant pathogenic ascomycetes and finally, saprobic ascomycetes and yeasts.
The genome of S. commune encodes 16,319 genes. The median intergenic distance is 539 bp, making the gene density very high when compared to plants and humans22,23. This hampers studying alternative splicing in S. commune, and therefore, we focused only on annotated genes. This prevents the discovery of splicing events at novel loci but allowed us to identify alternatively spliced transcripts of the annotated genes and to study the functional impact of these splicing events. Recent studies from the ENCODE project23 have shown that significant portions of non-coding regions of the human genome are expressed as mRNAs, and are spliced. By analyzing splice junctions, we observed 5,676 alternative splicing events in non-coding regions, demonstrating that S. commune also makes use of alternative splicing in non-coding regions. Due to the density of the genome, events that lie in UTRs cannot be associated with a gene using short read data. Use of long read technologies will alleviate technological problems in assembling the splice variants as those reads will cover entire transcripts.
In predicting the protein sequences of our transcripts, we assumed that the longest Open Reading Frame (ORF) is the one that is most likely to be translated. This is a common assumption in gene prediction software, yet, as a consequence, different transcripts within a gene may have different start codons. The Kozak consensus sequence is a good indicator of translation initiation24. The coding sequences predicted in our alternative transcripts indeed are supported by Kozak consensus sequences (see Supplement Note S17). But, translation initiation is still poorly understood and predicting exactly where translation should initiate (i.e. which ORF is actually translated) is still difficult25. It has been shown that there are alternative AUG start codons which produce different proteins from the same transcript26,27. Even beyond that, there are ORFs in the upstream UTRs that are also translated28. Future studies of alternative splicing should extend their perspective from the current “one gene, many transcripts”, to the “one transcript many proteins” viewpoint.
Finally, we should note that we have been conservative in our prediction of alternative transcripts and alternative splicing events. Additionally, although we sampled across the whole development of S. commune, all samples were grown under the same environmental conditions (see Methods) in an isogenic strain (see Supplementary Note S16). Hence, alternative splicing may be much more abundant and provide much more functionality than we observe in this study.
Alternative splicing was found to increase proteome complexity in S. commune by the gain or loss of functional domains or localization signals. Alternative splicing was found in 18% of expressed genes, resulting in 20% more transcripts. Yet, the real incidence of alternative splicing is probably higher. More than two thirds of the alternatively spliced genes exhibited alternative functionality by coding for different domains or by acquiring alternative subcellular localizations. Selected functional groups were shown to contain alternatively spliced genes, with the cytochrome P450 genes showing the highest proportion. Time Course Switching genes that differentially express gene variants throughout development represented 19% of the alternatively spliced genes. Of these genes, 24% were predicted to have alternative functionality. We thus show that alternative splicing dynamically varies the complexity of the proteome encoded in the dense genome of S. commune.
It has long been believed that alternative splicing does not contribute functionality to the cell in plants, or fungi, despite the presence of alternative splicing events29. Alternative splicing events were generally thought to result in non-functional proteins or to be similar to other transcripts of the same gene, resulting in retained functionality. Similarly, in the early stages of the ENCODE project it was also commonly believed that the vast majority of alternative mammalian transcripts do not have functional roles30. Now, the role of alternative splicing is commonly accepted in humans31 and its tremendous potential influence on function is being accepted in plants32,33,34. With this work, we extend this notion to fungi.
S. commune genome and annotations
The genome sequence of S. commune v3.0 was retrieved from the Joint Genome Initiative database35, together with gene definitions (GFF), Interpro and GO annotations.
mRNA was isolated from S. commune strain H4–836 grown at 25 °C on minimal medium containing 1% glucose and 1.5% agar37. RNA was sampled during (i) vegetative growth, (ii) induced vegetative growth, (iii) aggregate formation,(iv) primordia formation, and (v) when mushroom had developed. In the induced vegetative growth stage, the colony has been exposed to light for some time but has not yet started forming aggregates. RNA was also isolated from 9 dikaryotic deletion strains at two time points16,17,18: Δwc-1Δwc-1; Δwc-2Δwc-2; Δhom1Δhom1; Δhom2Δhom2; Δfst3Δfst3; Δfst4Δfst4; Δbri1Δbri1; Δgat1Δgat1; and Δc2h2Δc2h2. The sample at the first time point was taken at the time where a simultaneously growing wildtype sample reached the aggregates stage of development. The sample at the second time point was taken when the wildtype had formed mature mushrooms (see Supplementary Note S18).
Samples were sequenced with 100 bp paired-end sequencing on the HiSeq 2000 Illumina platform with an average of 1.7 Gb of reads per sample (average 43x genome coverage). Exceptions are the wildtype stages i) vegetative growth, ii) induced vegetative growth and v) primordia, which were sequenced at a later date with 125 bp paired end sequences on the HiSeq 2500 platform with an average of 5.0 Gb of reads per sample. All samples were produced in duplo for use in transcript abundance estimation. Raw RNA-seq reads are filtered for quality using trimmomatic38. These RNA-seq samples have been made available under BioProject PRJNA323434.
Region-restricted probabilistic modeling (RRPM) for alternative transcript discovery
Due to the high gene-density of S. commune, very few neighboring genes have intergenic regions without expression. In order to reduce the interference of neighboring UTRs, we restricted the discovery of alternative transcripts to gene coding regions only. To this end, the genome was split at gene boundaries into as many fragments as there are genes, each fragment representing a different annotated gene region on the S. commune genome.
Reads from all samples were aligned to the collection of gene-based fragments using STAR39, producing a splice junction database used to optimize STAR alignments in a second alignment round (we use all samples to have a comprehensive database). The second round of STAR produced BAM files sorted by coordinate. Reads not aligning entirely within a gene-based fragment were discarded. Using these BAM files, Cufflinks14 was run in Reference Annotation Based Transcript Assembly (RABT) mode, to produce a set of predicted transcripts. Although the transcripts were predicted on the basis of all samples, it does not mean that these transcripts are also active in every sample. These transcripts were then projected back onto the original genome to restore their context. See Supplementary Note S19 for a schematic of RRPM, and Supplementary Note S20 for parameter settings used for the tools mentioned above.
The output from Cufflinks was filtered to remove transcripts resulting from noise: 1) predicted transcripts which do not lie on the same strand as the original gene description were removed (they cannot be accurately called without strand-specific reads); 2) transcripts which had no expression or did not have reads supporting its splice junctions were removed; 3) transcripts which did not contain Open Reading Frames (ORFs) of at least length 40 nt were removed. See Supplementary Note S21 for an overview of how many genes and transcripts were filtered at each step. The final output is a list of genes with their corresponding transcripts in GFF format.
Calling alternative splicing events
To identify alternative splicing events, we must compare the found transcripts against a reference. Comparing transcripts to the original reference annotation may result in peculiarities due to a lagging quality of annotation. For example, the original annotation may describe a large exon, whereas all predicted transcripts indicate only two smaller exons. To find a suitable reference, we established a consensus annotation based on all expressed transcripts for that gene. The consensus annotation contains all distinctly observed exons in a gene, without intron retentions (see Supplementary Note S22). Consequently, the consensus annotation is not necessarily one of the transcripts of the gene. In fact, it might not even contain a valid ORF. It just serves as a reference to describe observed splice variations.
We called primary splice events by′ aligning the exons of an observed transcript to the consensus annotation. From this alignment we determine the absence of an exon (ES), the fusion of exons (IR), or the extension or contraction of exon boundaries (A5SS, A3SS) (see Supplementary Note S23). Composite events “multiple alternative 5′ and “multiple alternative 3′ splicing sites are classified by grouping exons of transcripts within one gene and counting the number of alternative exon boundaries for each exon. Mutually exclusive exons were detected by counting how often pairs of exons occur across all transcripts. If a pair was never observed across all transcripts in a gene but individually the exons are observed, then we called that pair a mutually exclusive exon event.
We counted the number of alternative splicing events at three levels. At the event level, we counted each alternative splicing event. If, for example, three transcripts exist in a gene, and in two of these transcripts the third exon is skipped, we count this as two events. At the transcript level, we counted each transcript with a particular event. Thus, when a transcript skips two exons, we count it only once. At the gene level, we count the number of genes with a specific alternative splicing event. For example, when a gene has two transcripts that each skip different exons we would counted that as one event.
Detection of untranslated splicing events
We quantify splicing events outside of the translated regions similar to Xie et al.12 through the study of clusters of splice junctions12. The slight differences are described in Supplementary Note S9.
Estimating transcript expression profile of development
Abundance estimates for each transcript were computed by the Cufflinks suite. For each sample separately, we aligned both replicates to the entire genome with STAR in the same two-round procedure as used for transcript prediction. We provided the resulting BAM files of both replicates to Cufflinks, which estimates the abundances of the transcripts in both replicated samples together. Doing this for each developmental stage results in an expression profile for each predicted transcript.
Time Course Switches
Based on the expression profiles of each transcript, we detected time course switching genes for each gene using the method described by Lees et al.5. Briefly, we compare the expression of the primary transcript (which has the highest average expression across time) as well as the expression of an alternative transcript. Intuitively speaking, this comparison scores the relative expression of both transcripts in such a way that the score is larger when the expression profiles of the two transcripts are more dissimilar in shape. As in Lees et al.5, genes which have a score greater than 0.5 are considered to be time course switches. This results in a list of genes which have alternative transcripts that swap expression with another.
Functional annotations and group definitions
Functional domain annotations
InterPro domain annotations were predicted for the translated protein sequence of the longest ORF in each predicted transcript using InterProScan40. We re-index these annotations based on location. If a gene has the same annotation twice, but in two different locations, we assign each of them new identifiers. This allows us to more specifically identify alternative functionality by domain annotations.
Transcription factor definitions
Transcription factors were predicted based on 83 InterProDNA-binding or regulatory domains (see Supplementary Note S24) suggested by the Fungal Transcription Factor Database41,42. Transcription factors are predicted for the original gene definitions. If one of these InterPro domains was present in the protein sequence of a gene, then the gene was called as a transcription factor.
Carbohydrate-active enzymes prediction
Using the Cazymes Analysis Toolkit (CAT)43, we predicted carbohydrate-active enzymes based on the original gene definitions. If a gene’s protein sequence was predicted to be a cazyme by either the sequence-based annotation method or the Pfam-based annotation method then we considered it a cazyme.
Secreted Proteins prediction
We used the same procedure as44 to predict secreted proteins. Briefly, genes with SignalP45 signal peptides, or a TargetP46 Loc = S were kept. The remaining genes were further filtered with TMHMM47, keeping only genes with zero or one transmembrane domains. Finally, genes were filtered using Wolf PSort48 to select genes with a Wolf PSort extracellular score greater than 17.
Metabolic and Cytochrome P450 gene groups
Genes with the GO annotation “metabolic process” (annotation ID: GO:0008152) were called as metabolism genes. Genes with the IPR annotation IPR001128 were used as Cytochrome P450 genes.
Subcellular location prediction
We used Wolf PSort48 to predict the subcellular localization of alternatively spliced transcripts. Protein sequences for each transcript were constructed, and provided to Wolf PSort. All annotations with a score below 17 were removed44.
Accession codes: The RNA-seq data and relevant metadata can be found under BioProject PRJNA323434.
How to cite this article: Gehrmann, T. et al. Schizophyllum commune has an extensive and functional alternative splicing repertoire. Sci. Rep. 6, 33640; doi: 10.1038/srep33640 (2016).
We thank Aurin Vos for his invaluable biological insights, Marc Hulsman for his input in early experiments, Alex Salazar for his feedback on this manuscript, and Alexey Gritsenko for sharing his knowledge of translation initiation. The sequence data of S. commune H4–8 version 3 were produced by the US Department of Energy Joint Genome Institute http://www.jgi.doe.gov/ in collaboration with the user community. This research is supported by the Dutch Technology Foundation STW, which is part of the Netherlands Organisation for Scientific Research (NWO), and which is partly funded by the Ministry of Economic Affairs.
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/