Much remains to be learned about the biology of mushroom-forming fungi, which are an important source of food, secondary metabolites and industrial enzymes. The wood-degrading fungus Schizophyllum commune is both a genetically tractable model for studying mushroom development and a likely source of enzymes capable of efficient degradation of lignocellulosic biomass. Comparative analyses of its 38.5-megabase genome, which encodes 13,210 predicted genes, reveal the species' unique wood-degrading machinery. One-third of the 471 genes predicted to encode transcription factors are differentially expressed during sexual development of S. commune. Whereas inactivation of one of these, fst4, prevented mushroom formation, inactivation of another, fst3, resulted in more, albeit smaller, mushrooms than in the wild-type fungus. Antisense transcripts may also have a role in the formation of fruiting bodies. Better insight into the mechanisms underlying mushroom formation should affect commercial production of mushrooms and their industrial use for producing enzymes and pharmaceuticals.
The importance of mushroom-forming fungi in agriculture, human health and ecology underscores their biotechnological potential for a wide range of applications. The most conspicuous forms of these species, most of which are basidiomycetes, are their fleshy, spore-bearing fruiting bodies. Although these are primarily of economic value because of their use as food1,2 (worldwide production of edible mushrooms amounts to ∼2.5 million tons annually), mushrooms also produce antitumor and immunostimulatory molecules1,2, as well as enzymes used for bioconversions3. Moreover, they have been identified as promising cell factories for the production of pharmaceutical proteins4.
Despite their economic importance, relatively little is known about how mushroom-forming fungi obtain nutrients and how their fruiting bodies are formed. The vast majority of mushroom-forming fungi cannot be genetically modified, or even cultured under laboratory conditions. The basidiomycete Schizophyllum commune, which completes its life cycle in ∼10 d, is a notable exception insofar as it can be cultured on defined media and there is a wealth of molecular tools to study its growth and development. It is the only mushroom-forming fungus for which genes have been inactivated by homologous recombination. The importance of S. commune as a model system is also exemplified by the fact that its recombinant DNA constructs will express in other mushroom-forming fungi5. In contrast, constructs that have been developed for ascomycetes are often not functional in mushroom-forming basidiomycetes.
S. commune is one of the most commonly found fungi and can be isolated from all continents, except for Antarctica. S. commune has been reported to be a pathogen of humans and trees, but it mainly adopts a saprobic lifestyle by causing white rot6. It is predominantly found on fallen branches and timber of deciduous trees. At least 150 genera of woody plants are substrates for S. commune, but it also colonizes softwood and grass silage7. The mushrooms of S. commune that form on these substrates are used as a food source in Africa and Asia.
In the life cycle of S. commune8, meiospores germinate to form a sterile monokaryotic mycelium, in which each hyphal compartment contains one nucleus. Initial growth of this mycelium occurs beneath the surface of the substrate, with formation of aerial hyphae a few days after germination (Fig. 1a,b). Monokaryons that encounter each other fuse, and a fertile dikaryon forms when the alleles of the mating-type loci matA and matB of the partners differ. A short exposure to light is essential for fruiting, whereas a high concentration of carbon dioxide and high temperatures (30–37 °C) are inhibitory. Mushroom formation is initiated with the aggregation of aerial dikaryotic hyphae. These aggregates (Fig. 1c,d) form fruiting-body primordia (Fig. 1e,f), which further develop into mature fruiting bodies (Fig. 1g,h). Karyogamy and meiosis occur in the basidia within the mature fruiting body, and the resulting basidiospores can give rise to new monokaryotic mycelia.
Here we report the genomic sequence of the monokaryotic S. commune strain H4-8 and illustrate the potential of this basidiomycete as a model system to study mushroom formation. Besides the importance of understanding the sexual reproduction of S. commune for the commercial production of mushrooms, insight into the basis of this species' capacity to degrade lignocellulose may inspire more effective strategies to degrade lignocellulosic feedstocks for biofuel production.
The genome of S. commune
Sequencing of the genomic DNA of S. commune strain H4-8 with 8.29× coverage (Supplementary Table 1) revealed a 38.5-megabase genome assembly with 11.2% repeat content (Supplementary Results 1). The assembly is contained on 36 scaffolds (Supplementary Table 2), which represent 14 chromosomes9. We predict 13,210 gene models, with 42% supported by expressed sequence tags (ESTs) and 69% similar to proteins from other organisms (Supplementary Tables 3 and 4). Clustering of the proteins of S. commune with those of other sequenced fungi (a phylogenetic tree of the organisms used in the analysis is shown in Supplementary Fig. 1) identifies 7,055 groups containing at least one S. commune protein (Supplementary Table 5). Analysis of these clusters suggested that 39% of the S. commune proteins have orthologs in the Dikarya and are thus conserved in the Basidiomycota and Ascomycota (Supplementary Table 6). Notably, a similar percentage of proteins (36%) are unique to S. commune, based on OrthoMCL analysis. Of these proteins, 46% have at least one inparalog (a gene resulting from a duplication within the genome) in S. commune. The uniqueness of the S. commune proteome is also illustrated by the over- and under-representation of protein family (PFAM) domains compared to other fungi (Supplementary Results 2) and the fact that only 43% of the predicted genes (5,703 out of the 13,210) could be annotated with a gene ontology (GO) term.
Global gene expression analysis
We used massively parallel signature sequencing (MPSS) to compare whole-genome expression at the four developmental stages, defined by monokaryons, stage I aggregates, stage II primordia and mature fruiting bodies (Fig. 1). The majority of genes are either expressed in all four stages (4,859 genes) or not expressed in any of them (5,308 genes) (Fig. 2 and Supplementary Table 7). Of the 13,210 predicted genes, 59.8% are expressed in at least one developmental stage (Supplementary Table 7). Fewer of the unique S. commune genes meet this criterion, whereas a higher percentage was observed for genes that share orthologs with Agaricomycetes or more distant fungi (Supplementary Table 6). This suggests that S. commune genes lacking homology to any reported sequences are more stringently regulated than orthologs of genes reported for other species. This is consistent with the observation that genes that are apparently unique to S. commune are over-represented in the pool of genes that are differentially expressed during the four developmental stages studied (Supplementary Tables 8 and 9).
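The percentage quoted above follows directly from the gene counts given in the same paragraph. The short sketch below only reproduces that arithmetic; the variable names are ours, and the number of genes expressed in some but not all stages is derived here rather than stated in the text.

```python
# Gene counts as reported in the text (Fig. 2 and Supplementary Table 7).
TOTAL_GENES = 13_210
NOT_EXPRESSED_ANY_STAGE = 5_308
EXPRESSED_ALL_STAGES = 4_859

# Genes detected in at least one of the four developmental stages.
expressed_any = TOTAL_GENES - NOT_EXPRESSED_ANY_STAGE

# Derived value (not stated in the text): expressed in one to three stages.
expressed_some_not_all = expressed_any - EXPRESSED_ALL_STAGES

pct_expressed_any = 100 * expressed_any / TOTAL_GENES

print(f"{expressed_any} genes ({pct_expressed_any:.1f}%) expressed in at least one stage")
print(f"{expressed_some_not_all} genes expressed in some but not all stages")
```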
Antisense transcription is a widespread phenomenon in S. commune (Fig. 2b,c). Of the tags that could be related to a gene model, 18.7% originate from an antisense transcript; and 42.3% of the predicted genes have antisense expression during one or more of the four developmental stages studied (Supplementary Tables 7 and 10). Northern hybridization with strand-specific probes confirmed the existence of antisense transcripts of sc4 (DOE JGI Protein ID 73533; data not shown). Whereas a relatively large number of genes expressed in the antisense direction are uniquely expressed in stage II (2,888 genes), relatively few genes are expressed in the antisense direction in all stages (1,195 genes) (Fig. 2b). Our data suggest that 4,302 genes are expressed in both the sense and antisense directions during stage II (Fig. 2c). This overlap is larger for genes expressed during this phase of the life cycle than for the other developmental stages studied.
We performed an enrichment analysis of functional annotation for the expression profiles of the developmental stages defined by monokaryons, stage I aggregates, stage II primordia and mature fruiting bodies. Functional terms involved in protein or energy production, or associated with hydrophobins, are over-represented in genes upregulated during formation of stage I aggregates (Fig. 1 and Supplementary Table 9). Genes involved in signal transduction, regulation of gene expression, cell wall biogenesis and carbohydrate metabolism are enriched in the group of genes downregulated during the formation of stage I aggregates. These functional terms are enriched in the upregulated genes during formation of stage II primordia, whereas terms involved in protein and energy production are enriched in the downregulated genes (Fig. 1 and Supplementary Table 9). Genes encoding transcription factors and genes involved in amino acid, glucose and alcohol metabolism are enriched in the group of genes downregulated during the formation of mature fruiting bodies.
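The statistical test behind this enrichment analysis is not specified in this section. As an illustration only, one common choice for asking whether a functional term is over-represented in a gene set is the upper tail of the hypergeometric distribution; the sketch below assumes that test, and the function name and all counts in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """Return P(X >= k) for a hypergeometric draw.

    N: genes in the background, K: background genes carrying the term,
    n: genes in the regulated set, k: regulated genes carrying the term.
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical example: 40 of 500 upregulated genes carry a term that
# annotates 300 of 13,210 genes in the background.
p_value = hypergeom_upper_tail(k=40, K=300, n=500, N=13_210)
print(f"P(X >= 40) = {p_value:.3g}")
```

In practice a multiple-testing correction would normally be applied across all terms tested; that step is omitted here.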
As whole-genome expression was previously analyzed during mushroom formation in Laccaria bicolor10, we next investigated whether the regulation of orthologous gene pairs of L. bicolor and S. commune might be correlated during fruiting. When we compared microarray expression profiles of free-living mycelium and mature fruiting bodies of L. bicolor to the MPSS expression profiles of monokaryotic mycelium and mature fruiting bodies of S. commune, we found that 6,751 expressed genes from S. commune had at least one expressed ortholog in L. bicolor. We determined the correlation of changes in expression of the functional annotation terms to which these orthologous pairs belong. There were 15 gene ontology terms, 2 KEGG terms, 4 KOG terms and 4 PFAM terms that showed a positive correlation in expression (P < 0.01; Supplementary Table 11). These terms include metabolic pathways (such as valine, leucine and isoleucine biosynthesis) and regulatory mechanisms (such as transcriptional regulation by transcription factors and signal transduction by G-protein α subunit). This indicates that regulation of these processes during mushroom formation is conserved in S. commune and L. bicolor.
Analysis of the matA and matB gene loci
Formation of a fertile dikaryon is regulated by the matA and matB mating-type loci. Proteins encoded in these loci activate signaling cascades (Supplementary Results 3) upstream of target genes. The target genes include those encoding enzymes and proteins that fulfill structural functions, such as hydrophobins (Supplementary Results 4), needed for the formation of fruiting bodies.
The matA locus of S. commune strain H4-8 appears to have more homeodomain genes than any fungal mating-type locus described thus far. This locus consists of two subloci, Aα and Aβ, which are separated by 550 kilobases (kb) on chromosome I of strain H4-8. Annotation revealed that the Aα locus of H4-8 contains two divergently transcribed genes, which encode the Y and Z homeodomain proteins of the HD2 and HD1 classes, respectively (Fig. 3 and Supplementary Table 12). These two genes, aay4 and aaz4, have been described previously1. A homeodomain gene has also been identified previously in the Aβ locus of H4-8 (ref. 11). Our genomic sequence revealed that this locus actually contains six predicted homeodomain genes: abq6 (HD1), abr6 (HD2), abs6 (HD1), abt6 (HD1, but lacking the nuclear localization signal), abu6 (HD1) and abv6 (HD2) (Fig. 3 and Supplementary Table 12).
Annotation of the genomic sequence of S. commune reveals that the matB system contains more genes than previously envisioned. The matB locus comprises two linked loci, Bα and Bβ, which both encode pheromones and pheromone receptors1 (Fig. 3). Previously, one pheromone receptor gene was identified in both Bα3 and Bβ2 of strain H4-8 (called bar3 and bbr2, respectively)12. The genome sequence of S. commune reveals four additional genes with high sequence similarity to these pheromone receptor genes, which we call B receptor–like genes 1 to 4 (brl1 to brl4; Fig. 3). Three of these genes are located near bar3 and bbr2 on scaffold 10, whereas one (brl4) is located on scaffold 8. MPSS analysis shows that the brl genes are expressed (Supplementary Table 13). In fact, of all receptor and receptor-like genes, brl3 shows the highest expression under the conditions tested.
Three and eight pheromone genes have previously been identified at the Bα3 and Bβ2 loci, respectively13. We identified one additional pheromone gene, named B pheromone–like-5 (bpl5), at the Bα3 locus. Moreover, four additional pheromone-like genes were detected at the Bβ2 locus, called bpl1 to bpl4 (Fig. 3). Of these, only bpl2 showed no expression in MPSS analysis (Supplementary Table 13). The Bα gene bpl5 and three of the new Bβ pheromone-like genes show deviations from the consensus farnesylation signal, CAAX (where C is cysteine, A is aliphatic and X is any residue), with the variant motifs CASR, CTIA, CRLT and CQLT for Bpl5, Bpl1, Bpl2 and Bpl3, respectively. Previously, one of the pheromone genes (bbp2(6)) was shown to function with the deviant farnesylation signal CEVM12. This suggests that in S. commune only one amino acid residue in the consensus sequence of the farnesylation signal needs to be aliphatic.
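The observation about the deviant farnesylation signals can be checked position by position. The sketch below counts aliphatic residues at the two middle positions of each motif quoted above. It assumes that "aliphatic" means Ala, Val, Ile or Leu, which is one common definition and not necessarily the one used for the annotation; the function name is ours.

```python
# Assumption: aliphatic residues are Ala, Val, Ile and Leu.
ALIPHATIC = set("AVIL")

# C-terminal motifs quoted in the text.
MOTIFS = {
    "Bpl5": "CASR",
    "Bpl1": "CTIA",
    "Bpl2": "CRLT",
    "Bpl3": "CQLT",
    "Bbp2(6)": "CEVM",
}


def aliphatic_positions(motif):
    """Return the 1-based middle positions (2 and/or 3) that are aliphatic."""
    if len(motif) != 4 or motif[0] != "C":
        raise ValueError(f"not a four-residue motif starting with Cys: {motif!r}")
    return [i + 1 for i in (1, 2) if motif[i] in ALIPHATIC]


for name, motif in MOTIFS.items():
    print(name, motif, aliphatic_positions(motif))
```

Under this definition each of the five motifs has exactly one aliphatic residue at positions 2 and 3, whereas a canonical signal such as CVIA would have two.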
The genome of S. commune reveals genes encoding 471 putative transcription factors, of which 311 are expressed during at least one developmental stage (Supplementary Table 14). Of these genes, 56% are expressed in all developmental stages; 268 were expressed in the monokaryon, 200 during formation of stage I aggregates, 283 during formation of stage II primordia and 253 during formation of mushrooms. We identified a cluster of monokaryon-specific transcription factors and a group of transcription factors upregulated in stage II primordia or in mature mushrooms, or both (Fig. 4). The latter group includes fst3 (NCBI Protein ID: 257422) and fst4 (NCBI Protein ID: 66861), which encode transcription factors that contain a fungus-specific Zn(II)2Cys6 zinc-finger DNA binding domain.
We inactivated the fst3 and fst4 genes via targeted gene deletions. The Δfst3 and Δfst4 monokaryons showed no phenotypic differences from the wild-type monokaryons. In contrast, the Δfst4 Δfst4 dikaryon did not fruit, but produced more aerial hyphae when compared to the wild type (Fig. 5). This suggests that Fst4 is crucial in the switch between the vegetative and reproductive phases of the S. commune life cycle. In contrast, the Δfst3 Δfst3 dikaryon formed more, albeit smaller, reproductive structures than those of the wild type (Fig. 5). As spatial and temporal regulation of fruiting-body formation and sporulation were not altered in the Δfst3 Δfst3 strain, we conclude that Fst3 inhibits the formation of clusters of mushrooms.
Wood degradation by Schizophyllum commune
As a white-rot fungus6, S. commune degrades all woody cell wall components; in contrast, brown-rotters efficiently degrade cellulose but only modify lignin, leaving a polymeric residue. Lignin-degrading enzymes, which are commonly classified as FOLymes14, comprise lignin oxidases (LO families) and lignin-degrading auxiliary enzymes that generate H2O2 for peroxidases (LDA families). The LO family consists of laccases (LO1), lignin peroxidases, manganese peroxidases, versatile peroxidases (LO2) and cellobiose dehydrogenases (CDHs; LO3). S. commune contains 16 FOLyme genes and 11 genes that encode enzymes distantly related to FOLyme enzymes (Table 1 and Supplementary Table 15). The genome lacks genes encoding peroxidases of the LO2 family. However, it contains a CDH gene (LO3), two laccase genes (LO1) and 13 LDA genes, including four genes encoding glucose oxidases (LDA6) and benzoquinone reductases (LDA7) (Table 1).
S. commune appears to possess a more diverse assortment of FOLymes than the brown-rot fungus Postia placenta and the fungi that are known not to have ligninolytic activity (that is, Ustilago maydis, Cryptococcus neoformans, Aspergillus nidulans, Neurospora crassa and Saccharomyces cerevisiae; Table 1). In contrast, it has fewer FOLymes than either the coprophilic fungus Coprinopsis cinerea or the white-rot fungus Phanerochaete chrysosporium, which are predicted to possess 40 and 27 members, respectively14.
Regarding polysaccharide degradation, S. commune has the most extensive machinery for degrading cellulose and hemicellulose of all of the basidiomycetes we examined. The Carbohydrate-Active Enzyme database (CAZy) identified 240 candidate glycoside hydrolases, 75 candidate glycosyl transferases, 16 candidate polysaccharide lyases and 30 candidate carbohydrate esterases encoded in the genome of S. commune (Table 1 and Supplementary Table 16). Compared to the genomes of other basidiomycetes, S. commune has the highest number of glycoside hydrolases and polysaccharide lyases. S. commune is rich in genes encoding enzymes that degrade pectin, hemicellulose and cellulose (Supplementary Table 17). In fact, S. commune has genes in each family involved in the degradation of these plant cell wall polysaccharides. The S. commune genome is particularly rich in members of the glycosyl hydrolase families GH93 (hemicellulose degradation) and GH43 (hemicellulose and pectin degradation), and the lyase families PL1, PL3 and PL4 (pectin degradation) (Supplementary Table 17). The pectinolytic capacity of S. commune is further complemented by the presence of pectin hydrolases from families GH28, GH88 and GH105.
The phylum Basidiomycota contains roughly 30,000 described species, accounting for 37% of the true fungi15. The Basidiomycota comprises two class-level taxa (Wallemiomycetes and Entorrhizomycetes) and the subphyla Pucciniomycotina (rusts), Ustilaginomycotina (smuts) and Agaricomycotina16. The Agaricomycotina include the mushroom- and puffball-forming fungi, crust fungi and jelly fungi. Genomic sequences are currently available for five members of the Agaricomycotina: P. chrysosporium17, L. bicolor10, P. placenta18, C. neoformans19 and C. cinerea20. Our 38.5-megabase assembly of the S. commune genome represents the first genomic sequence for a member of the family Schizophyllaceae. Thirty-six percent of the encoded proteins have no ortholog in other fungi. Only 43% of the predicted genes could be annotated with a gene ontology term, underscoring that much about the proteome of S. commune remains unknown. This percentage resembles that seen in other basidiomycetes: 30% in L. bicolor10, 48% in P. placenta18 and 49% in P. chrysosporium17.
S. commune invades wood primarily by growing through the lumen of vessels, tracheids, fibers and xylem rays. Adjacent parenchymatic cells in the xylem tissue are invaded via simple and bordered pits. As a consequence of this approach to invasion, cellulose, hemicellulose or pectin can serve as the primary carbon source for S. commune. Indeed, the genome of S. commune probably contains at least one gene in each family involved in the degradation of cellulose, hemicellulose and pectin. The large number of predicted pectinase genes is consistent with earlier studies describing S. commune as one of the best pectinase producers among the basidiomycetes21. S. commune also encodes carbohydrate-active enzymes that degrade other polymeric sugars, such as those acting on starch, mannans and inulins. Consistent with the wide variety of substrates that support its growth, S. commune has the most complete polysaccharide breakdown machinery of all basidiomycetes examined.
We know much less about how fungi degrade lignin than how they digest plant polysaccharides. Fungi are assumed to use FOLymes to degrade lignin14. Although members of the LO2 family of lignin oxidases are known to degrade lignin, it remains controversial whether laccases (LO1) and cellobiose dehydrogenases (CDHs; LO3) share this capacity. S. commune contains 16 genes encoding FOLymes. There are no members of the LO2 family, but the genome contains one CDH gene and two laccase genes. CDHs may participate in the degradation of cellulose, xylan and, possibly, lignin by generating hydroxyl radicals in a Fenton-type reaction. Laccases catalyze the one-electron oxidation of phenolics, aromatic amines and other electron-rich substrates with the concomitant reduction of O2 to H2O. They are classified as having either low or high redox potential22, but it is not clear whether the two S. commune gene products belong to the high– or low–redox potential enzyme categories.
When the genomes of the white-rot fungi S. commune and P. chrysosporium17 and the brown-rot fungus P. placenta18 are compared, it is clear that S. commune has evolved its own set of FOLymes. P. chrysosporium lacks genes encoding laccases (LO1). It is thought to degrade lignin with the enzymes encoded by 16 isogenes of peroxidases (LO2), one CDH gene (LO3) and four genes of the multicopper oxidase superfamily. In contrast, P. placenta contains two laccase-encoding genes (LO1) but lacks members of the LO2 and LO3 families. As S. commune and P. placenta lack true LO2 FOLymes, one would expect a low number of LDAs that are responsible for H2O2 production for the peroxidases. This is not the case. S. commune contains more LDAs than P. chrysosporium. For instance, S. commune contains four glucose oxidase (LDA6) genes, whereas fungi seldom express more than one of these. In the absence of peroxidases of the LO2 family, it is expected that the glucose oxidases of S. commune serve another function. Glucose oxidases convert glucose into gluconic acid. This acid solubilizes inorganic phosphate and thus aids in the uptake of the nutrient23.
The matA and matB mating-type loci of S. commune regulate the formation of a fertile dikaryon after the fusion of monokaryons that encounter one another. The genome sequence of this species now reveals that the mating type loci of S. commune contain the highest number of reported genes within such loci in the fungal kingdom. The matB locus comprises two linked loci, Bα and Bβ, which both encode pheromones and pheromone receptors1. Nine allelic specificities have been identified for both loci, resulting in 81 different mating types for matB. It was previously reported that the Bα3 and Bβ2 loci of H4-8 contain three and eight pheromone genes, respectively, and each contain one pheromone receptor gene12,13. We identified five additional pheromone genes and four additional pheromone receptor–like genes in the genome of H4-8. These newly identified receptor-like genes are present in a matB deletion strain, which has no pheromone response with any mate (T.J.F., unpublished data). This raises the question of whether the four receptor genes function in matB-regulated development. Expression of these genes, as discerned using MPSS, suggests that they do not represent pseudogenes.
The matA locus consists of two subloci, Aα and Aβ, of which 9 and 32 allelic specificities, respectively, are expected to occur in nature1. These loci are separated by 550 kb on chromosome I of strain H4-8. Such a large distance has not been found in other fungi that have a tetrapolar mating system. The functionally well-characterized Aα locus showed no substantial differences from the published descriptions1. It is composed of two genes encoding Y and Z homeodomain proteins of the HD2 and HD1 classes, respectively. The Y and Z proteins, as in other basidiomycetes, interact in non-self combinations to activate the A-pathway of sexual development1,24. Notably, a nuclear localization signal is present in Y but not in Z. This is consistent with non-self interaction of the two proteins taking place in the cytosol, followed by the translocation of the active protein complex into the nucleus1.
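The numbers of allelic specificities quoted for the four subloci determine how many mating types are possible. The sketch below multiplies them out, assuming that specificities at the subloci combine freely. The 81 matB types are stated in the text; the matA figure and the overall total are derived here and should be read as upper bounds under that assumption.

```python
from math import prod

# Allelic specificities per sublocus, as quoted in the text
# (A_alpha and A_beta are the numbers expected to occur in nature).
SPECIFICITIES = {
    "A_alpha": 9,
    "A_beta": 32,
    "B_alpha": 9,
    "B_beta": 9,
}

mat_a_types = SPECIFICITIES["A_alpha"] * SPECIFICITIES["A_beta"]
mat_b_types = SPECIFICITIES["B_alpha"] * SPECIFICITIES["B_beta"]

# Derived total, assuming matA and matB specificities assort independently.
total_mating_types = prod(SPECIFICITIES.values())

print(f"matA: {mat_a_types}, matB: {mat_b_types}, total: {total_mating_types}")
```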
The Aβ locus of S. commune has been studied much less than the Aα locus. Notably, Aβ reflects the highest degree of homeodomain-gene complexity for any fungal mating-type locus described to date. It contains four homeodomain genes of the HD1 class and two of the HD2 class. The Aβ locus of S. commune thus resembles that of C. cinerea, which consists of two pairs of functional HD1 and HD2 homeodomain genes (b and d)25. The large number of genes in matAβ would explain why recombination analyses predict as many as 32 mating specificities for this locus26. Overall, S. commune seems ideal for identifying the evolutionary pathways that have created high numbers of allelic specificities for enhancing outbreeding versus inbreeding rates.
As little is known about molecular processes that control formation of fruiting bodies in basidiomycetes, other than the role of the mating-type loci8, we compared genome-wide expression profiles at four developmental stages. MPSS showed that relatively few genes were specifically expressed in the monokaryon (284 genes) and in stage I aggregates and the mature mushrooms (128 genes in both cases). Notably, 467 genes were specifically expressed in stage II primordia. This suggests that this stage represents a major developmental switch, an idea supported by the fact that genes involved in signal transduction and regulation of gene expression are enriched in the group of upregulated genes during formation of stage II primordia. A positive correlation of expression of these gene groups during mushroom formation in both S. commune and L. bicolor suggests that regulation of mushroom formation is a conserved process in the Agaricales.
Our analysis of gene expression in S. commune reveals a high frequency of antisense expression. About 20% of all sequenced mRNA tags originated from an antisense transcript, and >5,600 of the predicted genes showed antisense expression in one or more developmental stages. Antisense transcription was most pronounced in stage II primordia. At this stage, >4,300 genes were expressed in both the sense and antisense directions, and >800 genes were expressed in the antisense direction only. Previously, MPSS has revealed antisense transcripts in Magnaporthe grisea27. Little is known about the function of these transcripts in fungi. The circadian clock of N. crassa is entrained in part by the action of an antisense transcript derived from a locus encoding a component of the circadian clock28, possibly through RNA interference. It is tempting to speculate that antisense transcripts also regulate mRNA levels in S. commune. Natural antisense transcripts in eukaryotes have also been implicated in other processes, such as translational regulation, alternative splicing and RNA editing29. The antisense transcripts of S. commune may likewise have such functions. In all these cases, the antisense transcripts could function in a developmental switch that occurs when stage II primordia are formed.
The apparently high conservation of gene regulation in the Agaricales led us to study the 471 genes predicted to encode transcriptional regulators. Of these, 268 were expressed in the monokaryon, whereas 200, 283 and 253 were expressed during formation of stage I aggregates, stage II primordia and mushrooms, respectively. The relatively high number of transcription factors expressed during formation of stage II primordia again points to a major switch that probably occurs during this developmental stage.
We identified a group of monokaryon-specific transcription factors and a group of transcription factors that are upregulated in stage II primordia or mature mushrooms, or in both. The fst3 and fst4 genes encode transcriptional regulators belonging to the latter group. Growth and development were not affected in monokaryotic strains in which fst3 or fst4 was inactivated. Phenotypic differences were, however, observed in the dikaryon. The Δfst4 Δfst4 dikaryon did not fruit but produced more aerial hyphae than the wild type. In contrast, the Δfst3 Δfst3 dikaryon formed more, albeit smaller, fruiting bodies than the wild type. This suggests that Fst4 is involved in the switch between the vegetative and the reproductive phase, and that Fst3 inhibits formation of clusters of mushrooms. Inhibition of such clusters could be important in a natural environment to ensure that sufficient energy is available for full development of fruiting bodies. As fst3 and fst4 have homologs in other mushroom-forming fungi, it is tempting to speculate that they have similar functions in these organisms. This is supported by the observation that the homologs of fst3 and fst4 are upregulated in young fruiting bodies of L. bicolor compared to free-living mycelium10. In mature fruiting bodies of L. bicolor, the expression level of the homolog of fst3 remains constant compared to young fruiting bodies, whereas the fst4 homolog returns to the level expressed in the free-living mycelium.
In conclusion, the genomic sequence of S. commune will be an essential tool to unravel mechanisms by which mushroom-forming fungi degrade their natural substrates and form fruiting bodies. The large variety of genes that encode extracellular enzymes that act on polysaccharides probably explains why S. commune is so common in nature. Moreover, the genome sequence suggests that S. commune may have a unique mechanism to degrade lignin. Our MPSS data have provided leads on how mushroom formation is regulated, highlighting both the roles of certain transcription factors and the possible involvement of antisense transcription. Better understanding of the physiology and sexual reproduction of S. commune will probably have an impact on the commercial production of edible mushrooms and the use of mushrooms as cell factories.
Strains and culture conditions.
S. commune was routinely grown at 25 °C on minimal medium (MM) with 1% (wt/vol) glucose and with or without 1.5% (wt/vol) agar30. Liquid cultures were shaken at 225 r.p.m. Glucose was replaced with 4% (wt/vol) glycerol for cultures used in the isolation of genomic DNA. All S. commune strains used were isogenic to strain 1-40 (ref. 31). Strain H4-8 (matA43 matB41; FGSC no. 9210) was used for sequencing. EST libraries were generated from H4-8 and from a dikaryon that resulted from a cross between H4-8 and strain H4-8b (matA4 matB43)32. Strains 4-39 (matA41 matB41; CBS 341.81) and 4-40 (matA43 matB43; CBS 340.81) were used for MPSS. These strains show a more synchronized fruiting compared to a cross between H4-8 and H4-8b. Partial sequencing of the haploid genome revealed that strains 4-40 and 4-39 have minor sequence differences (<0.2%) with strain H4-8 (data not shown).
Isolation of genomic DNA, genome sequencing and assembly.
Genomic DNA of S. commune was isolated as described30 and sequenced using a whole-genome shotgun strategy. All data were generated by paired-end sequencing of cloned inserts with six different insert sizes using Sanger technology on ABI3730xl sequencers. The data were assembled using the whole-genome shotgun assembler Arachne (http://www.broad.mit.edu/wga/).
EST library construction and sequencing.
Cultures were inoculated on MM plates with 1% (wt/vol) glucose using mycelial plugs as an inoculum. Strain H4-8 was grown for 4 d in the light, whereas the dikaryon H4-8 × H4-8b was grown for 4 d in the dark and 8 d in the light. Mycelia of the dikaryotic stages were combined and RNA was isolated as described30. The poly(A)+ RNA fraction was obtained using the Absolutely mRNA Purification kit according to the manufacturer's instructions (Stratagene). cDNA synthesis and cloning followed the SuperScript plasmid system procedure with Gateway technology (Invitrogen). For the monokaryon, two size ranges of cDNA were cut out of the gel to generate two cDNA libraries (JGI library codes CBXY for the range 0.6 kb–2 kb and CBXX for the range >2 kb). For the dikaryon, cDNA was used in the range >2 kb, resulting in library CBXZ. The cDNA inserts were directionally ligated into vector pCMVsport6 (Invitrogen) and introduced into ElectroMAX T1 DH10B cells (Invitrogen). Plasmid DNA for sequencing was produced by rolling-circle amplification (Templiphi, GE Healthcare). Subclone inserts were sequenced from both ends using Big Dye terminator chemistry and ABI 3730 instruments (Applied Biosystems).
Gene models in the genome of S. commune were predicted using Fgenesh33, Fgenesh+33, Genewise34 and Augustus35. Fgenesh was trained for S. commune with a sensitivity of 72% and a specificity of 74%. Augustus ab initio gene predictions were generated with parameters based on C. cinerea gene models20. In addition, about 31,000 S. commune ESTs were clustered into nearly 9,000 groups. These groups were directly mapped to the genomic sequence with a threshold of 80% coverage and 95% identity, included as putative full-length genes, or used to extend predicted gene models into full-length genes by adding 5′ and/or 3′ UTRs. Because multiple gene models were generated for each locus, a single representative model at each locus was computationally selected on the basis of EST support and similarity to protein sequences in the NCBI nonredundant database. This resulted in a final set of 13,210 predicted genes, of which 1,314 have been manually curated. In 66 cases, models were created or coordinates were changed.
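The mapping threshold for EST groups can be illustrated with a minimal sketch. It assumes that each alignment has already been summarized as a group length, an aligned length and a count of identical bases; the record type, its field names and the inclusive treatment of the cutoffs are hypothetical and are not taken from the annotation pipeline itself.

```python
from dataclasses import dataclass


@dataclass
class EstAlignment:
    """Hypothetical summary of one EST group aligned to the genome."""
    group_id: str
    group_length: int     # length of the EST group consensus (bp)
    aligned_length: int   # consensus bases covered by the alignment
    identical_bases: int  # aligned bases identical to the genome


def passes_threshold(aln, min_coverage=0.80, min_identity=0.95):
    """Return True if the alignment meets both coverage and identity cutoffs.

    Cutoffs are treated as inclusive, which is an assumption.
    """
    if aln.group_length <= 0 or aln.aligned_length <= 0:
        return False
    coverage = aln.aligned_length / aln.group_length
    identity = aln.identical_bases / aln.aligned_length
    return coverage >= min_coverage and identity >= min_identity
```

A group covering 900 of 1,000 bp with 880 identical bases would pass, whereas one covering only 700 bp would be rejected on coverage alone.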
All predicted gene models were functionally annotated by homology to annotated genes from the NCBI nonredundant set and classified according to Gene Ontology36, eukaryotic orthologous groups (KOGs)37, KEGG metabolic pathways38 and Protein Family (PFAM) domains39.
RepeatModeler 1.0.3 (http://www.repeatmasker.org/RepeatModeler.html) was used to generate de novo repeat sequence predictions for S. commune. Repeats were classified by comparison to the RepBase database (http://www.girinst.org/repbase/index.html). RepeatModeler produced 76 families of repeats used as a search library in RepeatMasker (http://www.repeatmasker.org/).
Orthologs of S. commune proteins in the fungal kingdom.
Proteins of S. commune were assigned to orthologous groups with OrthoMCL version 2.0 (ref. 40) with an inflation value of 1.5. Members of such groups were assigned as orthologs (in the case of proteins from another species) or inparalogs (in the case of proteins from S. commune). Orthologs were determined in C. cinerea20, L. bicolor10, P. placenta18, P. chrysosporium17, C. neoformans19, U. maydis41, S. cerevisiae42, A. nidulans43 and N. crassa44. All-versus-all BLASTP analysis was performed using NCBI standalone BLAST version 2.2.20, with an E value of 10⁻⁵ as a cutoff. Custom scripts were used to further analyze the orthologous groups resulting from the OrthoMCL analysis. The evolutionary conservation of each orthologous group was expressed as the most specific taxon to which the group was confined (see Supplementary Fig. 1).
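The distinction between orthologs and inparalogs within a group can be sketched as follows. This is an illustration only, not the custom scripts used in the study: the input format (a list of species and protein identifier pairs per group) and the species label "Scommune" are assumptions.

```python
def classify_group(members, focal="Scommune"):
    """Split one orthologous group from the viewpoint of the focal species.

    members: list of (species, protein_id) tuples (hypothetical format).
    Returns {focal_protein_id: {"orthologs": [...], "inparalogs": [...]}}.
    """
    focal_ids = [pid for species, pid in members if species == focal]
    other_ids = [pid for species, pid in members if species != focal]
    result = {}
    for pid in focal_ids:
        result[pid] = {
            # Group members from any other species count as orthologs.
            "orthologs": list(other_ids),
            # Other focal-species members of the same group are inparalogs.
            "inparalogs": [p for p in focal_ids if p != pid],
        }
    return result
```

A group without any focal-species member yields an empty dictionary, so such groups drop out of downstream per-gene tables.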
FuncAssociate 2.0 (ref. 45) was used to study over- and under-representation of taxon-specific genes and of functional-annotation terms in sets of differentially regulated genes. Default settings were used, with a P value of 0.05 or 0.01 as the cutoff.
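A hypergeometric tail probability is a common way to score over- and under-representation of an annotation term in a gene set. The sketch below illustrates that idea with the Python standard library only; it does not reproduce FuncAssociate's own procedure or any multiple-testing adjustment, and the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_tail_probs(n_total, n_annotated, n_selected, n_overlap):
    """Return (P(X >= k), P(X <= k)) for a hypergeometric overlap X.

    n_total:     genes in the genome
    n_annotated: genes carrying the term
    n_selected:  genes in the set of interest (e.g. differentially regulated)
    n_overlap:   observed number of selected genes carrying the term (k)
    """
    denom = comb(n_total, n_selected)
    lo = max(0, n_selected - (n_total - n_annotated))  # smallest possible overlap
    hi = min(n_annotated, n_selected)                  # largest possible overlap

    def pmf(x):
        return comb(n_annotated, x) * comb(n_total - n_annotated, n_selected - x) / denom

    p_over = sum(pmf(x) for x in range(max(n_overlap, lo), hi + 1))
    p_under = sum(pmf(x) for x in range(lo, min(n_overlap, hi) + 1))
    return p_over, p_under
```

The first value would be compared with the chosen cutoff to call over-representation, and the second to call under-representation.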
The PFAM database version 24.0 (ref. 39) was used to identify PFAM protein families. Custom scripts in Python were written to group genes on the basis of their PFAM domains. Differences in the number of predicted proteins belonging to a PFAM family across the fungal domains were determined using Student's t-test. When Agaricales were compared to the rest of the Dikarya, or when S. commune was compared to the Agaricales, only groups with a minimum of five members in at least one of the fungi were analyzed. When S. commune was compared to the rest of the Dikarya, only groups with a minimum of five members in at least four of the fungi were analyzed. In all cases, a P value of 0.05 was used as a cutoff. Similar results were obtained using the nonparametric Mann-Whitney U-test.
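The grouping of genes by PFAM domain and the minimum-membership filter applied before testing can be sketched as below. This is an assumption-laden illustration rather than the scripts used in the study: the input dictionaries, the family labels and both function names are hypothetical, and the statistical test itself is omitted.

```python
from collections import defaultdict


def count_family_members(gene_to_pfams):
    """Count genes per PFAM family for one species.

    gene_to_pfams: {gene_id: iterable of PFAM identifiers} (hypothetical format).
    A gene with repeated copies of a domain is counted once for that family.
    """
    counts = defaultdict(int)
    for pfams in gene_to_pfams.values():
        for pfam in set(pfams):
            counts[pfam] += 1
    return dict(counts)


def families_to_test(counts_by_species, min_members=5, min_species=1):
    """Keep families with >= min_members genes in >= min_species species."""
    families = set()
    for counts in counts_by_species.values():
        families.update(counts)
    kept = []
    for family in sorted(families):
        n_ok = sum(
            1 for counts in counts_by_species.values()
            if counts.get(family, 0) >= min_members
        )
        if n_ok >= min_species:
            kept.append(family)
    return kept
```

Setting `min_species=1` corresponds to the "at least one of the fungi" criterion and `min_species=4` to the "at least four of the fungi" criterion.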
Annotation of carbohydrate-related enzymes was performed using the CAZy annotation pipeline46. Ambiguous family attributions were processed manually along with all identified models that presented defects (such as deletions, insertions or splicing problems). Each protein was also compared to a library of experimentally characterized proteins found in CAZy to provide a functional description.
Lignin oxidative enzymes (FOLymes)14 were identified by BLASTP analysis of the S. commune gene models against a library of FOLy modules using an E value <0.1. The resulting 68 protein models were analyzed manually using the BLASTP results as well as multiple-sequence alignments and functional inference based on phylogeny47. A protein was identified as a FOLyme when it showed a similarity score above 50% with sequences of biochemically characterized enzymes. When the similarity score was <50%, the protein was scored as a FOLyme-related protein.
MPSS expression analysis.
Total RNA was isolated from the monokaryotic strain 4-40 and from the dikaryon resulting from a cross between 4-40 and 4-39. A 7-day-old colony grown on solid MM at 30 °C in the dark was homogenized in 200 ml MM using a Waring blender for 1 min at low speed. Two milliliters of the homogenized mycelium was spread out over a polycarbonate membrane placed on top of solidified MM. Vegetative monokaryotic mycelium was grown for 4 d in the light. The dikaryon was grown for 2 and 4 d in the light to isolate mycelium with stage I aggregates and stage II primordia, respectively. Mature mushrooms 3 d old were picked from dikaryotic cultures that had grown for 8 d in the light. RNA was isolated as described30. MPSS was performed essentially as described48 except that after DpnII digestion MmeI was used to generate 20-bp tags. Tags were sequenced using the Clonal Single Molecule Array technique (Illumina). Between 4.2 and 7.6 million tags of 20 bp were obtained for each of the stages. Programs to analyze the data were written in Python. Tag counts were normalized to tags per million (TPM). Tags with a maximum of <4 TPM in all developmental stages were removed from the data set. The resulting data set consisted of a total of 40,791 unique tags. Of these tags, 61.7% and 58.6% could be mapped to the genome sequence and the predicted transcripts, respectively, using a perfect match as the criterion. The mapped tags accounted for 71.4% and 70.8% of the total number of tags, respectively. For comparison, 97.4% of the ESTs from S. commune strain H4-8 could be mapped to the assembly. Unmapped tags can be explained by sequencing errors in either the tag or the genomic DNA. Moreover, RNA editing may have altered the transcript sequence to produce tags that do not match the genome perfectly. It may also be that the assigned untranslated region is incomplete or that the DpnII restriction site that defines the 5′ end of the tag is too close to the poly(A) tail of the mRNA.
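The normalization to TPM and the removal of low-abundance tags described above can be sketched as follows. This is a minimal illustration, not the programs used in the study; the dictionary layout and function names are assumptions.

```python
def to_tpm(raw_counts):
    """Normalize raw tag counts of one stage to tags per million.

    raw_counts: {tag_sequence: count} (hypothetical format).
    """
    total = sum(raw_counts.values())
    if total == 0:
        return {tag: 0.0 for tag in raw_counts}
    return {tag: count * 1_000_000 / total for tag, count in raw_counts.items()}


def filter_low_tags(tpm_by_stage, min_tpm=4.0):
    """Remove tags whose maximum TPM over all stages is below min_tpm.

    tpm_by_stage: {stage_name: {tag_sequence: tpm}}.
    Returns {tag_sequence: {stage_name: tpm}} for the retained tags;
    a tag absent from a stage is given 0.0 TPM in that stage.
    """
    all_tags = set()
    for stage_tpm in tpm_by_stage.values():
        all_tags.update(stage_tpm)
    kept = {}
    for tag in all_tags:
        values = {stage: tpm.get(tag, 0.0) for stage, tpm in tpm_by_stage.items()}
        if max(values.values()) >= min_tpm:
            kept[tag] = values
    return kept
```

A tag is retained as soon as it reaches the cutoff in any single stage, which matches the criterion of removing only tags that stay below 4 TPM in every stage.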
TPM values of tags originating from the same transcript were summed to assess their expression levels. A transcript is defined as the predicted coding sequence extended with 400-bp flanking regions at both sides.
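The transcript definition and the summing of tag TPM values can be sketched as below. The coordinate convention (1-based, inclusive), the tuple layouts and the function names are assumptions, and strand is ignored for brevity, so the sketch does not separate sense from antisense tags.

```python
def extend_transcript(cds_start, cds_end, scaffold_length, flank=400):
    """Extend a coding sequence by `flank` bp on both sides, clipped to the scaffold."""
    start = max(1, cds_start - flank)
    end = min(scaffold_length, cds_end + flank)
    return start, end


def sum_tpm_per_transcript(transcripts, tag_hits):
    """Sum the TPM of all tags that map within each extended transcript.

    transcripts: {transcript_id: (scaffold, start, end)} after extension.
    tag_hits:    list of (scaffold, position, tpm) for perfectly mapped tags.
    """
    totals = {tid: 0.0 for tid in transcripts}
    for scaffold, position, tpm in tag_hits:
        for tid, (t_scaffold, start, end) in transcripts.items():
            if scaffold == t_scaffold and start <= position <= end:
                totals[tid] += tpm
    return totals
```

The nested loop is adequate for illustration; a genome-scale run would normally index transcripts per scaffold or use an interval tree.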
Comparison of gene expression in L. bicolor and S. commune.
Whole-genome expression analysis of L. bicolor10 and S. commune was done essentially as described49. For L. bicolor, the microarray values from replicates were averaged. Expression values of genes were increased by 1, and the ratio between monokaryon and mushrooms (for S. commune), and between free-living mycelium and mature fruiting bodies (for L. bicolor), was log-transformed. All expressed genes from S. commune that had at least one expressed ortholog in L. bicolor were taken into account, resulting in a total of 6,751 orthologous pairs. These pairs were classified on the basis of functional-annotation terms. Correlation of changes in expression of these gene classes was expressed as the Pearson correlation coefficient. Only gene ontology terms with 10–200 pairs were used in the analysis. In the case of PFAM domains, a minimum of ten ortholog pairs were used.
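The ratio calculation and the correlation of expression changes can be sketched as follows. The source does not state the base of the logarithm or the direction of the ratio, so the use of log2 and of the first argument as numerator are assumptions, as are the function names.

```python
from math import log2, sqrt


def log_ratio(expr_a, expr_b):
    """Log-transformed ratio after adding 1 to each expression value.

    log2 and the a/b direction are assumptions for illustration.
    """
    return log2((expr_a + 1) / (expr_b + 1))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def term_is_eligible(n_pairs, min_pairs=10, max_pairs=200):
    """Size filter applied to gene ontology terms (10 to 200 ortholog pairs)."""
    return min_pairs <= n_pairs <= max_pairs
```

For each functional-annotation term, the log ratios of the S. commune genes and of their L. bicolor orthologs would form the two input sequences of `pearson`.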
Deletion of transcription factors fst3 and fst4.
The transcription factor genes fst3 (NCBI Protein ID: 257422) and fst4 (NCBI Protein ID: 66861) were deleted using the vector pDelcas32. Transformation of S. commune strain H4-8 was done as described30. Regeneration medium contained no antibiotic, whereas selection plates contained 20 μg ml⁻¹ nourseothricin. Deletion of the target gene was confirmed by PCR. Compatible monokaryons with a gene deletion were selected from spores originating from a cross of the mutant strains with wild-type strain H4-8.3.
Data availability and accession codes.
S. commune assemblies, annotations and analyses are available through the interactive JGI Genome Portal at http://jgi.doe.gov/Scommune. Genome assemblies, together with predicted gene models and annotations, were also deposited at DDBJ/EMBL/GenBank under the project accession number ADMJ00000000. MPSS data have been deposited in NCBI's Gene Expression Omnibus with accession number GSE21265.
This work was performed under the auspices of the US Department of Energy's Office of Science, Biological and Environmental Research Program and the University of California, Lawrence Berkeley National Laboratory under contract no. DE-AC02-05CH11231, Lawrence Livermore National Laboratory under contract no. DE-AC52-07NA27344, and Los Alamos National Laboratory under contract no. DE-AC02-06NA25396. The work was also supported by the Dutch Technology Foundation STW, the Applied Science division of the Netherlands Organization for Scientific Research and the Technology Program of the Dutch Ministry of Economic Affairs.
Inparalogs and orthologs of the predicted genes of S. commune
Expression levels of the predicted genes of S. commune in the sense and antisense direction during four developmental stages
Over- and under-representation of groups with orthologs in a specific taxon, GO-terms, KOG-terms, KEGG-terms and PFAM-terms in differentially regulated genes
Functional annotation terms of which the regulation of the orthologous pairs showed a positive correlation between S. commune and L. bicolor
Predicted transcription factors of S. commune and their expression as analyzed by MPSS
Predicted CAZy enzymes of S. commune and their expression
Annotation of genes in a cluster of 366 genes that are highly expressed and differentially regulated during development in S. commune
This article is distributed under the terms of the Creative Commons Attribution-Noncommercial-Share Alike license (http://creativecommons.org/licenses/by-nc-sa/3.0/), which permits distribution, and reproduction in any medium, provided the original author and source are credited. This license does not permit commercial exploitation, and derivative works must be licensed under the same or similar license.