Differential gene retention as an evolutionary mechanism to generate biodiversity and adaptation in yeasts

The evolutionary history of the characters underlying the adaptation of microorganisms to food and biotechnological uses is poorly understood. We undertook comparative genomics to investigate evolutionary relationships of the dairy yeast Geotrichum candidum within Saccharomycotina. Surprisingly, a remarkable proportion of genes showed discordant phylogenies, clustering with the filamentous fungus subphylum (Pezizomycotina), rather than the yeast subphylum (Saccharomycotina), of the Ascomycota. These genes appear not to be the result of Horizontal Gene Transfer (HGT), but to have been specifically retained by G. candidum after the filamentous fungi–yeasts split concomitant with the yeasts’ genome contraction. We refer to these genes as SRAGs (Specifically Retained Ancestral Genes), having been lost by all or nearly all other yeasts, and thus contributing to the phenotypic specificity of lineages. SRAG functions include lipases consistent with a role in cheese making and novel endoglucanases associated with degradation of plant material. Similar gene retention was observed in three other distantly related yeasts representative of this ecologically diverse subphylum. The phenomenon thus appears to be widespread in the Saccharomycotina and argues that, alongside neo-functionalization following gene duplication and HGT, specific gene retention must be recognized as an important mechanism for generation of biodiversity and adaptation in yeasts.


Supplementary Figure S7b. Distribution of predicted functions in the genome of G. candidum in Gene Ontology Molecular Function categories.

Supplementary Figure S7d. Distribution of predicted functions in the genome of G. candidum in Gene Ontology Biological Process categories.

Supplementary Figure S9. Species tree reconstruction from the G. candidum phylome. The species tree was reconstructed using the RAxML program as described in Materials and Methods. Proteins with a one-to-one orthology relationship across all the species considered were selected from the G. candidum phylome. A total of 302 protein alignments were concatenated into a single multiple sequence alignment containing 170,787 amino acid positions. The bar indicates the scale of branch length.
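The concatenation step described in this legend can be illustrated with a minimal sketch. It assumes each protein alignment is held as a mapping from species name to aligned sequence. The function name, the toy species labels and the toy sequences are hypothetical and are not taken from the phylome pipeline, which is described in Materials and Methods.

```python
from typing import Dict, List


def concatenate_alignments(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-protein alignments (species -> aligned sequence) end to end."""
    if not alignments:
        return {}
    species = sorted(alignments[0])
    parts: Dict[str, List[str]] = {sp: [] for sp in species}
    for aln in alignments:
        # One-to-one orthology: each alignment must cover exactly the same species.
        if sorted(aln) != species:
            raise ValueError("alignment does not hold one sequence per species")
        # All rows of one alignment must have the same number of columns.
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError("sequences within an alignment differ in length")
        for sp in species:
            parts[sp].append(aln[sp])
    return {sp: "".join(chunks) for sp, chunks in parts.items()}


# Hypothetical toy input: two tiny protein alignments for two species.
toy = [
    {"G_candidum": "MKT-A", "S_cerevisiae": "MKTLA"},
    {"G_candidum": "GG", "S_cerevisiae": "G-"},
]
supermatrix = concatenate_alignments(toy)
print(supermatrix)
```

In such a scheme the length of the concatenated alignment is simply the sum of the column counts of the individual alignments.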

Differential gene retention as an evolutionary mechanism to generate biodiversity in yeasts

Various inteins have been found in filamentous fungi 9. This is the first to be found in a fungal gene encoding a translation factor.
The mitochondrial DNA sequence is 28,008 bp long and has a GC content of 28%, placing it midway between those of S. cerevisiae (20%) and D. hansenii (38%) (Supplementary Figure 1). It carries 14 protein-coding genes: COB, COXI, COXII, COXIII, ATP6, ATP8, ATP9 and VAR1, plus six genes encoding subunits of the NADH:ubiquinone oxidoreductase (complex I): NADH1, NADH2, NADH3, NADH4, NADH5 and NADH6. A total of 23 tRNA genes and the SSU and LSU rRNA genes were found (Supplementary Figure 1). The ATP9 gene, the downstream tRNA-Phe gene and a tRNA-Arg gene are oriented counterclockwise. Interestingly, the mtDNA contains only one intron, located in the COB gene and carrying an endonuclease, making G. candidum the Saccharomycotina species with the lowest number of introns.
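For readers who wish to reproduce a GC content figure of this kind, the calculation can be sketched as follows. This is a minimal illustration, assuming the sequence is available as a plain string; ignoring ambiguous bases such as N is an assumption of this sketch and not a statement about how the value above was obtained. The function name and the toy sequence are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Return the GC percentage of a DNA sequence, counting only A, C, G and T."""
    seq = seq.upper()
    # Ambiguous symbols (e.g. N) are left out of the denominator (sketch assumption).
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt


# Hypothetical toy sequence, not real mitochondrial data.
toy_sequence = "ATGCATATAT"
print(gc_content(toy_sequence))
```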
Candida phangngensis carries two introns splitting the COB gene 10.
Similarly, the branch point sequence is also less conserved than in the other Saccharomycotina yeasts. The most common sequence in G. candidum was found to be NNCTAAC (72% of the total), followed by NNCTAAT (12%), NNTTAAC (7%) and NNCTGAC (5%).
G. candidum has only one of these two cofactors; it displays low sequence similarity to its Saccharomycotina counterparts but is well conserved relative to the filamentous fungal orthologs.
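The branch point motif percentages reported above are of the kind obtained by tallying seven-nucleotide branch point sequences with the first two positions masked. The sketch below illustrates such a tally under that assumption; the function name and the toy input are hypothetical, and the way branch points were located in the introns is not addressed here.

```python
from collections import Counter
from typing import Dict, Iterable


def motif_percentages(branch_points: Iterable[str]) -> Dict[str, float]:
    """Tally 7-nt branch point sequences after masking the first two positions as NN."""
    # Sequences that are not exactly 7 nt long are skipped (sketch assumption).
    masked = ["NN" + bp.upper()[2:] for bp in branch_points if len(bp) == 7]
    counts = Counter(masked)
    total = sum(counts.values())
    # most_common() orders the motifs from most to least frequent.
    return {motif: 100.0 * n / total for motif, n in counts.most_common()}


# Hypothetical toy input, not data from the G. candidum genome.
toy_branch_points = ["TACTAAC", "GGCTAAC", "AACTAAC", "TTCTAAT"]
print(motif_percentages(toy_branch_points))
```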

Mating type
The GcMATA coding sequence showed similarity to previously described HMG-box proteins in yeasts and is located between the APC5 and SLA2 orthologs in G. candidum (Supplementary Figure 6). Comparison of the structure of the mating type loci in several yeast and fungal species (Supplementary Figure 6) indicated that the G. candidum locus contains only two genes, GcMATA and GcMATB, and thus resembles that of some filamentous fungi such as Aspergillus species 15, but not that of Saccharomycotina yeasts. A region of roughly 2 kb separates the MATA gene from the gene upstream of it, but we did not identify a valid CDS in this region.
Interestingly, the left flanking region of the G. candidum sexual locus is not conserved and has very likely been rearranged. Indeed, the gene neighboring the G. candidum sexual locus is APC5, which in some Pezizomycotina species is found near the sexual locus but separated from it by two genes, COX13 and APN2. A search for G. candidum COX13 and APN2 indicated that they are located on different scaffolds. Genomic rearrangements at the border of the sexual locus have previously been seen in yeast species that are able to switch mating type; this switching was proposed to be responsible for an erosion of this locus 16.
The endoglucanases of one of the GH families, GH45, randomly cleave glycosidic bonds in cellulose polymers, releasing cello-oligosaccharides as end-products 19. One GH45 family, comprising four members (not found in Saccharomycotina, except for one gene belonging to one of these families in K. pastoris), was detected in G. candidum (Supplementary data 3).
Surprisingly, four members of family AA9 of lytic polysaccharide monooxygenases (LPMOs) were also identified. LPMOs also target cellulose, but through an oxidative mechanism, in contrast to the hydrolytic mechanism of their GH counterparts. The presence of four LPMOs is unexpected in G. candidum, as AA9 members are found only in wood-decaying fungi, predominantly in white-rot basidiomycetes. Comparison with other yeasts (Supplementary Data "CAZy annotation") showed that AA9 members were identified exclusively in G. candidum. Moreover, G. candidum is the only Saccharomycotina yeast to possess genes encoding enzymes that contain domains of the carbohydrate-binding module family 1 (CBM1), which specifically binds crystalline cellulose 20. CBM1 domains are primarily found in the fungal kingdom 21 and are usually restricted to the genomes of wood-rot fungi. Previous work showed that carbohydrate-binding modules increase the enzyme concentration in the vicinity of the substrate 22. Moreover, CBMs may also be involved in disrupting the structure of polysaccharides on the substrate fibrils 23. Remarkably, G. candidum carries eight CBM1 members, and all the GH45 endoglucanases are linked to a CBM1. Finally, G. candidum has two AA1_2-family ferroxidases, like other yeasts, but only G. candidum has an AA1 multicopper oxidase closely related to laccases. To our knowledge, G. candidum is the only yeast to retain this broad lignocellulolytic repertoire, with representatives of families typically associated with filamentous fungi (AA1, AA9, CBM1).

HGT from Basidiomycota
Polyamines are involved in numerous processes and are essential for growth 24. In S. cerevisiae, polyamine synthesis is initiated by two reactions: decarboxylation of L-ornithine by the product of the SPE1 gene yields putrescine, and decarboxylation of S-adenosyl-L-methionine by the product of the SPE2 gene yields S-adenosyl-methioninamine. Transfer of an aminopropyl group from S-adenosyl-methioninamine to putrescine by spermidine synthase (encoded by the SPE3 gene) results in spermidine. A second aminopropyl group is then added to spermidine by spermine synthase (SPE4) to yield spermine. While SPE3 is essential for S. cerevisiae growth, SPE4 is not 25. Most filamentous fungi do not contain spermine; these organisms contain a spermidine synthase encoded by an ortholog of SPE3 as well as a second gene encoding a spermidine synthase that is phylogenetically unrelated to either SPE3 or SPE4.
G. candidum possesses a gene, GECA15s02364g, which shows a high degree of conservation with SPE3 and SPE4. Interestingly, G. candidum also carries another gene, GECA13s02485g, that is similar to the second spermidine synthase of filamentous fungi and indeed groups with the Basidiomycota sequences in phylogenetic analysis (Figure 4). This indicates that G. candidum has a very unusual complement of spermidine synthases: an SPE3-like spermidine synthase and a second spermidine synthase derived from that of the filamentous fungi. This could imply that the SPE4 gene, very likely derived from a duplication of the SPE3 gene 25, was lost in G. candidum and that a spermidine synthase has been acquired through HGT from a basidiomycete. The polyamine synthase gene complement of G. candidum is therefore consistent with an involvement of polyamines in hyphal and pseudo-hyphal growth by a mechanism similar to that acting in filamentous fungi 26.