In silico definition of new ligninolytic peroxidase sub-classes in fungi and putative relation to fungal life style

Ligninolytic peroxidases are microbial enzymes involved in the depolymerisation of lignin, a cell wall polymer found in land plants. Among fungi, only Dikarya have been found to degrade lignin. The increasing number of available fungal genomes allows an expert annotation of lignin-degrading peroxidase encoding sequences, with a particular focus on Class II peroxidases (CII Prx). In addition to the previously described LiP, MnP and VP classes, six new sub-classes have been defined based on sequence similarity: three found in plant pathogenic ascomycetes and three in basidiomycetes. The presence of CII Prxs could be related to fungal life style. Typically, necrotrophic or hemibiotrophic fungi, either ascomycetes or basidiomycetes, possess CII Prxs while symbiotic, endophytic or biotrophic fungi do not. CII Prxs from ascomycetes are rarely subjected to duplications, unlike those from basidiomycetes, which can form large, recently duplicated families. Even if these CII Prx classes form two clearly distinct clusters with divergent gene structures (intron numbers and positions), they share the same key catalytic residues, suggesting that they evolved independently from similar ancestral sequences with few or no introns. The lack of CII Prx encoding sequences in early diverging fungi, together with the absence of a duplicated class I peroxidase (CcP) in fungi containing CII Prxs, suggests the potential emergence of an ancestral CII Prx sequence from the duplicated CcP after the separation between ascomycetes and basidiomycetes. As some ascomycetes and basidiomycetes do not possess CII Prx, late gene loss could have occurred.

Intron size changes were visualized through the graphical representation provided by GECA.
Conserved common introns analysis. Gene structure and common introns (or cintrons) were analyzed for all fungal sequences. First, the protein alignment generated with MAFFT 41 was completed with the identification of common introns in the corresponding genes with CIWOG. Cintrons were extracted from the CIWOG database and only those present in one or more sub-classes with a conservation rate higher than 50% were considered as conserved. Finally, the sequences were placed in order of appearance in the phylogenetic tree and the conserved cintrons were highlighted for each sequence.
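The 50% conservation filter described above can be illustrated with a minimal sketch. It assumes that the "conservation rate" of a cintron in a sub-class is the fraction of that sub-class's sequences carrying the intron; this interpretation, the function name and the toy identifiers are illustrative assumptions, not the CIWOG data model.

```python
from collections import defaultdict

def conserved_cintrons(presence, subclass_of, threshold=0.5):
    """Return {cintron: [sub-classes in which its conservation rate > threshold]}.

    presence:    dict cintron id -> set of sequence ids carrying that intron
    subclass_of: dict sequence id -> sub-class label
    The rate is assumed to be (carriers in sub-class) / (sequences in sub-class).
    """
    # Number of sequences in each sub-class.
    size = defaultdict(int)
    for sub in subclass_of.values():
        size[sub] += 1

    kept = {}
    for cintron, seqs in presence.items():
        # Count carriers of this cintron per sub-class.
        hits = defaultdict(int)
        for seq in seqs:
            hits[subclass_of[seq]] += 1
        subs = sorted(s for s, n in hits.items() if n / size[s] > threshold)
        if subs:
            kept[cintron] = subs
    return kept

# Hypothetical toy data (not taken from the study).
subclass_of = {"s1": "LiP", "s2": "LiP", "s3": "LiP", "s4": "CIIBB", "s5": "CIIBB"}
presence = {
    "c1": {"s1", "s2", "s4"},  # 2/3 of LiP, 1/2 of CIIBB
    "c2": {"s3"},              # 1/3 of LiP
    "c3": {"s1", "s4", "s5"},  # 1/3 of LiP, 2/2 of CIIBB
}
result = conserved_cintrons(presence, subclass_of)
```

With these toy inputs, `c1` is retained for LiP only (a rate of exactly 50% does not pass a strict "higher than 50%" test), `c2` is discarded and `c3` is retained for CIIBB.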
Duplication analysis. In order to test whether the presence of transposable elements can explain a high duplication rate, RepeatMasker 42 version 4.0.3 (with fungi specified as "species") was run on all analyzed basidiomycete genomes. No correlation was found between the number of paralogs in an organism and the number of repeated sequences. A deeper analysis of the positions of repeated sequences was conducted for the Trametes versicolor and Galerina marginata genomes (which possess the highest numbers of CII Prxs): neither transposable elements nor other repeated sequences were systematically detected near a gene copy.
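As an illustration of the positional check, the sketch below tests whether every gene copy has a repeat in its neighbourhood. The tuple input format, the 5 kb window and the function names are assumptions made for the example; the text does not state how "near" was defined, and parsing of RepeatMasker output is not shown.

```python
def repeats_near_genes(genes, repeats, window=5000):
    """Map each gene name to the repeats lying within `window` bp of it.

    genes:   list of (name, scaffold, start, end)
    repeats: list of (repeat name, scaffold, start, end)
    `window` is an arbitrary illustrative distance, not a value from the study.
    """
    nearby = {}
    for name, g_scaf, g_start, g_end in genes:
        hits = []
        for r_name, r_scaf, r_start, r_end in repeats:
            if r_scaf != g_scaf:
                continue
            # Gap between the two intervals; 0 when they overlap.
            gap = max(r_start - g_end, g_start - r_end, 0)
            if gap <= window:
                hits.append(r_name)
        nearby[name] = hits
    return nearby

def systematically_flanked(nearby):
    """True only if there is at least one gene and every gene has a nearby repeat."""
    return bool(nearby) and all(nearby.values())

# Hypothetical coordinates (not taken from the analyzed genomes).
genes = [("prx1", "sc1", 10000, 12000),
         ("prx2", "sc1", 50000, 52000),
         ("prx3", "sc2", 1000, 3000)]
repeats = [("LTR_a", "sc1", 13000, 13500),
           ("DNA_b", "sc2", 90000, 90500)]
nearby = repeats_near_genes(genes, repeats)
flanked = systematically_flanked(nearby)
```

Here only `prx1` has a repeat within the window, so the copies are not systematically flanked, which is the kind of outcome reported for the two genomes.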
New PROSITE profiles design and WebLogo. Using a global phylogenetic analysis, different protein clusters have been defined to update the existing PROSITE profiles 33 and to design new specific profiles using the silenced residues. These profiles were built from full length alignments of each protein cluster. First, all the sequences from the different protein clusters were aligned with MAFFT. The sequence alignment was split into several sub-alignments according to the cluster definitions. Each cluster alignment contains an annotation line where residues conserved in the whole family are tagged. This annotation line is used to downweight family-conserved columns during the profile construction; therefore only cluster specific residues are taken into consideration. The reliability of each cluster is supported by both the analysis of the gene structures and the presence/absence of the key residues specific to the well described LiP, MnP and VP families. Furthermore, graphical sequence logos were created for each group with Weblogo3 43 and aligned manually with the others in order to identify the amino acids conserved between the sub-classes.
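The role of the annotation line can be sketched as follows. This is a deliberately simplified illustration: family-conserved columns are excluded outright rather than downweighted, conservation is defined by a single adjustable fraction, and all names are hypothetical. It is not the PROSITE profile construction procedure itself.

```python
from collections import Counter

def conserved_columns(alignment, min_fraction=1.0):
    """Return {column: residue} for columns where a single non-gap residue
    is carried by at least `min_fraction` of the sequences."""
    conserved = {}
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment)
        residue, n = counts.most_common(1)[0]
        if residue != "-" and n / len(alignment) >= min_fraction:
            conserved[col] = residue
    return conserved

def cluster_specific(clusters, min_fraction=1.0):
    """Build an annotation line tagging family-conserved columns ('*') and
    return, per cluster, the conserved columns that are not family-wide."""
    family = [seq for seqs in clusters.values() for seq in seqs]
    family_cols = conserved_columns(family, min_fraction)
    annotation = "".join("*" if c in family_cols else "."
                         for c in range(len(family[0])))
    specific = {
        name: {c: r for c, r in conserved_columns(seqs, min_fraction).items()
               if c not in family_cols}
        for name, seqs in clusters.items()
    }
    return annotation, specific

# Hypothetical aligned fragments split into two clusters.
clusters = {"A": ["HCWA", "HCWG"], "B": ["HCDA", "HCDS"]}
annotation, specific = cluster_specific(clusters)
```

In this toy case the first two columns are tagged as family-conserved, and only the third column (W in cluster A, D in cluster B) is kept as cluster-specific signal.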

Results and Discussion
Definition of new sub-classes of ligninases. A high quality of annotation is mandatory to perform a global analysis of the evolution of multigene families such as the CII Prxs 44. A set of 150 genomes from ascomycetes, basidiomycetes and early diverging fungi (Table 1) has been carefully annotated for CII Prx encoding sequences and used for phylogeny, clustering analysis and profile design. No ligninase-like sequence has been detected in any of the early diverging fungi analyzed. The CII Prx numbers and gene structures are highly variable. Between 1 and 15 isoforms can be detected per species and may contain up to 15 introns in a single sequence, with short exons and introns (e.g. 6 nt for the last exon). Characteristic residues necessary for haem binding and electron transport were looked for in all CII Prxs analyzed (Fig. 1). Well conserved clusters have been identified by phylogenetic analysis and the corresponding sequences were used to update the existing profiles and to construct new profiles and sub-profiles. When available, the gene structure (introns/exons) was also used to support the tree topology. This procedure, combining phylogenetic analysis, presence of key residues, gene structure analysis and construction of HMM profiles, made it possible to identify misclassified sequences (false positives and false negatives) and to reassign them to their appropriate sub-classes.
Sequences detected in ascomycetes form a group distinct from the well-described basidiomycete ligninases (Supplementary Fig. S1). The clear distribution of ascomycete ligninases into three clusters (Fig. 2) supports the definition of three sub-classes of ascomycete ligninases, hereafter referred to as ascomycete sub-class A (CIIAA), sub-class B (CIIAB) and sub-class C (CIIAC), and the design of their corresponding profiles (Fig. 1a). By comparing the conserved residues found in basidiomycetes and ascomycetes (Fig. 1), we can see that (i) many residues dispersed throughout the sequences (~20 aa) are conserved between these two very distant phyla; (ii) the 3 sub-classes defined in ascomycetes do not share all of the residues with catalytic properties defined in basidiomycetes, and lack most of the cysteine residues responsible for protein stability; (iii) the ascomycete CIIAA sub-class has the most divergent sequence profile, which is consistent with its phylogenetic position (Fig. 2). Gene structure analysis does not reveal conserved common introns between the three sub-classes. Extensive changes in the exon-intron structure (intron gain and loss) have already been described for members of the Fusarium clade and appear to be the norm in ascomycetes 45.
Members of these 3 new sub-classes are only detected in Pezizomycotina and are absent from the other ascomycete sub-phyla. Moreover, within the Pezizomycotina sub-phylum, ligninase encoding sequences are not detected in all species. They are mainly found in species known to interact with plants: pathogenic ascomycetes, either necrotrophic or hemibiotrophic, possess up to 7 sequences, whereas few or no sequences were detected in saprotrophic species (Table 1).
The situation for basidiomycetes is much more complex. The exhaustive mining of 68 basidiomycete genomes has demonstrated the need to redefine the existing profiles and highlighted that some sequences do not belong to the four previously identified groups (LiP, MnP, VP, GP). Three new basidiomycete ligninase sub-classes have been defined: basidiomycete sub-class A (CIIBA), sub-class B (CIIBB) and sub-class C (CIIBC) (Fig. 3). The definition of these 3 new sub-classes also led to the reassignment of some sequences previously known as MnP or VP. Notably, most sequences of our CIIBB class were previously attributed to the MnP class (mostly short-MnPs), but the phylogenetic analysis suggests a more restricted definition of the MnP class. The analysis of our phylogenetic tree, together with the conservation of the catalytic tryptophan (conserved in LiP and VP) and of the Mn2+ oxidation sites (in MnP and VP) (Fig. 3), clearly shows that the sub-classes cannot be reduced to the presence/absence of key residues. It is noteworthy that sequences possessing all the indicated residues, and thus likely to be VP sequences, are scattered in the branches of the CIIBB and CIIBA sub-classes. Interestingly, such sequences are also present at the base of the LiP clade, with two sequences from C. subvermispora, described as phylogenetically and catalytically intermediate between classical LiPs and VPs 8. Few specific residue conservations can be detected in each sub-class (Fig. 1b), but the CIIBC sub-class is apparently the most divergent one.
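For illustration, the presence/absence signature of the key residues can be read from an aligned sequence as in the sketch below. The column indices and labels are hypothetical placeholders, and, as stated above, such a signature alone does not determine the sub-class; it only describes which catalytic sites are present.

```python
def residue_signature(aligned_seq, mn_site_cols, trp_col):
    """Describe which key catalytic residues an aligned sequence carries.

    mn_site_cols: dict alignment column -> expected acidic residue of the
                  Mn2+ oxidation site (hypothetical column indices)
    trp_col:      alignment column expected to hold the catalytic tryptophan
    """
    mn_site = all(aligned_seq[c] == aa for c, aa in mn_site_cols.items())
    trp = aligned_seq[trp_col] == "W"
    if mn_site and trp:
        return "Mn site + W (VP-like signature)"
    if mn_site:
        return "Mn site only (MnP-like signature)"
    if trp:
        return "W only (LiP-like signature)"
    return "neither"

# Hypothetical 7-column alignment: Mn2+ site at columns 1, 3, 6; Trp at column 5.
mn_cols = {1: "E", 3: "E", 6: "D"}
signatures = [residue_signature(s, mn_cols, 5)
              for s in ("AEAEAWD", "AEAEAAD", "AAAAAWA", "AAAA-AA")]
```

A sequence labelled "VP-like" by this test may still branch within CIIBB or CIIBA in the tree, which is why the signature is treated here as a descriptor and not as a classifier.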
Phylogenetic and profile analyses were mostly supported by the conservation of gene structure (intron number and position) in basidiomycetes. Out of 57 common introns (cintrons) detected with CIWOG, 21 were considered conserved since they were present in one or several sub-classes with a conservation rate higher than 50% (Fig. 4). [...] totally specific and two others very marginal elsewhere. Three introns are mostly common to CIIBB and LiP, which confirms the phylogenetic hypothesis that LiP sequences originate from CIIBB. As for the VP sequences, they are rather intron-rich, with some sequences resembling the CIIBB intron pattern and others the CIIBA one.
In basidiomycetes, extensive recent gene duplications were identified in LiP, MnP and the three new sub-classes. The 24 CII Prx sequences of Trametes versicolor, distributed among LiP, VP, CIIBA and CIIBB, are mainly clustered in 3 genomic regions and are the result of tandem (TD), segmental (SD) and whole genome duplications (WGD) (Fig. 5). These events have been defined as follows: TD as successive duplicated genes, SD as blocks of DNA that map to different loci in the same chromosome and WGD as blocks of DNA that map to different chromosomes. However, all these duplications are very recent since they form well-supported clusters specific for each species (Fig. 3). This raises the question of whether duplication events are widespread among basidiomycete gene families, but it appears that other peroxidase families such as Cytochrome C peroxidases (CcP) or glutathione peroxidases were less or not subjected to duplication. Besides, no correlation can be made with the distribution of transposable elements. This suggests that these duplications are probably an evolutionary response to selection pressure.
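The three definitions can be expressed as a simple decision rule on a pair of paralogs. The sketch below is a simplification: it works on gene pairs described by a chromosome name and an ordinal rank along that chromosome (a hypothetical input format), whereas SD and WGD are defined above on blocks of DNA.

```python
def classify_duplication(gene_a, gene_b):
    """Classify a paralog pair from its genomic positions.

    Each gene is (chromosome, rank), where rank is the ordinal position of the
    gene along its chromosome (hypothetical input format).
    """
    chrom_a, rank_a = gene_a
    chrom_b, rank_b = gene_b
    if chrom_a != chrom_b:
        return "WGD"  # copies map to different chromosomes
    if abs(rank_a - rank_b) == 1:
        return "TD"   # successive genes on the same chromosome
    return "SD"       # different loci on the same chromosome

# Hypothetical positions.
calls = [
    classify_duplication(("chr1", 10), ("chr1", 11)),
    classify_duplication(("chr1", 10), ("chr1", 40)),
    classify_duplication(("chr1", 10), ("chr3", 5)),
]
```

A block-based analysis would additionally require that neighbouring genes are duplicated together before calling SD or WGD; that step is omitted here.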
Working hypothesis of ligninase evolution. Ligninases are detected in ascomycetes and basidiomycetes but are absent from the early diverging fungi analyzed (Choanephora cucurbitarum, Mucor circinelloides, Phycomyces blakesleeanus, Rhizopus oryzae). These two phyla are monophyletic sister groups belonging to Dikarya which emerged from a common ancestral organism. Key residues described in basidiomycete MnP, LiP, and VP are also detected in CII Prx ascomycete sequences (Fig. 1). This suggests that CII Prxs detected in ascomycetes and basidiomycetes could have evolved independently from a similar ancestral sequence following convergent evolution. Notably, CII Prxs belong to the same superfamily as the class I Cytochrome C peroxidases (CcP), and share common key residues 23 . The taxonomic distribution of the CI Prxs was clarified in order to enable a better understanding of the overall CII Prxs evolution 46 . CI Prxs are found in plants, fungi, and prokaryotes. Unlike CII and CIII Prxs, they are not glycosylated and do not have signal peptides, calcium ions, or disulfide bridges. They contain five main groups of proteins: (i) Catalase peroxidases (CP) present in prokaryotes and in some eukaryotes following a gene transfer, (ii) Cytochrome c peroxidases (CcP) found in mitochondria containing organisms but not detectable in Viridiplantae, (iii) Ascorbate peroxidases (APx) found only in chloroplastic organisms, and (iv)(v) two hybrid-type peroxidases detected in fungi and different kingdoms. A previous phylogenetic study suggested evolutionary relation between CI Prxs and CII Prxs 47 .
When comparing ascomycetes and basidiomycetes for the presence of CcP and ligninase encoding sequences (Table 1 and Supplementary Table S1), we can observe that: (i) in most cases, two CcP sequences are detected in fungi which do not possess CII Prx sequences and (ii) only one CcP sequence is detected when the genome contains at least one CII Prx sequence. This suggests that CII Prxs would have emerged from an ancestral sequence that could be a CcP sequence. A similar theory of evolution has already been described for the CIII Prxs 46. Indeed, CIII Prxs are only detected in plants, which lack CcP. CIII Prxs and CII Prxs are both subjected to numerous species-specific duplications (tandem and segmental duplications) and contain highly conserved cysteine residues necessary for the disulfide bridges and stability of secreted proteins. As an alternative hypothesis, all fungi would originally have possessed at least one CII Prx sequence, and loss events would have occurred more recently. On the principle of maximum parsimony (Supplementary Table S1), this hypothesis seems unlikely since it would require many independent events of gene loss.
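The comparison of CcP copy number with CII Prx presence amounts to a small contingency table, sketched below. The species labels and counts are hypothetical placeholders and do not reproduce Table 1 or Supplementary Table S1.

```python
from collections import Counter

def ccp_vs_cii(species_counts):
    """Tabulate CcP copy number against presence of CII Prx.

    species_counts: dict species -> (number of CcP, number of CII Prx)
    Returns a Counter keyed by (CcP class, has CII Prx).
    """
    table = Counter()
    for n_ccp, n_cii in species_counts.values():
        if n_ccp == 0:
            ccp_class = "0 CcP"
        elif n_ccp == 1:
            ccp_class = "1 CcP"
        else:
            ccp_class = "2+ CcP"
        table[(ccp_class, n_cii > 0)] += 1
    return table

# Hypothetical counts per species.
counts = {"sp_a": (2, 0), "sp_b": (2, 0), "sp_c": (1, 5),
          "sp_d": (1, 1), "sp_e": (1, 0)}
table = ccp_vs_cii(counts)
```

Under the hypothesis discussed above, most species would fall in the cells ("2+ CcP", no CII Prx) and ("1 CcP", CII Prx present); species in the other cells, such as `sp_e` here, correspond to the exceptions implied by "in most cases".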
In addition, intron positions and numbers are not conserved between the CII Prxs found in ascomycetes and in basidiomycetes. Basidiomycete sequences contain more introns than ascomycete ones (on average 8 and 2 respectively), but introns are longer, on average, in ascomycetes (74 nt) than in basidiomycetes (54 nt). Similar phenomena regarding intron size and number are also observed in other families such as glutathione peroxidases (1 or 2 introns for ascomycetes and 3 to 6 for basidiomycetes). Altogether, these results could suggest independent evolution from two ancestral sequences containing few or no introns. This is in accordance with the hypothesis of a convergent evolution from an existing CI Prx sequence for these two lineages, after the ascomycetes/basidiomycetes separation.
Peroxidase family expansion and recent gene loss processes are both likely to be involved in the history of these genes in fungi. This recent evolutionary history seems particularly driven by fungal life style and leads to numerous adaptive convergences to the environment, particularly to host immunity. Typically, necrotrophic or hemibiotrophic fungi, either ascomycetes or basidiomycetes, possess CII Prxs while symbiotic, endophytic or biotrophic fungi mostly do not. WR basidiomycetes and necrotrophic pezizomycetes (ascomycetes) present the highest levels of CII Prxs (Table 1).
Even if numerous key residues are well conserved between the different CII Prx classes, the residues described as necessary for electron transfer are missing in ascomycetes. Furthermore, the position of the conserved cysteines varies between CII Prxs of ascomycetes and basidiomycetes. These divergences support the independent emergence of CII Prxs in ascomycetes and basidiomycetes and suggest divergent functions (or different catalytic mechanisms). Lignin-degrading fungi possess large batteries of ligninase encoding sequences which enable them to oxidize the lignin polymer and then to use it as a source of carbon. Saprophytic ascomycetes, as well as BR basidiomycetes, described as not being lignivorous, present no or a low number of ligninase encoding sequences. They probably just use them to depolymerize the lignin by oxidation in order to increase the accessibility of other cell wall components such as cellulose and hemicellulose. Finally, plant pathogenic ascomycetes (either necrotrophic or hemibiotrophic), which contain up to 7 ligninase encoding sequences, use these proteins to depolymerize the lignin in order to access and to infect the host cell. This conclusion is in agreement with the fact that plant pathogenic fungi also possess more CAZymes 48. A more detailed analysis of the distribution into classes in basidiomycete white rot fungi reveals that most species possess only one enzyme among the 3 [...]
Figure 1. Weblogo of different CII Prxs from ascomycetes (a) and basidiomycetes (b). Basidiomycete and ascomycete CII Prxs were aligned with MAFFT, and then separated into sub-classes. Weblogos were created for each group with Weblogo3 and aligned manually with the others in order to easily identify the amino acids conserved between the sub-classes, highlighted in yellow; those that are specific to one or several sub-classes are highlighted in other colors. AA, AB and AC stand respectively for ascomycete CII Prx sub-classes A, B and C; BA, BB and BC stand respectively for basidiomycete CII Prx sub-classes A, B and C; VP: Versatile peroxidases; MnP: Manganese peroxidases; LiP: Lignin peroxidases. Green highlight: eight cysteines forming four disulfide bridges; blue highlight: two active site histidines; orange highlight: three acidic residues forming the Mn2+ oxidation site; red dot: nine ligands of two structural Ca2+ ions; blue dot: one tryptophan responsible for aromatic substrate oxidation by LiP; dark gray: position specific to one class. *Conserved residues between basidiomycete and ascomycete CII Prx classes.
[...] present highly divergent gene structures, and sequence conservation only for key residues scattered throughout the sequences. Altogether, this suggests that the current CII Prxs found in basidiomycetes and ascomycetes probably originate from at least two independent events after the separation between ascomycetes and basidiomycetes. This confirms the conclusion obtained for basidiomycetes with tree reconciliation 24. In addition, the sequence similarity with the CI Prx CcP sequence, and the correlation between the lack of the second CcP copy and the presence of CII Prx, suggest that this ancestral sequence could be a CcP. Four residues already described in MnP, LiP and VP as necessary for electron transfer are sometimes missing in the new closely related ascomycete and basidiomycete sub-classes. The discrepancy between catalytic activity based on few residues and the global sequence conservation is not antagonistic. Indeed, numerous other residues are highly conserved between all CII Prxs and within the different sub-classes, suggesting that the new CII Prxs could be able to oxidize lignin but with a different electron transfer mechanism. In all cases, the catalytic activity of CIIAA, CIIAB, CIIAC, CIIBA, CIIBB and CIIBC proteins is not yet known and needs to be demonstrated.
Figure. Phylogenetic tree of basidiomycete CII Prx sequences. 267 sequences coming from 28 basidiomycetes have been aligned to generate the tree. One ascomycete sequence was used as outgroup. LiP sequences are represented in red, VP in orange, CIIBB in green, CIIBC in pink, MnP in blue and CIIBA in azure. Bootstrap values higher than 50% are indicated. Presence/absence of 3 conserved residues (E(35), E(39) and D(179)) responsible for Mn2+ oxidation in MnP, as well as of the catalytic W typical of LiP, is displayed alongside, respectively with green, azure, blue and red squares.
In basidiomycetes, the presence of MnP, LiP and VP has been clearly associated with a specific wood decaying activity thanks to their lignin degradation capacity. CIIBA, CIIBB and CIIBC are mainly detected in wood degrading fungi (white rot), alone or associated with the main CII Prxs (MnP, LiP or VP). But they can also be found alone in brown rot fungi, plant pathogens, litter decomposing fungi and fungi with no defined decaying machinery. On the other hand, CIIAA, CIIAB and CIIAC are found in rather low copy number, mainly in plant pathogenic ascomycetes. The specific sequence distribution among organisms suggests that these proteins could have two separate purposes: lignin degradation as a carbon source and cell penetration. Early diverging fungi such as Chytridiomycota species are capable of degrading cellulose and pectins, which allows them to use cell wall polymers as a carbon source 51. However, the lack of CII Prx in all early diverging fungi tested raises the question of how accessible these carbon sources are to fungi that do not have the tools to degrade lignin. The absence of ligninases in early diverging fungi has raised the hypothesis of a microbial influence on carbon burial at the end of the Paleozoic (Floudas et al. 2012), in opposition to a geological hypothesis 52,53.
The next questions to address concern the respective roles of these different enzymes during wood decay. Answering them would help to better understand the biology of these fungi and the chemical mechanisms involved in the biological decomposition of wood.